Alrighty, well, welcome everybody. There’s
a bunch of chairs over here if you’re having trouble finding one. I’m Cliff Lynch, the
director of CNI, and I’m very, very pleased to welcome you all to Seattle for the Spring
2015 Seattle CNI member meeting. I’m glad to see so many people here. I’m also delighted
that I have not heard tell of nasty travel surprises that sometimes we encounter, you
know, April blizzards or things of that nature that wouldn’t be out of character with the
winter we’ve had, at least in some parts of the country.
So, I’m delighted you’re all here. I’m going to be very brief. I have just a couple
of administrative kinds of announcements and you will find all of these echoed on the announcement
board by where you picked up your badges and programs. The 5th Avenue Room sort of isn’t
happening and everything that was in the 5th Avenue Room has been moved to Cascade A and
B. Cascade A and B is on the third floor, one level down. You can get there by escalator
or by elevator. So whenever on the program it says 5th Avenue Room, read Cascade A and
B. We have one cancellation. That is the session
on the IMS Learning Analytics Work that was scheduled for 5:15 this afternoon. So that
session will not be taking place. And I would just remind you that if we run into any other
relocations or rescheduling we will post it by the registration desk. There should also
be a list there of the sessions that we’re going to be trying to capture video for, just
for your reference. Having done those brief announcements, now
I get to the good stuff. I am just incredibly pleased to be able to welcome Brewster Kahle
back to CNI. Some of you know Brewster as a treasured colleague and a leader within
our community going back, well we were just talking about that, let’s just say a long
time and neither of us are looking quite as young as we were, although he’s doing much
better than I am. But Brewster has done incredible things. Some
of you may know his early work on distributed information retrieval systems, now just sort
of woven into a lot of our fundamental assumptions about things. All of you, I’m sure, know
at least some of the work of the Internet Archive, of how he stepped in and took up the
challenge of preserving this strange and wonderful new thing called the World Wide Web, as it
started to emerge. Some of you may be familiar with other initiatives that he has undertaken
since and I think he’ll fill us in on some of those.
Brewster is someone with a great commitment to information access worldwide, a great commitment
to stewardship and preservation in order to support that access, one who has fought very
hard and tried to find hard ethical balances in situations that nobody’s been in before
about the collection and archiving and subsequent access of information. He genuinely is one
of my personal heroes and so it gives me great pleasure to welcome him back today.
I’ll just say one other thing, just maybe as context to some of what I think he’s
going to talk about. We’ve relied on Brewster and the Internet Archive for leadership and
really even more than leadership, just being out in front, you know, all on their own in
the wilderness sometimes for a really long time. And as things get ever more complex,
the scale gets ever larger. I think that he’s been thinking a lot about how to get a broader
collaboration going about how the kind of scale of the problems we’re facing today,
the challenges, is outgrowing, you know, any single institution. And so I think that, you
know, we’ve benefitted from the pathfinding of the Internet Archive for a long time. One
of the things that we’re increasingly recognizing, I would say in this last decade, are the
themes of collaboration and sustainability and I think he’s going to give us a lot
to think about in connecting pathfinding on one side and collaboration and sustainability
as necessary survival strategies on the other. Welcome back Brewster!
(applause) Thank you Cliff, and it’s wonderful to be
back. I was given the opportunity to speak here, I guess now twelve years ago, and
really what we ended up doing was kicking off the Open Content Alliance. This was in
the era when the Google Library Project was really just getting going and the question
is how were we going to react. And, um, we, as the library community came together, formed
organizations and got together and got moving. There are now two and one-half million books
publicly both downloadable and also integrated into (_7:23_____ Trust), but also available
for open access as well. And it is an outcome of CNI. And so I’d like to say thank you
to CNI for continuing what it is you do in all these ways. (Applause)
I thought something that might be helpful today is to try to deal with the issues around
Modern Materials, things that may have rights to them, probably have rights issues to them,
at least have consternation and questions around them. And the idea for this talk, and there’s
going to be some time for Q and A, is the idea of what can we do, often together, in ways that
maybe we would feel that we didn’t want to do on our own, or what, technologically
or within the legal framework as it’s currently evolving, is sort of OK to do. So I’m going
to suggest and go over a couple of the different paths that we’ve gone down with partners
towards bringing modern materials into our digital collections and try to offer broad public
access to it and sort of what’s happened out of that.
So, the Internet Archive… the Internet Archive is an independent non-profit. It’s not part
of a university and not part of the government. It’s about a 12-million-dollar-a-year
organization. It’s a 501(c)(3), a non-profit. I would say it started out being an Archive
OF the Internet. So the idea is to collect the World Wide Web and whatever else was going
to become this internet thing and try to make that permanently available and permanently
accessible to people and users, as well as bots.
And we’ve wanted to become an Archive ON the Internet. So not just constraining ourselves
to things that were available to be hoovered in, but to go into books, video, software
and the like. And most recently, and so this is just a portrait of the Internet Archive,
and I’ll go back to the adventures of trying to bring these things up.
We’ve found what we’re really trying to do is to Build Libraries Together. I like
to play games where there are lots of winners. The idea of having a monopoly library or monopoly
of any particular contents, I find that actually deeply scary. We’d like to be, I’d like
to be a participant in this, but we don’t want to be the only one. In fact we want hundreds
of them. We want lots and lots of libraries to flourish and the technology which might
have been very difficult and went for centralization over periods of time, I think we can go and
reverse some of these trends and build our libraries together with some of the original
ideas of distributed collection criteria and services in an era which seems more like,
isn’t a library just a cloud service we subscribe to someplace? Emphatically, NO!
Let’s go and make libraries plural happen and flourish together.
So with that I’m going to go through some of the programs that we’ve been involved
in, all in partnership with others, and then say a little bit of how we survived these.
We see ourselves within the tradition of traditional libraries.
I love what people carve above doors. So this is what’s carved above the door of the Boston
Public Library….Free to All. And it was carved by the robber barons, who weren’t
very nice folks, as I understand it. And they were pretty controlling, but there was something
about libraries and the information in them that was important to spread far and wide.
We started by crawling the World Wide Web and the idea was to try to make a snapshot
of every webpage from every website every two months starting in 1996. So a snapshot,
and another snapshot, then another snapshot, another snapshot, and starting to get big.
(laughter) But when we started this, our lawyer friend said, “Bad things are going to rain
on you.” Alright? Just going and telling people that you’re making wholesale copies
of other people’s websites, not even doing anything with it, was going to be a very bad
thing to do. So we reached out and the Library of Congress wasn’t quite ready for it yet,
but the Smithsonian was. And so we worked with the Smithsonian and they gave us a letter
with the little sunburst on it and said we’re proud to work with this organization that
I think was three of us, or four of us at that point, to go and collect the Presidential
Election Websites of 1996, because it was important to the mission of the Smithsonian.
And David Allison of the Smithsonian knew exactly what he was doing. He was giving us
a cover. He was spreading his wing to a little organization that needed cover, because all
the recommendations we got were that bad things would happen.
Well, it turned out that bad things didn’t happen. And we started collecting along and
everything was fine. And then we basically said, well what’s the next step we can do?
Let’s make it even more available. And in the year 2001 we made the Wayback Machine.
So we then took everybody else’s stuff and we offered it for free on the internet for
anybody to have access to it. Okay, so it’s a little theme.
So what do you think our lawyer friend said? What would happen to us? Bad things would
happen to us. So what did we do? Larry Lessig and… we launched it in the Bancroft Library
at UC Berkeley with all the wood around it and it was, you know, Tom Leonard was there
and it was the whole situation. They knew exactly what they were doing. They were putting
their umbrella on top of this little organization that had just gone, and without permission,
collected, I don’t know, a billion web pages by then and were going to make them available
on the internet. And it turned out that bad things didn’t happen. People used it up
a storm. A lot of people wanted things taken out of the Wayback Machine, but it was all
onesie-twosie things. It wasn’t you have to take everything out. And frankly it was
really helpful that nobody dragged us in front of a judge, because I really didn’t want
to have to argue with a judge that, yes, we should go and try to get permission for every
web page on every website every two months. It just wouldn’t have been practical. So
actually just having the world sort of continue along was incredibly important. But there
were very few people that wanted to come along with us on this journey. The Library of Congress
went and commissioned us to go and collect some things for them, but they were very nervous
about the Wayback Machine coming out publicly. They thought it could bring down the whole
edifice of doing anything by being too public with it. But when the Washington Post wrote
it up as a good thing, then suddenly everybody was happy. And the reason I’m saying this
is not because ‘Gosh weren’t they wrong? Ha, ha, ha.’ No, no, no, no, it’s completely
natural. This is exactly what really does happen, as people are like…they’re kind
of worried about it…it’s like, well, what’s going to happen? And we don’t know and you
just have to kind of put it out there. In retrospect it all seems very silly and easy.
At the time it’s scary as hell. And so this is a situation that did work…
And we now have Pets.com and thank God that we collected CNI.org in 1997. Tada! And you
can actually click through and see what it is that was going to be on the agenda then.
And we’ve gotten good things that came of all of this. There’s even the White House
would have press releases that would say that President Bush, by standing on an aircraft
carrier, that Combat Operations in Iraq Have Ended. And then a couple of days later, change
the press release to say Major Combat Operations Have… Was
there any note saying that we changed the press release? Nope. This was found by users of the
Internet Archive. And sort of surfaced as, hey isn’t this great? But at this point
there’s still only one. Then we basically built a tool to work with
others. And we built a Web Archiving Tool that people could download and use our tricks.
And a lot of people have done that, but it’s kind of a pain in the neck. So a lot of people
subscribed to the Internet Archive to go and build collections on the Internet Archive
that they could then go and take away with them. This was at a time when people were…most
people in like the National Libraries were still not doing any public access to their
web collections. They may have started collecting, but they didn’t provide any public access.
But this is State Libraries and State Archives and Universities have come together. There
are now 350 partners that are going and doing Web Archiving collectively. And I think it’s
a little bit like the sort of picture in Cairo? You know, when everybody was in the Arab Spring
standing together arm in arm and taking a step forward. And it’s that kind of a moment
that we can collectively go and build services that aren’t necessarily centralized. You
say, well, this is a centralized service, but it does allow these materials to be back
downloaded and you can run your own tools to be able to go and use these for your own research
uses. So the idea of collective tools for distributed collections was really what
we wanted to do. And by operating together, ’cause we’ve got a very peer oriented
group, that our field is peer oriented. We look side to side as to what is okay. Often
we sometimes check with our bosses, but maybe not as often as we should. We look to what
else are people doing. And the idea of doing this worked quite well. So one joint collection
is the Japan Disasters of 2011 and people came together very quickly as the tsunami
happened to go and collect these things together. And so there’s all these wonderful materials.
Archived web pages no longer available. The Collection To Date is now quite large,
by a sort of subject focus. And the National Diet Library in Japan didn’t feel that it
could go and do crawling without asking permission, so they were thrilled that we were doing it
and they made this big ribbon cutting ceremony for us to come back and present into their
collection this document of what happened in their country because they didn’t feel
that they could do it themselves. So I’d say this is an example of our working together
and making things go right. Lending Television…I’m going to make a
few more examples of this. Television. In 1976 the American Television and Radio Archives Act said that the Library of
Congress was supposed to go and archive television. As best I can tell, in 1996 the major outcome
of this was a report saying, Oops, we didn’t do it. But it’s really important and we
should do it. So we hit the record button in the year 2000. Russian, Chinese, Japanese,
Iraqi, Al-Jazeera, BBC, CNN, ABC, FOX, 24 hours a day. DVD quality. We didn’t ask permission.
And bad things didn’t happen. But we just held on to it. And the question is, what could
we do with it? And one of the hero librarian stories, I think, is the Vanderbilt Television
News Archive that started archiving television just before the Democratic National Convention
in 1968 and was lending out television to researchers. And they were sued in the early
70s and I paraphrase…your honor, New York has to shudder because capitalism is over
because a librarian made a copy. And the judge did not concur. And what came out of it was
an exemption in the 1976 Copyright Act. But Vanderbilt kept on it and they basically got
this little exemption to allow lending of television news. So the Internet Archive,
building on the shoulders of Vanderbilt, built a service a couple of years ago to go and
lend television news so that you can go and search and try to find materials. You can
search and find clips to embed in your blogs or if you wanted the whole program you could
borrow it on DVD. We stamp the DVD and send it to you. And the question is sort of, “isn’t
that clunky when you could just do a download?” It’s like, yeah, we want to do a download,
but we’re trying to be respectful. We’re trying to basically find a way that doesn’t
interfere with the business models of the broadcasters. So not only has preservation
happened, but access has happened. And it turns out that this has worked actually
very, very well. We haven’t had anybody complain. In fact,
the news organizations are really thrilled because now they have access to their old
collections and they’ve been asking for data dumps of the old closed captions so they
can do data mining over their own news programs. And so it’s actually turned out to be fine…that
it has worked. How do we do this? We gathered a bunch of scholars before we did this.
We held a conference that was supported by the Knight Foundation. The Knight Foundation
was very nervous about this whole program. But we got this together, got all these researchers
to say it’s absolutely critical. We need to know all of this. That was helpful to sort
of set a context. We came out with it and the New York Times thought it was a terrific
thing and others seemed to think so too. And then the Knight Foundation came back and funded
a further program going forward and now we’re having a bunch of different support that’s
been going on to try to help understand the influence of money in Political Ads in American
Elections. And I’ve got a real downer quotation from
this study that was just finished of watching all of the programs in the Philadelphia region.
We recorded all of them. Not just news, but also all of the entertainment programs, and
found all of the political ads. And then counted them up. And the University of Delaware
went and also found all of the political news on all of the news programs in the Philadelphia
area. So, political ads versus television news in terms of the number of minutes.
For every one minute of television news about politics we had 45 minutes of political
ads. It’s just devastating what’s going on. And it’s the first time this type of
thing is known. What I find interesting about this is that it’s not just retrospective
looking, it’s trying to make our libraries useful in the current issues of the day. So
it’s not like, gosh, glad you did this, somebody in the future is going to be really…you
know…blah, blah, blah, blah. No, right now we need this for our public discussions. So
we are now working with libraries all over to go and use these materials in new and different
ways. And that’s the sort of collective action I would suggest that is really important.
It’s also starting to become affordable, which is, you kind of think of television
that’s so darned gigantic. How could it be affordable? We have about 3 million hours,
I think it’s around 9 petabytes of data, but if you take one channel a year, it’s
only 10 Terabytes. That’s a little bit bigger than the current hard drives. The current
hard drive you can buy right now is 8 Terabytes. It’s kind of amazing that almost is a channel-year.
So in a channel year you can do this too. And your computer scientist guys are going
to love it and your digital humanities guys are going to do it and we can do it in a distributed
way. Or we can work together and share the results. So this is another example of, I
think, kind of going forward. We can go and have our libraries become not just subscribers
to other people’s services, but to actually have services ourselves and to go and do our
own collecting and doing it. The next step that we went into was Music.
And the whole question of what do you do about music?
This was back in sort of about 2000, 2001, 2002, 2003. There were just lawsuits all over
the place. This was sort of the time of Napster and so, of course, we asked our lawyer friends
and they said, “Oh, what’s going to happen to you is going to be bad.” So, we tried
to find some way around this by finding people that actually wanted things to happen. And
we went with, there’s a tape trading community. Actually there was an intern that was working
at the Internet Archive. He came forward and said that, “You know, Brewster, the Grateful
Dead started this tradition of tape trading and oh yeah, I had my cassettes, right?, back
in the day.” He said, “It’s still going on. It’s going on… on the internet.” I
said, “Really?” And he said, “Yeah, there’s lots of bands that copied it.”
And so he said, “You know, Brewster, you keep talking about going and being storage
for cultural materials, why don’t we talk to them?” I said, “Great, why don’t
you write them a note?” These are the tape traders that are online.
And so he wrote a note. He said, “We’d like to offer unlimited storage, unlimited
bandwidth, forever for free.” (Laughter) “What do you think?” And they wrote back
and said, “We don’t believe you! It’s too big. But if you could do it, it would
be our dream.” So that’s a good step when somebody says, it would be our dream. So we
said, “Try us.” And so we thought about it a little bit from the band’s perspective
and just going and posting these up on the internet is different from tape trading. Tape
trading used to be kind of a pain in the neck. It was kind of as bad as going to your town
library. You had to… you know… it was a pain. And that meant that it didn’t happen
quite as much. So we asked some level of permission from the bands. We didn’t go and ask our
lawyers what was the proper form for them to sign, so we just asked
anybody associated with the tape friendly bands to just send us an email saying, “Yeah,
it’s okay.” It can be the drummer, the web master, it could be anybody. But then
we had somebody in the community say it’s okay. And we posted their email response and
it worked. So, we got two, three, four bands a day saying yes, and fans uploaded about
40 or 50 concerts a day. Now, I wouldn’t have thought starting the Internet Archive
that music concerts would have been a thing that we would have done. It was just responding
to a need…that there were cultural materials out there and it started to work. We now have
130,000 concerts from 6,000 bands and we have everything the Grateful Dead has ever done.
So, it’s working. So it’s a system that worked.
We went on to do other internet oriented distribution music. Before MP3, the format, was standardized,
there were other formats and one was distributed by the Internet Underground Music Archive.
And we’d archived some of their web pages, but not all of them and it turned out that
one of the founders of the Electronic Frontier Foundation, John Gilmore, had recorded all
of it on a hard drive. And so he came up to us a couple of years ago and said, “I have
all of IUMA.” We said, “Great!” And so we thought, well, can we find everybody.
Should we ask permission? And we said, “No, let’s just post it and see what happens,
and if there is anybody who’s unhappy we’ll take it back down again.” So we took these
albums from the early internet distribution, posted them, and people were thrilled. The
company had long been munched in acquisitions and the participants, the people that had
posted music, were thrilled to see their music back. Often they didn’t even have it
themselves. So here is another case of working with a community to find, sort of, is there
some level of okay and then move forward. Also in Music we started working with NetLabels,
which are internet era labels, and we have lots of them and by providing free hosting
that’s been really worthwhile. We’re now starting to work with CDs, LPs,
and 78s. And here is where it’s a little more problematic for us and we’re trying
to figure out what to do. We’re starting to spread our wings by working
with a few labels, like I just put up, but also with these archives, like the Archive
of Contemporary Music in New York. They’ve got 2 to 3 million audio recordings. So that
is just the mother lode. Can we go and try to get those CDs and LPs and put them up?
And we’re starting to get better at the digitization process. We asked our lawyers
what would happen and they said bad things would happen. And it turns out that the Preservation
Function hasn’t elicited that type of response. We’re still trying to figure it out and
get it going and trying to get it to be a more distributed project so that we can get
lots of libraries taking their CDs and LPs, digitizing them, making them available
to themselves and in central repositories like ours. Our current idea is to go and have
libraries go digital so people can take their existing collections, put the CDs or LPs in
some sort of machine, prove that they have it, if it’s already been digitized, blink,
they can have access to it as if they had digitized it. If it hasn’t been digitized,
then digitize it and add it into the pool. Kind of like we built OCLC back in the day.
Can we go and build our music libraries together, but then allow people to have full download
access to their whole collections in digital form. What you can do with it beyond that
we’re not exactly sure and it will all evolve. But at least on campus access seems to be
happening a lot. I get a lot of librarians saying, “Yeah, we let everybody on campus
have it, but don’t tell a lot of people.” And if I gathered all of those people up,
I bet it would be a third of the organizations that are represented in this room. So it’s
starting to happen. We’ve got a model for at least an on campus access. And the Internet
Archive is doing that same kind of thing. Can we go and as a collective group of us
bring our libraries digital? We brought our catalogs digital, can we bring our libraries
digital? I think that is a great opportunity. So we don’t just go and have the internet
or, oh, it’s all happening, ’cause they’ll do it. No, we should do it and then we could
have access and it’s small. The amount of storage that it takes, even at high resolution
by computer standards, is quite doable. We got 40,000, a donation of 40,000 78 RPM records,
which we want to thank the Bavaria Public Library for giving to us, but really what I’d like
to argue is, please don’t get rid of your collections. Store them well and store them
maybe off site. Store them maybe in shipping containers. It’s cheap enough to do, as
long as you don’t have to have rapid access to it. We’ve figured out how to do inexpensive
offsite storage so you can hold on to your own collections. If you really don’t want
to hold on to your collections, then please donate them to the Internet Archive and we’ll
hold onto them. So we just got 50,000 LPs and we’re getting better at doing mass digitization
of CDs and LPs. A user of this collection already, Daniel
Ellis from Columbia University, had this to say. Basically they need access to comprehensive
collections to do their new research. The type of research that Aaron Swartz does,
going and downloading a lot of materials and going and making, doing computer analysis,
is the norm. It’s what we should be actively supporting and we are now supporting Daniel
Ellis and a number of other researchers in the music world because we’ve got these
collections put together. And we’ve gone and made these available
in-house in our cool little reading and listening nook. And nobody’s had problems with it.
So we’re not just taking the CDs and putting them out on the net, but we’re having them
accessible on campus and looking for others to play with. I think this idea of standing
together, taking a step forward and doing it in a distributed way is a more robust,
more resilient mechanism of building these materials up.
So, Audio Collections is now starting to grow and the Internet Archive is smaller,
by budget or by staff, than most of your universities or organizations, so
you can do these sorts of things as well. Moving Images. Most people think of Hollywood
films. Most of this stuff is all tied up in enough rights and it’s actually fairly available
anyway, so we’ve mostly stayed away from it. What we’ve gotten a lot better with
is old materials, 16 mm, 8 mm, home movies, those sorts of things. People love them. It
was a real surprise to me that actually people use this as stills from the “are you ready
for marriage?” You know, those social behavior films from when we were in Jr. High School
when there was a substitute teacher and they’d reel in the 16 mm projector. Those, we have
those. (laughter) And they’ve been downloaded often hundreds of thousands of times.
Because I think it’s how a visual generation is trying to understand the 20th Century.
Not just from a Hollywood perspective, but from home movies and these sort of ephemeral
films. We’re just now digitizing 7,000 films that were donated by a major research university
with the help of Mellon and CLARE Foundations, a fellow to sort of watch over and get these
things available. We learned not to ask lawyers to try to figure out what would happen
if we put these things up on offer, because if we just sort of channel them, they’d
just say that bad things would happen. But we just haven’t been finding this to be
the case, that we basically deal back and forth with the organizations if things come
up and we just take them down and it seems to be working out pretty well.
We also offer unlimited storage for people to upload things.
And we’ve just started to get inexpensive equipment.
And there is…we’re starting to do VHS tapes. So there was a commissioned report
on what to do with VHS tapes and are they, you know, can we use section 108a something
or other, and I think the report basically said, NO. So we just started to anyway.
And so what we did is we just took the VHS tapes that were from the San Francisco Friends
of Public Library Book Sale, the remnants, and we had volunteers look to see if it was
available on DVD for sale new. In other words, is it currently being flogged? Not is it available
on eBay. Is it available new. And if it isn’t, then we digitize it and put it up. And it’s
worked fine. So we’re trying to stay away from commerce, right, trying to stay away
from people’s… valid business practices. But we’re trying
to make the older stuff available and everybody is thrilled.
It turns out to be inexpensive to digitize these materials.
And we now have an awful lot of them up. Texts. So as I said, …
We started the Open Content Alliance, but mostly people were going for the out of copyright.
But we wanted to do all of it, so the Library of Congress maybe 28 million books, a book
is about a megabyte, that’s 28 Terabytes. That’s 4 current hard drives. You can spend
less than a month’s rent and have the storage capacity for all of the words in the Library
of Congress. Something new has happened. You guys can go and have these collections within
your collections, even if it’s just for certain types of uses, and use other subscription
services for certain access things. But the public domain should be more publicly available.
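[Editor's sketch] The back-of-the-envelope storage estimate above can be checked with a few lines of Python. This is only an illustration using the rough figures given in the talk (about 28 million books, about one megabyte of text per book, 8-terabyte drives); the function names are made up for this example and decimal units are assumed.

```python
import math

# Rough figures quoted in the talk; all are approximations, not measurements.
BOOKS = 28_000_000   # "maybe 28 million books"
MB_PER_BOOK = 1      # "a book is about a megabyte" (text only)
DRIVE_TB = 8         # size of a current hard drive, per the talk


def text_storage_tb(books, mb_per_book=MB_PER_BOOK):
    """Total text size in terabytes, assuming 1 TB = 1,000,000 MB."""
    return books * mb_per_book / 1_000_000


def drives_needed(total_tb, drive_tb=DRIVE_TB):
    """Whole drives required to hold total_tb (rounded up)."""
    return math.ceil(total_tb / drive_tb)


total_tb = text_storage_tb(BOOKS)
print(f"{total_tb:.0f} TB of text on {drives_needed(total_tb)} drives")
```

Under these assumptions the text comes to 28 terabytes, which needs four 8-terabyte drives, matching the numbers given in the talk. Page images would take far more space than the words alone.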
But we didn’t stop with just the public domain, like these wonderful books of Euclid.
And putting them on digital tablets and the like.
What we did is start to just digitize, well, everything. Anything that we could afford
to digitize and then try to make as much access as we could.
So we got good at digitizing inexpensively. Got it down to around 10 cents a page, so
$30.00 for a 300 page book. And thanks to the library community, working together we’ve
worked with about 500 libraries now to build sort of an open version of the Google Books
Project. And as I said before we’ve got about 2.5 million books done through this
sort of system. And there are just some fabulous works out
of rare book collections. Working in China and something that I’m
really happy about, ‘cause I’ve been looking for a community…
A nation or a language group that would allow us to digitize everything in their language.
(laughs) It just, can we just, would somebody go open. And the Balinese said yes. So we
basically opened, we digitized all of the published works in Bali, which is kind of
neat. And what they do is they publish on palm leaves. And so it’s palm leaf books
and we basically digitized by photographing with them all of their books. And we said,
okay, well, how do you read them? And they said, “Read them? Well, the priests read
them, but sometimes they are read as a shadow puppet play.”
So this is actually a reading of one of their books, or their performances.
But I thought it was completely great that
the first group of people to say, “I want to go online, ‘cause it’s going to be
beneficial to our language and our culture” were the Balinese.
So we now have Scanning Centers all over the world. They are close to people. It’s not
that expensive to do. We’re coming out with a smaller scale portable scanner, so it’s
easier to do. And we’re starting to work through the rights issues.
We have maybe 3 million free books, but we have a million books that are available to
the blind and dyslexic, because we can and 300,000 that we are lending.
So what does lending mean? So this is in copyright, non-rights cleared books, that have been given
to us by about 500 different libraries with the express purpose of digitizing and lending
in copyright, non-rights cleared books. This has been going on now for 4 years and there
have been basically no problems. What it means to lend is that we try to buy
books from publishers so that we can lend them, but they’re not that psyched about
doing that in general, so mostly we’ve been digitizing and lending. This is a book that
we bought, but it’s been checked out, so you can add it to your list.
Here is a more obscure book from the Boston Public Library, but it’s from 1990 something
or other, and it is available to be checked out.
You have a few different formats. And then you can borrow it and you’re the
only reader of this for two weeks. So for two weeks you’re the only, everybody else
is locked out of this book. And it’s been going on just fine. So this
is an approach towards working together to lend books and we’re lending to people all
over the world. And what we’d really like to do is take
your collections and bring them digital and then you could lend things inside your own
organizations, say. Wouldn’t that be neat? So if this hard drive, this 8 Terabyte hard
drive, can store 8 million books, if you can keep a card catalog going that keeps track
of your lent out books, then you can operate the technology to lend out digital books and
make sure that it’s only used within whatever constraints possible. It would make us feel
much more... safe, if you will, by working together and having lots of work together. Other libraries
are using our platform, but it’s still kind of just us. But is there a way of spreading
it so that there are lots of libraries that have their collections digital and you can
go and make it available just to your computer scientists and maybe you use the local consortia
to go and put together, like in California, Califa is basically starting to operate shared
lending facilities. But it means that you’re not beholden to somebody else. And I think
that’s going to be absolutely critical to have a robust library system going forward.
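The lending rule described above, one reader per book for two weeks with everybody else locked out, is essentially a card catalog with due dates. A minimal sketch in Python, assuming one digital copy per title; the class and method names here are hypothetical and are not the Internet Archive's actual lending software:

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Optional

LOAN_PERIOD = timedelta(days=14)  # "you're the only reader of this for two weeks"


@dataclass
class Loan:
    patron: str
    due: date


class LendingCatalog:
    """Hypothetical card catalog: at most one active loan per digitized book."""

    def __init__(self) -> None:
        self._loans: Dict[str, Loan] = {}

    def checkout(self, book_id: str, patron: str, today: date) -> Optional[date]:
        """Lend the book if nobody holds it; return the due date, or None if locked out."""
        current = self._loans.get(book_id)
        if current is not None and today < current.due:
            return None  # somebody else is still the only reader
        due = today + LOAN_PERIOD
        self._loans[book_id] = Loan(patron, due)
        return due

    def give_back(self, book_id: str) -> None:
        """Return a book early so the next reader can borrow it."""
        self._loans.pop(book_id, None)


catalog = LendingCatalog()
start = date(2015, 4, 13)
print(catalog.checkout("book-1", "alice", start))                      # due date for alice
print(catalog.checkout("book-1", "bob", start + timedelta(days=1)))    # None while on loan
```

The only state is who holds each book and until when, which is the sense in which keeping a card catalog of lent-out books is enough to operate this kind of lending.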
Software. So in the software area, we’ve been starting to do large scale collections.
We believe we have 500,000 titles, but we don’t have all of the cataloging done. We
have about 90,000 titles of PC-era software on our discs, working with different
communities doing the lion’s share of the work. But we’re not doing the work, they
are, and we’re working with them in interesting and new ways.
And now, based on some work that we did, we began with lots of volunteer communities and
we got the first level of emulation to work so that you can go to a web browser, go and
see a piece of software, say in my case it was VisiCalc, ‘cause I’d never seen VisiCalc
run. And you can go to a web browser, click Run, and what it does is really surreal. It
reads its virtual floppy that has the software on it for VisiCalc and it’s running. It’s
completely surreal to me. But it’s possible now to run emulators in your browser, and
it allows us to, in some sense, lend software, so you’re not really getting it, you’re
getting to use it and it’s not been a problem. We’ve been putting things up at a phenomenal
clip and we’ve been contacted by a lot of the producers that are still, if they’re
still flogging the materials they want it taken down, and what do we do? We just take
it down and it works. So we’ve gone and put up tens of thousands of pieces of software
now up and running and it’s used by a generation that are just absolutely thrilled.
New up for us: Personal Digital Archives. We’re starting to not just go and do that
digitization materials or the hard drive era collections of things, which is where my family
keeps its photos. I’d say a lot of younger families are keeping their photos on Flickr
and all of these other not terribly stable environments. And so we, as libraries, really
need to come together to start to figure out the tools so we can go and build the archives,
say the professors or kids or institutions, it requires being forthright and aggressive
about doing our jobs. And I’d say I haven’t had people come to us and say, “No, we don’t
want libraries anymore.” And I haven’t had people come to us and say, “You’re
not a library, you’re really a blank.” Or say, “No, you’re really a library,
we want libraries, let’s find ways for us to all move forward together.”
So, in conclusion, Universal Access to All Knowledge, say it’s in our grasp and it
could be one of the great works of humankind. I think it could be remembered like the Library
of Alexandria or the Gutenberg Printing Press, as one of the terrific things that happened.
But it’s going to involve all of us. And it really isn’t something of just going,
“Oh, Google’s got that covered.” Or, “It’s HathiTrust, no problem, I’ll just
subscribe to it.” Not good enough. It’s really going to take us moving forward and
in fact, some of the projects, like the Google Library Project, got stopped as a monopoly.
That was the reason why they stopped it in the courts. It’s not the library system
we as a society want. It’s having one organization, the Book Rights Registry, controlling the
distribution of out of print materials. That doesn’t make sense. It’s our turn. And
fortunately the technology has become inexpensive enough to do things within our organizations,
or to do in clumps of them, or working together in different ways. And I guess, lastly, carved
above the door of the public library in Pittsburgh, the Carnegie Library, his legacy
is Free To The People. Thank you very much. (Applause)
I hope that was provocative enough to get some questions going. I think we’ve got
maybe ten minutes or so, so please….what should we be doing now? Yes, David.
I’m David Rosenthal. So what let you deal with books and software was in effect streaming.
And it seems to me that this is something which could be more generally deployed to
deal with the copyright issues. People have much less of a problem if you can’t actually
get a copy and squirrel it away, you just get the experience.
Brewster: I think streaming is a good intermediate step that has worked in a number of circumstances.
So, the Grateful Dead, we had all of their materials up and the band sort of got nervous.
And they said, well, you should take it down. I said, we’ll take it down, but... (laughs) ...there
are going to be some really unhappy Deadheads out there. And so we took it down and there
was this big kerfuffle that played out in the press and the like. And what ended up
happening is the audience recordings they allowed to be downloaded. The soundboards
that were never, those are direct patches of their sound, that was never really part
of the tape trading deal and those are available streaming. And that seems to work quite well.
Even the book viewing is streaming and it does sort of play an intermediate role. It’s
not that great, though, for research. So streaming isn’t going to make your computer scientists
happy because they are going to have to scrape it and that’s a pain in the neck. So I think
having copies yourself of the materials so that you can go and make bulk access to ones
that you feel comfortable allowing bulk access to is tremendous. And going and depending
on another organization, whether it’s the Internet Archive to build all those research
services, which we’re in the process of, and so is HathiTrust, but isn’t that
what our universities should be doing, is building some of those services and making
some of these different interplays available. So I’d say bulk access is important, but
under more controlled environments for the modern materials that probably have rights
issues, like television. I’d say that’s exactly what we are doing with television.
David Rosenthal: I’d say the other issue with bulk access is that bulk access these
days is getting awfully big and so moving the…moving the analysis to the data, rather
than moving the data to the analysis, may end up being the only way you can cope with it
simply from the bandwidth and storage capacity issues.
Brewster: Yes, but let’s not go and lock that into policy. I just don’t want to have
it regulated that there’s only going to be one or two copies of these materials. We
want to make it so that the big boys could take copies away, and it’s somewhat getting
easier. Moving Terabytes around is easy. Petabytes, yeah, it’s still clunky.
David Rosenthal: Yes, and I noticed that you only have a couple of copies of everything
that you have, right?
Brewster: Yes, we have two copies, then we
have a partial copy in Amsterdam and a partial copy in Alexandria, Egypt.
David Rosenthal: Yeah, that’s a problem that needs to get fixed.
Brewster: Yes, I think there is somebody that said lots of copies keep stuff safe. I, I,
I, (laughter) I think we should agree with them. And we need help.
Lady: I have kind of a related question to that. You mentioned a couple of examples where
what people really want to do now is mine the large collections. But the truth is most
researchers can’t download the quantities of data and audio and content that you’re
collecting, so are you contemplating anything like the HathiTrust Research Center, some kind
of modality where people could actually do their research in your collections, ‘cause
I see that as the low hanging fruit to get people excited about the stuff you’re collecting.
Brewster: Yes, we’ve been doing different approaches toward trying to get researcher
access. And what I’ve found, so far, is that big data usually means lots of data points,
not lots of Terabytes. That people still, even if you have access to it, they don’t
know what to do with it. What I think we need is a good middle layer of open source software
that makes it so that we can have digital humanities people not have to have a programmer
glued to their side to get some of their research done. And we’re exploring and trying to
figure out building an institute with fellows to go and build that middleware. Because we’ve
sometimes gone and just opened it up, but we took one crawl of the World Wide Web, it
was about 80 Terabytes and we said, come and get it. But tell us, you know, we’ll give
anybody a download key, you can have access to it, and I don’t know, 25 did, got an
access key, and we never heard from them again. I don’t know if they did anything at all.
‘Cause it’s kind of clunky and hard. So I’d say we’re still in early stage in
this. But let’s do it together. Let’s have some fun with it. So, let’s go with
some of these different collections. Let’s move them around. Let’s build some open
source software so that it’s not just the land of the esoteric coders to be able to
do like the studies like we were doing on the TV news by doing all of those programs,
finding all of the ads. That’s actually a pretty cool Master’s level thesis. Let’s
get there. We have the data. Let’s see what we can do with it. And we can do some of it
on our servers, but some of it on yours. How do we make it all go is, I think, the opportunity
that we should be pursuing now.
Lady: Hear, hear. Thank you.
Elizabeth Jones, University of Washington. So one of your refrains in this talk, which
I thought was interesting was, lawyers said bad things would happen, no bad things happened.
But a lot of the things you talk about, a lot of people have gotten into trouble for
doing sort of similar things. And so, I’m wondering if you could speak to why the Internet
Archive hasn’t gotten into trouble for doing these things, as opposed to those other folks
who have?
Brewster: Yeah, Paul Courant from Michigan,
said, “Why haven’t you gotten sued?” (Laughter) And I…maybe we’re just lucky.
I really…I can’t tell you. Though we’re very concerned about every aspect of what
we do and we try to be respectful and we try to be open about what we’re doing. We make
no money out of that…out of this, so that kind of helps. What I’ve learned by going
and collecting everybody else’s stuff over the last 20 years and then making it publicly
available again, is people don’t want to feel like they’re being taken advantage
of. If they feel like they’re being taken advantage of, they’ll throw things at you
and laws are just one of them. They’ll throw threats. They’ll do all sorts of things
to cause you to stop. If they feel like they’re being taken advantage of. So how do you do
that? How do you deal? One is, do a really good job of what you’re doing. Do it beautifully.
Be respectful. I mean, they spent a lot of time making that thing. Do it right. So, and
the software…we’ve gotten a lot of the early software developers of these games who
are thrilled, because it’s actually done quite well. So that’s one. Don’t make
any money. And engage in conversation. Winston Tabb, when he was at the Library of Congress,
when I was being trained on sort of how to do all of this stuff, he said, “just remember,
it’s their stuff, be respectful.” And that’s another characteristic. We have a
bend, not break, policy. So, where we probably could have stood our legal ground and faced
somebody down and said, “Yeah, screw you!” We didn’t. And sometimes it made some people
mad in our environment because we took things down that we maybe didn’t really theoretically
have to but it sort of just, everybody’s just trying to figure out this digital transition.
So we try to talk more to the business people than to the lawyers in these organizations.
And try to find ways to…how can we make their business work better. I think we’ve
put a lot…too much faith in lawyers figuring things out. And it’s just not going to happen.
Laws tend to trail. New laws, when they’re done hypothetically, tend to be very bad,
especially for us. So, I don’t think laws are going to be done before we get there.
We’re going to have to get there, do things, and if there’s enough trouble, then we’ll
have to bring in the legislators to go and figure it out. So I think we have to go and
figure out what a library looks like and I like the old style library where you have
collections. You build collections. You serve your populace really well. You pay publishers.
You make things happen in a distributed way. I think we got brought into this digital collection
idea back when…oh, an anecdote. I visited OCLC way back when I think they were still
called the Ohio College Library Center then, and they were run on Honeywell computers.
And I asked, “Wow, how big is the database?” They said, “Really large.” “How big
is it?” “Really big.” And I said, “in bytes.” And they said, “Well, it’s in
mini-MARC records, it’s 17 Gigabytes.” I said, “Wow, it doesn’t sound that big.”
And they said, “Well, you know.” But it’s a lot of maintenance to go and get that database
really done well, right? I’m not saying that it’s cheap to be OCLC. It never has
been, never will be. But it’s that the…you don’t need an acre of mainframes to run
these data collections anymore. You can at least have copies much less expensively now.
Going and maintaining it and keeping it up is as expensive as it always has been because
it’s people. But the technology has made it so that there are certain things that we
need to question our old assumptions about, and that we have opportunities beyond what
it is we are currently doing. And I might say we are arguing ourselves into not very
good positions because of our old habits in terms of understanding how to get things done.
Let’s build a distributed library system.
Steven Davis, Columbia University. Clearly
you’ve built one of the greatest knowledge resources in history. I don’t think anyone
can argue with that. And uh, it’s just amazing. (Applause) Over the last ten or so years many
of us have been working in the areas of digital preservation, long term digital preservation.
A number of models and approaches and theories have been offered up, including the TRAC certification
and ISO follow-on, and then now DPN is part of the landscape, or we hope it is. How
does an Internet Archive fit into that new framework for long term digital preservation?
Brewster: Um, don’t really know. The Internet Archive has been I’d say underfunded. I’d
say all of us say we’re underfunded. But I think we’re like severely underfunded
for what it is we’re trying to do. And so we’ve kind of just bludgeoned forward to
try to figure out how inexpensively we could do things. And we try to do a good job and
be transparent about it. And we’re trying to come around a corner now. That third
phase of trying to build libraries together is really…is driving our organization to
be more engaged with user communities and institutional communities. So with DPN,
for instance, you know, I know how to spell it, but that’s probably about…that’s
about as far…and it’s not because it’s not a good thing to do, but it’s just been….we
haven’t really gotten there yet. So, please don’t give up on us. We’re better staffed
now. Wendy Hanamura on the front row is actually….we now have a director of partnerships that can
actually answer emails. Because if you write an email to me it’s iffy…um…um…so
I think there are roles for these different organizations; every country should at least
have these. I know that things are going on with our northern neighbor in terms of going
and having collections there too. I highly encourage this. So, encouraging, but we’re
kind of lame at the moment.
Steven Davis: So, some DPN people will
be getting in touch with you very soon.
Brewster: Thank you.
Sue Brewster: I couldn’t let that last comment go. The comment of don’t give up on us,
meaning you. In my, so, having been a librarian for a very long time, my definition of a research
library is one which continues to build collections and over the last two decades many of the
libraries that used to be research libraries really have fallen down, according to that
metric. More and more libraries lease access to content. More and more libraries, librarians
care about the material that they pay large amounts of money for, which is not at risk
at all and not those things that are freely available. And I would turn…I just want
to say what’s amazing about your presentation right now is that you frankly haven’t given
up on this community and I hope they don’t let you down.
Brewster: Thank you. (Applause) I don’t know how it’s going to go. Some people say
that the horse is out of the barn, that we’re going to end up with YouTube, Elsevier, JSTOR,
HathiTrust. The end. That being a university librarian is going to be contract negotiations
and personnel issues. We’re going to become customer service departments. And I don’t
think that’s the right way to go. And I’m not exactly sure what the shape is going to
be. I don’t….a lot of our business models are wrong. If you’re running a library,
you’re only really there to serve a local community. And how do we then go and change
that so that I’m going to do the best ornithology server in the world? And I just don’t need
to just serve Cornell with it, I can serve everybody. But how do we then go and pay for
that. How do we get our business models adapting to distributed service provisioning that isn’t
really locally oriented? And I don’t know the answer to this. Don Waters, whom I hold
in high esteem, I see you in the audience. He has won this darned argument for the last
20 years. I’ve been trying to make it so that we can provide universal access to all
knowledge and be supported at it. And he’s pointed out that it’s very difficult to
get libraries to pay for things that they’ll get for free. Why would they? Yet, often distributed
endlessly, as opposed to subscription services work better in the internet generation. And
I don’t know how to square that circle…that we’ve really got a system that has business
models built around physical collections that are uniquely local. But we have now an opportunity
of making services that every one of them is global, yet we have no mechanism to go
and find a support mechanism for it. So, I think of the father of digital libraries,
Mike Lesk, put it…the thing that he was worried about. He’s worried about two things….he’s
worried about the 20th Century, because he thinks the 19th Century and the 21st Century
are in pretty good shape, but the 20th Century might get forgotten because of the copyright
issues. And the other is institutional responsibilities. What are we supposed to do? When you go back
to your offices next week, the question is…are you going to do something differently because
of this? It’s hard to imagine because your constraints are the same as they were the
week before. The opportunities are different though. And how we respond to those opportunities
is going to define how the library system looks in ten, fifteen years when we’re even
a little bit grayer. Thank you very much. (Applause)