web search history

Hello, this is Geoffrey Fox again. And we’re starting a new lesson
on web searches, with a rather historical perspective coming from
these neat slides I found online. And the nicest thing actually
about these slides is these slides here, which are really beautiful
pictures of old libraries, and which put the origin of web search back
in these libraries from the past. The Sumerian archives,
with 25,000 clay tablets, which were mainly records
of commercial transactions. The Great Library of Alexandria, which was meant to hold
all the world’s books. It had 750,000 scrolls
when it was at its best. I think it ran into trouble, but sometimes good things
run into trouble. Heck, usually good
things run into trouble. It’s just the nature of being right,
you always get jumped up and down on. Then
here are some beautiful libraries, which are often associated with
monks, cuz monks didn’t have so much to do. So they did a lot of the copying,
hand copying, before the printing press
was invented in around 1450. Which, of course,
really spread books around and also made libraries
far more important. Cuz they were accessible to
a great many other people. And the Vatican obviously has a very
rich and important library associated with the history
of the Roman Catholic Church. Now we come to modern libraries. These are German slides, so they
mention the German National Library, which had 25 million items. The Library of Congress in
the US has 150 million items. I suspect that if I went and
looked, other countries would be somewhere between
the German National Library and the Library of Congress. According to
the Guinness Book of Records, the Library of Congress is
the largest library there is. And they have a Library of
Congress classification, which is coming back
to these keywords. And in some sense,
these classifications are important, because you do need to organize
the contents of the web. And libraries have been organizing
things for 1,000 years. Actually a little longer if
you go back to Sumerian times. But since the printing press, there’s been 600, well, 550 years. And libraries have just been
diligently working during that time, organizing things. And that organization has principles
that actually help web search. So here we have the key
concept of metadata, which we’ve already discussed
under the semantic web. I pointed out that author and editor and things like that are in
the Dublin Core. Keywords tend to be part
of the core metadata. And they just specify the areas
which are covered by the book. And this was the historical way. These little three-by-five cards,
or whatever they are, were the traditional way that you
captured a book for searching. So now it’s very different. Because you can do full text
searches in a practical fashion, you tend not to use
three-by-five cards. And that’s where we come
to the full text search. The catalogue cards are a proxy or
a representation of the book in a small space, and
they’re what you used to have. [COUGH]
But you had to have a very good
classification scheme. And actually, good classification
schemes were designed. A scheme has to be rich enough to
actually have some content, and simple enough to be implementable. It still has this huge problem,
which is really the fatal problem, which is why I say the semantic
web doesn’t work so well. That was, essentially, associating
metadata with every webpage. That will never work;
there are too many webpages. There were not so many books, and so people did tend to associate
catalogue items with books. A full text search, by contrast, is just
taking every word of the book and making it a keyword. And it doesn’t require
any manual work, cuz that can all be automated
using our diligent computers. Especially when you have 10 million
of them sitting around waiting to work for us. Concordances were the, sort of,
precursor of full text searches. I never used to use concordances,
but they were produced for special books, like the Bible. And I gather this fellow
Hugo took 500 monks and made a concordance of
the Bible in 1250. And here’s an example showing
the important words, the ones beginning with
I in the Bible; it basically maps the words to
where they appear in the Bible. That’s the end of that basic
discussion of web search and its relationship to
previous library work. In the next lesson, we’ll discuss other fundamental
principles behind web search.
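
To make the concordance idea from this lesson a bit more concrete, here is a minimal Python sketch of mapping every word to the places where it appears, which is roughly what a full text index does. The function name `build_concordance`, the passage labels, and the sample sentences are all made up for illustration; they are not from the slides, and a real search engine does considerably more than this.

```python
import re
from collections import defaultdict


def build_concordance(passages):
    """Map each lowercased word to a list of (label, word_position) pairs."""
    index = defaultdict(list)
    for label, text in passages.items():
        # Split the text into words; every word becomes a keyword automatically.
        words = re.findall(r"[a-z']+", text.lower())
        for position, word in enumerate(words):
            index[word].append((label, position))
    return dict(index)


# Hypothetical sample passages, invented purely for this sketch.
passages = {
    "passage-1": "Monks copied books in old libraries",
    "passage-2": "Libraries index books so indexes find words",
}

concordance = build_concordance(passages)

# Where does the word "books" appear?
print(concordance["books"])  # [('passage-1', 2), ('passage-2', 2)]

# Like the slide, list the indexed words that begin with "i".
print(sorted(word for word in concordance if word.startswith("i")))
```

Nothing here needs manual cataloguing: the computer reads every word and records where it occurs, which is the job the monks did by hand.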
