04 Brewster Kahle: Big Data was the reason the CM was built


Was your interest in Big Data and knowledge – did that arise then as part of your thinking about what the Connection Machine could be good for? Or was that something that…

–That’s what we built the Connection Machine to be! It was good at Big Data. Some people have credited me with coming up with the term Big Data. And if it was me, which I think it could be, it came from the Laurie Anderson title “Big Science, Hallelujah.” And it’s–
–[singing] Big Science! Hallelujah! So Big Data, always with the sort of little
smirk, right – a Laurie Anderson smirk – of “Big Science, Hallelujah!” “Big Data,
Hallelujah!” and the idea that we could put these together – information together –
to make something smart. It was a very different, radical rethinking. Before the Connection Machine, before Thinking Machines, there were a lot of ideas that you just had to be super smart to be able to understand language. Another approach was: just have a lot of language, and be a little bit smart. So a lot of data, a little smart. And that got us so much further towards the goals of AI. And that, I think, is what we ended up with in the search engines, the web, the Internet: lots of ideas, and then doing little bits and pieces on top of it.

I remember Doug Lenat, who worked at Thinking Machines and built this wonderful system called Cyc to try to map ideas by hand, putting together a computer-readable version of the encyclopedia. Way ahead of his time.
But he said, “Wow, I went and used your system WAIS” – this is before the web – “and it’s amazing! It actually works really well just by having a lot of data and being a little bit smart.” And I think that some of the tension… It’s not right or wrong, but Thinking Machines, because it had the chutzpah to go and build a machine that cost millions of dollars that would be used by a single person – that that was something you could do – you would be able to see the future. And the future turned out to be the future we’re currently living in, in the early 21st century.

–And that’s interesting, because I remember the programming paradigms at the time – the big boys, the Crays, and
remember the programming paradigms at the time –the big boys, the Crays, and
IBMs, and whoever was building supercomputers at that point were
saying, this is ridiculous; you’ve got 1-bit processors; they can’t do anything;
you need intelligent processors. But the problem is when you have more
than one or two of them, then it’s really difficult for them to coordinate with
each other. Therefore parallelism is a concept that’s doomed… There’s also Amdahl’s Law, which – and this prediction proved to be erroneous – was taken to mean that the returns would shrink to nothing as you add more and more processors to the mix. So that really was, I think, the revolutionary aspect of Thinking Machines: saying that we’re going to start out, so to
speak, small, with these 1-bit processors, and each one has its own data.
So a data parallel machine: lots of information, maybe you can only work on
certain types of problems, but those are important problems like modeling weather, or doing computer graphics, or doing the sort of searches that you’re talking
about, and then we’ll see what happens. Now I’ve heard that…
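[Editor’s note, not part of the interview: the Amdahl’s Law argument referred to above fits in a few lines. If a fraction s of the work stays serial, the speedup on N processors is bounded by 1/(s + (1 - s)/N), which flattens out quickly; Gustafson’s later “scaled speedup,” s + (1 - s)·N, is the usual account of why data-parallel workloads whose data grows with the machine escape that ceiling. A minimal, purely illustrative Python sketch:]

    def amdahl_speedup(serial_fraction, n_procs):
        """Best-case speedup when a fixed fraction of the work stays serial."""
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_procs)

    def gustafson_speedup(serial_fraction, n_procs):
        """Scaled speedup when the parallel part of the problem grows with the machine."""
        return serial_fraction + (1.0 - serial_fraction) * n_procs

    for n in (1, 64, 4096, 65536):  # 65,536 = a fully configured CM-2
        print(f"N={n:>6}  Amdahl, 5% serial: {amdahl_speedup(0.05, n):7.1f}x"
              f"   Gustafson, 5% serial: {gustafson_speedup(0.05, n):9.1f}x")

[With a 5% serial fraction, the Amdahl bound never exceeds 20x no matter how many processors you add, while the scaled speedup keeps growing with the machine.]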
–Okay. Just a moment. Okay. You can’t be behind the camera. You’re part of the picture. Come on, sit.
–No no!
–Yeah yeah! I get you. We get to make our own history. Are you going to be in the frame?
–I have no idea! I think so, I think I’m just barely in the frame, but… but whatever. So a couple
things that I’ve heard from people is that, well, those days back in the ’80s,
people were saying it’s impossible to write parallel programs. And then I remember talking to Rolf Fiebrich, who said, well, it turns out that if you bring up undergraduates learning how to program serially, and then try to teach them how to program in parallel, then it’s difficult. But if you teach them parallel programming from the beginning, then they’re like fish in water; they don’t realize that this was supposed to be hard. Is that true? Is parallel programming being taught now? Or is it still mostly serial? I mean, my perception is from, you know, Harvard CS50 Introduction to Computing; they’re teaching like C and things like that. It’s all object-oriented, but it’s all serial.

–Oh yeah, we’re still stuck. We built this machine that… it took really, really interesting and smart people to think through how to do it. Karl Sims did amazing computer graphics. Creon Levitt. There were the guys who did colliding galaxies. Stephen Wolfram always thought this way; he just thought of the world as a pixelated world that evolved, and so it was innately a big cellular automaton to him. But most people think in terms of putting small pieces together, and it’s in some sense the tragedy of Thinking Machines,
that we started out with 64,000 processors and we ended up with these more and more clumped, larger-scale processors. So if you look at how the Internet Archive or Alexa Internet, or Amazon or Google are built, they’re actually quite substantial machines. I mean, you can point to the computers. You know, 1, 2, 3, 4 – you can count them, right? It’s not like there’s millions and millions of these things.
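[Editor’s note, not part of the interview: the “fish in water” contrast above is easy to make concrete. Below, the same computation is written first with the serial habit – visit each element in turn – and then with the data-parallel habit of stating one operation and applying it to every element at once, with NumPy standing in for a machine where each data element has its own processor. A purely illustrative Python sketch:]

    import numpy as np

    temps_c = np.random.rand(65536) * 40.0   # one value per (notional) processor

    # Serial habit: loop over the elements one at a time.
    fahrenheit_serial = []
    for t in temps_c:
        fahrenheit_serial.append(t * 1.8 + 32.0)

    # Data-parallel habit: one operation, applied everywhere at once.
    fahrenheit_parallel = temps_c * 1.8 + 32.0

    assert np.allclose(fahrenheit_serial, fahrenheit_parallel)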
–Is that true also of, for instance, Google’s search farms?
–Yeah! Yeah, you know, you can count them. Usually you design your algorithms by
thinking of a federation of computers, rather than it all being one smooth smart memory. So in some sense, we haven’t succeeded yet at some
of the really brilliant ideas. There are legacy reasons for how we got here, but there’s still much more we can do. We ended up separating out the computer design – the manufacturing of silicon went to Asia, so the design was still going on here, but it was… having that kind of long distance just made it too slow. If we could integrate…

One of the great things at MIT and even Thinking Machines – I was an undergraduate at MIT. I could design chips! What the heck! And basically I designed a chip by writing a computer program that generated an outline of what the artwork – the picture of the chip – should be, and I sent it in email to MOSIS. It was a system where, a couple of months later, back came a chip! I thought this was just the way the world worked. I mean, it’s like hitting print, and you get a chip. It’s like, what a great thing! What an opportunity. It allowed me, as an undergraduate, to write an undergraduate thesis on how to design and implement – and implement! – a micro-microprocessor. A completely different computer architecture can
be built in a few months by an undergraduate. You can’t do that anymore, and it’s not because the technology doesn’t allow it. It’s because we’ve screwed up our corporate structures, and the integration of how government funding
works within our universities. We’ve blown it. So a lot of things got stuck, and the programming one is absolutely one of them.

–So do you think there’s an opportunity, then, to bring that manufacturing capability back – especially as, for instance, things become more robotized, mechanized, AI-driven – so that again you could have undergraduates sitting there and having the chips fabricated…

–Yeah, I think it’s coming up from
below. It’s the maker movement. It’s the “I want it
back, damn it!” Enough of these stupid large-scale multi-billion dollar
corporately-run manufacturing units. Let’s build it from scratch! And so we’re
seeing invention wherever we can give the tools to people. And let’s give people more and more interesting tools. You know, maybe it’ll be DNA sequencing, and then composition, that comes around before the computers do.
