Information Theory: Origins


The Information Age is the name given to the late-20th- and 21st-century period of human history in which our economy shifted from traditional industries to information technology. But the impact of information on human culture goes back thousands of years, to when myths and legends were passed down orally. The truth is that information has always been a cornerstone for the spread of civilisation. Thousands of years, billions of people, creating, sharing, storing and even destroying information, and yet nobody had figured out a way to measure it, a way that makes sense of it and gives us some insight. It may sound unbelievable, it did to me, but the concept of information was not properly formalised until the 1940s. Before that, information had always been a vague term encompassing many different ideas, much like energy or motion once were. Just as Newton came along and formalised motion with his laws, and Einstein precisely equated mass with energy, it was time for information to come into the scientific limelight.
Information is meaningless without the ability to be communicated, without the ability to be translated. As I speak, the words travel through the air into your mind. As I read something, the meaning is communicated from the page into my mind. Every use of information requires communication.

There are two different metaphors for understanding this communication. The first is an objective view: information is like water flowing through a pipe. What you send is what you receive. It conveys the idea that information is a thing in itself. A second, conflicting metaphor of information is that of choice. The sender makes a choice of the words and symbols necessary to convey the information, and the receiver requires the ability to understand those choices. If the receiver does not understand the English language, then no matter what I say in English, that information is meaningless. In this sense, information is always a recreation: at every step, information is recreated in a different medium. This second metaphor is the one we need in order to understand what Shannon’s work means, what a bit means.
1948: “A Mathematical Theory of Communication”, Claude E. Shannon. This is the paper we are going to study over the next few videos.

Let’s play a game of twenty questions with bhaiji. Bhaiji knows over a thousand famous people, and he is thinking of one of them. Champa Rani has to figure out which of those thousand people he is thinking of by asking only twenty yes-or-no questions. What is the worst question to ask? What could Champa Rani ask that would be so useless? “Is it Mahatma Gandhi?” Our friend Daryayi Ghoda is sitting here, watching this show on TV. Mr. Ghoda doesn’t even have to listen to the answer; he would say, “obviously it is No!”. This “No” contains almost no information. The number of possibilities was a thousand, because bhaiji knows a thousand people, and it was reduced to 999. A mere 0.1 percent.
What is the best question to ask? Champa Rani asks, “Is it a male?” Now this question is truly uncertain. Our friend Daryayi Ghoda really leans in to listen to the answer, because he has no idea. If bhaiji now says, “No!”, this answer, this “No”, carries so much more information, because the number of possibilities was reduced from 1000 to 500: a whopping 50 percent reduction in uncertainty. The first “No” and the second “No” have very different information content.

How can we capture this property? We see that the higher the probability of an answer, the less information that answer contains. So information content should be inversely related to the probability of the outcome. Shannon realised that log(1/p) is a good candidate for measuring information, where p is the probability of the outcome. If we use logarithms with base 2, the units of information content are binary digits, or bits.
Let’s apply this to our example and see what happens. The first “No” had a probability of 999 in 1000, which gives an information content of about 0.00144 bits. In the second case, the “No” had a probability of 1 in 2, and its information content comes out to be 1 bit. But wait: if, by a stroke of pure luck, bhaiji had actually chosen Gandhi, the answer “Yes!” would have been the most improbable one, so it should carry the most information. And that is true intuitively: with that “Yes!”, Champa Rani wins the game, so it really does contain all the information required. When we calculate the information content of that “Yes!”, it comes out to about 9.97 bits. So this measurement of information really does make intuitive sense.
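If you want to check these numbers yourself, here is a minimal sketch in Python (my own illustration, not something from the video) that computes log2(1/p) for the three answers above:

    from math import log2

    def info_content(p):
        # Quick sketch: information content, in bits, of an answer with probability p.
        return log2(1 / p)

    print(info_content(999 / 1000))  # "No" to "Is it Gandhi?"   -> ~0.00144 bits
    print(info_content(1 / 2))       # "No" to "Is it a male?"   -> 1.0 bit
    print(info_content(1 / 1000))    # "Yes" to "Is it Gandhi?"  -> ~9.97 bits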
So what is the average information content we can expect from the first question? We can calculate it as a weighted sum of the information content of all possible answers, each weighted by its probability. Shannon gave this measurement the symbol “H” and called it the “Information Entropy”.
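As a rough sketch of that weighted sum (again my own illustration, not notation from the paper), take the question “Is it Mahatma Gandhi?”, whose two answers have probabilities 1/1000 and 999/1000:

    from math import log2

    def entropy(probs):
        # Average information content: each answer's log2(1/p), weighted by p.
        return sum(p * log2(1 / p) for p in probs if p > 0)

    print(entropy([1 / 1000, 999 / 1000]))  # "Is it Gandhi?"  -> ~0.0114 bits
    print(entropy([1 / 2, 1 / 2]))          # "Is it a male?"  -> exactly 1 bit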
Now let’s look at the paper itself. This thing here is a graph, and it tells you a lot about what Shannon’s intuition might have been like. To make sense of the equation behind it, let’s take the example of a bent coin, a coin that turns up heads with probability p. The probability of heads is p, and the probability of tails is then 1 - p. You vary p, calculate H at every point, and plot the graph. When p is 0, there is no chance that heads will turn up; it’s always going to be tails. What is the use of a flip? It’s not going to show us anything, because we already know it’s going to be tails. It gives us no information, no surprise, so H should be 0, and indeed H is 0 over here; the equation says H is going to be 0. The same happens when p is 1, right here, which means it’s always going to turn up heads. And as we plot the rest, we realise the curve has a maximum at 1 bit. When does that happen? It happens when p equals 1 - p, that is, when both are 0.5. A fair coin gives the maximum amount of information per flip. This goes really well with intuition.
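If you want to reproduce that curve, here is a small sketch (my own, assuming matplotlib is installed) that plots the entropy of the bent coin as p goes from 0 to 1:

    from math import log2
    import matplotlib.pyplot as plt

    def coin_entropy(p):
        # Entropy of a coin with heads-probability p, in bits per flip.
        if p in (0.0, 1.0):
            return 0.0  # a certain outcome carries no information
        return p * log2(1 / p) + (1 - p) * log2(1 / (1 - p))

    ps = [i / 1000 for i in range(1001)]
    plt.plot(ps, [coin_entropy(p) for p in ps])
    plt.xlabel("p (probability of heads)")
    plt.ylabel("H (bits per flip)")
    plt.show()  # the curve peaks at 1 bit when p = 0.5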
This is the basis for data compression, the basis for sending information through a noisy channel, and a lot of other interesting things that will come out of it in the next video. And this is not the end: we’ll work through this paper over the next few videos, try to understand as much of it as we can, and see where it takes us.
