The Information Age is the name given to the period of human history, beginning in the mid-20th century, when our economy shifted from traditional industries to information technology. But the impact of information on human culture dates back thousands of years, to when myths and legends were passed down orally.

The truth is that information has always been
a cornerstone for the spread of civilisation. For thousands of years, billions of people have been creating, sharing, storing, and even destroying information, yet nobody figured out a way to measure it, to make sense of it, to gain some insight from it. It may sound unbelievable, it did to me, but
the concept of information was not properly formalised until the 1940s. Before that, information had always been a vague term encompassing many different ideas, much like energy or motion once were. Just as Newton formalised motion with his laws, and Einstein precisely equated mass with energy, it was time for information
to come into the scientific limelight.

Information is meaningless without the ability to be communicated, to be translated. As I speak, the words travel through the air into your mind. As I read, the meaning is communicated from the page into my mind. All uses of information require communication. There are two different metaphors for understanding
this communication. The first is an objective view of information: it says that information is like water flowing through a pipe. What you send is what you receive. It conveys the idea that information is a thing in itself. A second, conflicting metaphor of information
is that of choice. The sender chooses the words and symbols necessary to convey the information, and the receiver needs the ability to understand those choices. If the receiver does not understand the English language, then no matter what I say in English, that information is meaningless. In this sense, information is always a recreation: at every step, it is recreated in a different medium. This second metaphor is what we need in order to understand what Shannon’s work means, what a bit means.

1948: “A Mathematical Theory of Communication”,
Claude E. Shannon. This is the paper we are going to study over the next few videos.

Let’s play a game of twenty questions with Bhaiji. Bhaiji knows over a thousand famous people, and he is thinking of one of them. Champa Rani has to figure out which of those thousand people he is thinking of by asking only twenty yes-or-no questions, right? What is the worst question to ask? What could Champa Rani ask that would be utterly useless? “Is it Mahatma Gandhi?” Our friend Daryayi Ghoda is sitting here. Mr. Ghoda is watching this show on TV,
and he doesn’t even have to listen to the answer; he would say, “Obviously it is no!” This “no” contains almost no information. The number of possibilities was a thousand, since Bhaiji knows a thousand people, and it shrank only to 999: a mere 0.1 percent.

What is the best question to ask? Champa Rani asks, “Is it a male?” Now this question is truly uncertain. Our friend Daryayi Ghoda really leans in to listen to the answer, because he has no idea what it will be. If Bhaiji now says “No!”, this answer carries so much more information, because the number of possibilities dropped from 1000 to 500: a whopping 50 percent reduction in uncertainty. The
first “no” and the second “no” carry very different amounts of information. How can we capture this property? We see that the higher the probability of an answer, the less information that answer contains: information content goes up as the probability of its outcome goes down. Shannon realised that log(1/p) is a good candidate for measuring it. If we use logarithms of base 2, the unit of information content is the binary digit, or bit. Let’s apply this to our example and see what
happens. The first “no” had a probability of 999/1000, which gives an information content of log2(1000/999) ≈ 0.00144 bits. In the second case, the “no” had a probability of 1/2, and its information content comes out to be exactly 1 bit. But wait: if, by a stroke of pure luck, Bhaiji had chosen Gandhi, his answer “Yes!” would have been the most improbable one, so it should have the most information content. And that is true intuitively: with that “Yes!”, Champa Rani wins the game, so it really does contain all the information required. When we calculate the information content for that “Yes!”, with probability 1/1000, it comes out to be log2(1000) ≈ 9.97 bits. So this measurement of information really
does make intuitive sense.

So what is the average information content that can be expected from the first question? We can calculate it as a weighted sum of the information content of every possible answer, each weighted by its probability. Shannon gave this measurement the symbol H and called it the information entropy.

Let’s look at the paper. This thing, this graph, tells you a lot about what Shannon’s
intuition might have been like. Let’s make sense of this equation by plotting it. Take a bent coin: a coin that turns up heads with probability p, so the probability of tails is then 1 - p, right? You vary p, and you calculate H at every
point, and you plot the graph. When p is 0, there is no chance that heads will turn up; it is always going to be tails. What is the use of a flip? It will not show us anything, because we already know it is going to be tails. It gives us no information, no surprise. H should be 0, and H is 0 over here; the equation says so. The same happens when p is 1, right here, which means that it’s always going to turn
up heads: the same case. Then we plot the points in between, and we realise the curve has a maximum of 1 bit. When does that happen? It happens when p equals 1 - p, that is, when both are 0.5. A fair coin gives the maximum amount of information per flip, which goes really well with intuition. This is the basis for data compression; this
is the basis for sending information through a noisy channel; and many more interesting things will come out of it in the next video. This is not the end: we will work through this paper over the next few videos, try to understand as much as we can, and see where it takes us.
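The numbers from the twenty-questions game and the bent coin can be checked with a short Python sketch (an illustration of Shannon’s formulas, not code from the video; the function names are my own):

```python
import math

def information_content(p: float) -> float:
    """Self-information of an outcome with probability p, in bits: log2(1/p)."""
    return math.log2(1 / p)

def entropy(probs: list[float]) -> float:
    """Shannon entropy H: each outcome's information content, weighted by its probability."""
    return sum(p * information_content(p) for p in probs if p > 0)

# The useless question, "Is it Mahatma Gandhi?" (1 person out of 1000):
print(information_content(999 / 1000))  # the likely "no": ~0.00144 bits
print(information_content(1 / 1000))    # the lucky "Yes!": ~9.97 bits
print(entropy([999 / 1000, 1 / 1000]))  # expected information per answer: ~0.0114 bits

# The good question, "Is it a male?" (a 50/50 split):
print(entropy([0.5, 0.5]))              # 1.0 bit, the maximum for a yes/no answer

# The bent coin: H(p) is 0 at p = 0 and p = 1, and peaks at 1 bit at p = 0.5.
for p in [0.0, 0.25, 0.5, 0.75, 1.0]:
    print(p, entropy([p, 1 - p]))
```

Note how the useless question averages about 0.0114 bits per answer while the 50/50 question averages a full bit: that gap is exactly what the entropy H measures.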
