How big can an error be when we estimate something?

Often, when you make an estimate based on many assumptions, people say: "There might be errors in all your assumptions, and the error on the result, being the sum of all these errors, is going to be huge."

In reality, errors compensate each other. You might overestimate one variable, but underestimate the next one. Unless you are biased, the error will grow like a drunken wanderer's walk: slowly.

Say we want to estimate the number N of something: the number of candies eaten by children in the world, or of piano tuners in Chicago, or whatever.

To estimate N, we multiply the estimated values e_i of the factors which contribute to N, whose real (unknown) values are a_i. For the candies, the factors might be the number of people in the world, the fraction of children, the amount of sugar-producing crops, and so on.

N = a_1 \cdot a_2 \cdot ... \approx e_1 \cdot e_2 ...

In the end, we compute our estimate by multiplying all the e_i.

Now, let's say that you are really bad at estimating, and you never get the right value. All e_i are wrong by a factor of 2 –sometimes your estimate e_i is double the actual value a_i, sometimes it is half of it.

Now you do what any good engineer would have done before the advent of the pocket calculator when faced with a multiplication –you sum logarithms:

log(N) = \sum_i log(a_i) \approx \sum_i log(e_i)

(\sum means "sum".) But we said your estimates are

e_i = a_i \cdot random(2, 0.5)

(where random(2, 0.5) picks either 2 or 0.5 at random), so that

log(e_i) = log(a_i) + random(+1, -1)

approximating log(2) as 1 (which is exact if we take logarithms in base 2).

This allows us to separate the errors from the estimates: our estimate of log(N) becomes

\sum_i log(e_i) = \sum_i log(a_i) + \sum_i random(+1, -1) = log(N) + log(\sigma_{final})

where \sigma_{final} is the error factor you will get at the end of the estimate.

The logarithm of the final error, log(\sigma_{final}), actually grows quite slowly –it diffuses like a drunken wanderer who can only walk along a line, taking one step in one direction, then one step in the other, at random. After S steps, about 70% of those drunken wanderers are no more than \sqrt{S} steps away from their starting point.

This means that about 70% of the time the logarithm of your estimation error is no bigger than \sqrt{S}\cdot log(\sigma), where \sigma is the typical error factor on each assumption, which we initially assumed to be 2. Or:

\sigma_{final} = \sigma^{\sqrt{S}}

With S the number of assumptions you made, \sigma the typical error factor on each of them, and \sigma_{final} the final estimation error. With S = 9 assumptions, each wrong by a factor of 2, the estimate is typically off by a factor of 2^3 = 8, not 2^9 = 512.
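A quick numerical check of this claim (a sketch of mine, not part of the original argument): simulate many estimates, each built from S factors that are individually wrong by a factor of 2, and count how often the final error stays within \sigma^{\sqrt{S}}.

import math
import random

def final_error_factor(S: int) -> float:
    # multiply S individual errors, each either x2 or x0.5, chosen at random
    error = 1.0
    for _ in range(S):
        error *= random.choice([2.0, 0.5])
    return error

S = 9                           # number of assumptions
threshold = 2 ** math.sqrt(S)   # sigma**sqrt(S) = 2**3 = 8
trials = 100_000
within = sum(1 for _ in range(trials)
             if 1 / threshold <= final_error_factor(S) <= threshold)
print(within / trials)          # about 0.8 here: at least the ~70% the random-walk argument promises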


Rounding Errors in Algebraic Processes

Unless you are using a symbolic language like Sage or Mathematica, you should always be aware that floating point numbers are... dangerous. In practice, the computer stores a fixed number of digits plus the position of the point. Imagine a "6-digit decimal computer" –a computer that works with base-10 numbers and can store 6 decimal digits, plus the position of the dot and the power of 10 by which the number must be multiplied. In such a computer, the number

1/3 = 0.333333

is rounded down, with a relative error of about one part in a million. But the operation:

a = 1 + 1/30,000 = 1.00003[3333...]
b = a - 1

leads to b = 3\cdot 10^{-5}, i.e. a rounding error of about 10%, which makes you feel sorry you did not write:

a = 1 - 1
b = a + 1/30,000

which would give b = 3.33333\cdot 10^{-5}. Normally, this is not a big deal. But in probability computations, it is. Take the way the disjunction of probabilities is computed. Given N events whose probabilities are p_i, the probability that at least one of these events happens is:

[1] P(p_1 \lor p_2 \lor ...\lor p_N) = 1 - (1 - p_1)\cdot(1 - p_2)...\cdot(1 -p_N)

(the symbol A \lor B means "A or B is true, or both"). It is immediately clear that if all p_i are very small, and get rounded like the 1/3 above, the error can soon become big enough to spoil your computation.

And this is actually a pity! Because:

[2] (1 - p_1)\cdot(1 - p_2)\cdot ...\cdot(1 - p_N) = 1 - \sum_i p_i + \sum_{i < j} p_i\cdot p_j - \sum_{i < j < k} p_i\cdot p_j\cdot p_k + ...

where \sum_i means "sum over all values of i", in this case 1 to N, \sum_{i < j} "sum over all pairs of distinct indices", and so on.

Equation [2] tells us that if we expand [1] algebraically, cancel the 1, and only afterwards substitute numerical values for the various p_i, the error will be smaller.

Let's make an extreme example. If we had two events with p_1 = p_2 = 10^{-6}:

[3] P(p_1 \lor p_2) = 1 - (1-10^{-6})^2 = 2\cdot 10^{-6}-10^{-12} = 1.999999\cdot 10^{-6}

Imagine you have a "3-digit decimal computer" (3 digits plus the position of the point and the power of 10). If you solve equation [3] using the expansion in [2], even with such a lousy computer the result would be a decent 1.99\cdot 10^{-6}. This is because both 2\cdot 10^{-6} and 10^{-12} need no more than 3 digits. On the contrary, a naive computation, where the products are computed through equation [1] after substituting the values of the variables, would give:

[1'] 1 - 0.999\cdot 0.999 = 1.99\cdot 10^{-3}

A terribly wrong result. The short Python script below solves the equation algebraically:

def probabilities_disjunction(probabilities: list) -> float:
    n = len(probabilities)
    # tree is the list of all 0/1 strings of length n, except the all-zeros one.
    # e.g. for three probabilities: 001, 010, 011, 100, 101, 110, 111
    tree = []
    for i in range(1, 2**n):
        tree.append(format(i, f'0{n}b'))
    # v[i] holds the two possible factors for probability i: 1 (bit 0) or -p_i (bit 1)
    v = []
    for p in probabilities:
        v.append([1., -p])
    prod = 0.
    # sum the products selected by each branch: this is equation [2] without the leading 1
    for branch in tree:
        inter_prod = 1.
        for i, leaf in enumerate(branch):
            inter_prod *= v[i][int(leaf)]
        prod += inter_prod
    return -prod

It is clearly not optimised, and the number of computations grows exponentially with the number of probabilities.
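As a quick sanity check (my addition, not part of the original script), the function reproduces the result of equation [3] in double precision, while emulating the "3-digit decimal computer" with Python's decimal module shows how badly the naive formula [1] behaves:

from decimal import Decimal, getcontext, ROUND_DOWN

# Expansion [2] computed by the script above, in ordinary double precision
print(probabilities_disjunction([1e-6, 1e-6]))   # 2e-06 - 1e-12 = 1.999999e-06

# Naive formula [1] on a 3-significant-digit decimal computer with truncation
getcontext().prec = 3
getcontext().rounding = ROUND_DOWN
p = Decimal("1e-6")
print(1 - (1 - p) * (1 - p))                     # about 2e-3: a thousand times too big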


A short review of IBM Watson?

Executive summary: Watson is a huge PR product.

Watson can be tried either as an application (supported languages: Java, JavaScript, Ruby; apart from Java, the other two are not really machine-learning oriented) or as a RESTful API. As far as I've seen, there are eight available functions, listed here:

IBM Watson Developer Cloud

Some of them are just commodities –I can't believe they present them as part of an innovative service!

E.g. language identification: at Anobii, in 2011, we developed a great service for all the languages supported by Chrome in less than a day, using Google's open source libraries. The same goes for machine translation –which here, by the way, only works for five languages (no Chinese).

Let's go on. I tried the one I'd use most: Relationship (concepts) Extraction. Italian text gives ridiculous results, but not complete nonsense –which is bad, because it means the language is supported. I then tried to extract relations from a Wikipedia page on Urinary Tract Infection and... discovered that the bacterium "Escherichia coli" is a person! Apart from that, useless.

But OK, let's try the "Question and Answer" feature. You give Watson a corpus of data, and it will answer any question after having digested the information in the corpus. This sounds familiar (I am working on the next generation of the Your.MD Smart Health Assistant)... and there is already a Health app to test! Here it is: Question and Answer. I write:

"I have fever" which leads to

"One of the earliest signs of juvenile arthritis may be limping in the morning... children ...  have a high fever and a skin rash"

Great, Watson. I must admit juvenile arthritis is not the first condition that comes to mind when one has a high fever –and you should weigh the result by the frequency of the condition– but at least there is high fever in it. I'll give you more info:

"I have high fever but no skin rash". Same results as above!

This looks to me like a search engine which makes no attempt to understand the basic syntax of the text it has indexed.

Conclusion: IBM should spend more money on science, less on marketing. Or just more on science.

What do AI, ML, and NLP researchers think of IBM Watson?


If heat is released from a system, will entropy increase?

Let's ask ourselves this question –will our knowledge about a system increase or decrease when the system cools down?

When it's hot, there is high uncertainty about the positions and speeds of the molecules composing the system. The more the temperature goes down, the more precise your knowledge about positions and speeds becomes.

Because von Neumann says that entropy is the missing information between a macroscopic description and a microscopic one, you have the answer –entropy goes down when temperature goes down. At 0 kelvin, the macroscopic description gives you exactly the same information as the microscopic one, therefore the entropy is zero.

If heat is released from a system, will entropy increase?


How is complexity applied to Marketing? The Magnum Ice Cream

You can very successfully use  network theory to analyse complexity in business. It can be fun, and easy to visualize.

I personally understood why businesses can become more complex, yet more successful, by analysing the evolution of the network of one of the most successful European brands of the past 25 years –the Magnum ice cream. Network theory today is extremely advanced, and lots of tools are available (see my pictures of the "Magnum Network").

The Complexity of the Magnum Ice Cream

Italians love gelato.  On hot summer afternoons, cities fill up with families strolling  around, each member with a gelato in hand. I have always been part of  this collective passion: when I was a teenager, my friends and I  preferred to meet outside gelaterias and indulge in huge ice cream  cones, rather than get drunk in bars.

In  a country where the gelato is sold soon after preparation, and where  teenagers prefer it to alcoholic beverages, you may wonder how packaged  ice creams can possibly sell at all. But they do, and account for 30 percent of the Italian ice cream market thanks to heavy marketing and widespread distribution. Of the 30 percent, half is in the hands of Anglo-Dutch multinational Unilever, under the brand Algida.

For decades, Algida's strongest seller was the Cornetto,  an imitation of the artisanal ice cream cone, which was launched in  1959, the same year Fellini produced La Dolce Vita. Thirty years later,  the country was miles away from the economic boom described by Fellini,  but the Cornetto hung around and was joined by a considerable number of  competitors.

The  1980s  were a time of consumerist excess, with brands offering cherry,  amaretto, chocolate and biscuit all in the same ice cream. It was in  this environment that Unilever, in 1989, launched Magnum, the simplest ice cream bar ever –vanilla with a chocolate coating.

As an apocryphal quotation of Albert Einstein goes, "things should be as simple as possible, but not simpler". The Magnum was simple, but not simplistic. Most ice creams had a vanilla filling, but only a few used good quality vanilla, and not one was covered by thick, good, real chocolate. To produce a good quality coating, Unilever asked the Belgian company Callebaut to develop a chocolate that could go down to -40 degrees without breaking –something that did not exist before.

The Magnum stood out from the crowd because it was simple, yet sophisticated. According to Unilever, by 1992 it was already "Europe's most popular chocolate ice cream bar".

Simplicity, though, did not last long for the Magnum. With time, the Magnum evolved from the original ice cream into an ecosystem of elaborate ice creams: Almond, Mint, Caramel and Nuts, Yogurt; bigger and smaller Magnums; even Magnums without sticks. The original Magnum, in this new fauna of Magnum ice creams, was renamed Magnum Classic.

Magnum's Syndrome?

When  this process of differentiation started, and the promise of simplicity  was broken, I got upset. When Unilever came out with Moments, small ice  creams stuffed with caramel and hazelnut, I decided the company had  reached the limit, and prophesied Magnum's fall from greatness to dust.

In conversations, whenever dealing with something unnecessarily complex, I would refer to what I called the Magnum Syndrome: "Things start nice and simple, but with time they accumulate complexity. This is when they lose their strength, like in the Magnum's case: it is not the delicacy it used to be, there's too much noise around."

I could find plenty of victims of the Magnum Syndrome: the world's economy, science, politics, societies in general, Europe in particular. When the Cherry Guevara was launched, together with other terrible Magnum flavours like the John Lemon, the Wood Choc, and the Jami Hendrix, I considered them the four Horsemen of the Apocalypse. The Magnum ecosystem will collapse soon, I thought while biting into my Classic. And global capitalism will surely follow.

Taming Complexity

I might have thought that science had become too complex, but I was still convinced that physics is an extremely successful tool with which to tame complexity.

Examples of physics successfully taming complexity abound. Take statistical mechanics. During the 19th century, physicists studied the statistical properties of the motion of molecules in a gas and discovered that, despite their seeming randomness, properties like temperature, pressure, and even the obscure concept of entropy were all explainable in terms of probability: the behaviour of billions of molecules could be described by just a few variables linked to each other.

Key to that success was the creation of ideal systems, like the perfect gas. But how can one possibly find an ideal system with which to describe the behaviour of the stock market, human societies, or the marketing strategy of Unilever?

The Network Revolution

With perfect timing, a new branch of physics was officially born together with the fauna of Magnum ice creams: network theory.

Network theory was the illegitimate child of the World Wide Web. With the Web, it finally became possible to obtain data with which to study how networks evolve. Physicists and mathematicians threw themselves into data analysis and modelling, with new results on social topics too: networks of people exchanging email messages, web sites referring to each other, blog feeds –all produced an abundance of digital data. Results were so original that stern journals like the Physical Review began to publish articles on social networks –a social topic, for the first time ever.

With network theory, physicists were entering the arena and facing the complexity of the "real world" –just like biologists, economists, sociologists, and anthropologists had been doing for a while.

The power of networks is that everything can be reduced to a network and studied, even the Magnum ecosystem. As soon as we connect two ice creams because they have a particular ingredient in common, like caramel or dark chocolate, or are part of the same offer, like the "Seven Deadly Sins", we have a network.

For instance, the first Magnum Classic leads to the first four Magnum variations (Double Caramel, Dark, Double Chocolate, and Almond) that followed it a few years later, while the Double Caramel leads to Taste (in the Five Senses) and Sloth (in the Seven Sins), similar ice creams that were subsequently launched.
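As an illustration of how little machinery this takes (a sketch of mine; the links are only the ones mentioned above, not a complete Magnum catalogue), a library like networkx builds and queries such a network in a few lines:

import networkx as nx

# Links mentioned in the text: the Classic leads to the first four variations,
# and the Double Caramel leads to Taste and Sloth.
edges = [
    ("Classic", "Double Caramel"), ("Classic", "Dark"),
    ("Classic", "Double Chocolate"), ("Classic", "Almond"),
    ("Double Caramel", "Taste"), ("Double Caramel", "Sloth"),
]
magnum_network = nx.Graph(edges)

# Degree centrality is one possible proxy for "influence", and can be used
# to size the circles when drawing the network.
print(nx.degree_centrality(magnum_network))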

In  this way, we can draw a graphical representation of the increase in  complexity of the Magnum system over time. From the simple "star" at the  beginning of the 1990s, with one central ice cream and four peripheral  ones...

(the size of the circles is proportional to the influence) to the intricate network it had become after 2000:

If we traced the evolution of the Web over the same period, we would get similar figures. We would see complexity emerge from the first 50+ webpages published by Berners-Lee in 1990 to the billion pages of 2000.

The Magnum Strategy: Complexity is Good

In complex, organised networks, "the whole is more than the sum of its parts", writes Herbert Simon in "The Architecture of Complexity" (1962). This "more", this emergent property of the system –the network– is what makes different elements get together in a network and cooperate.

Simon was right. I, on the other hand, had been completely missing the big picture when criticising complexity.

What if we thought of the Magnum ice creams as an organism? Being sold under the same brand, Magnums form a collaborating community: each Magnum tells us something about the other Magnums –something good– with all Magnums starting with the most excellent of reputations, based on the original Classic's. A customer will expect, and find, good quality ingredients in any Magnum, because she knows that the original Magnum's strength was good quality vanilla and thick Belgian chocolate. In this sense the Classic has a link with all other Magnums, collaborating with them in a virtuous circle: the Classic's reputation gets stronger as it recommends other high quality ice creams, which in turn, being actually decent, recommend the original Classic. This virtuous circle can exist for Magnums other than the Classic, too: under the area of influence of the Caramel are now the "Caramel and Nuts" and "Sloth" Magnums, which Unilever introduced after its success.

With its fast growing reputation, Unilever would continue to introduce new Magnums at a fast pace, making the Magnum empire more complex, but also more powerful.  Thanks to this strategy –complexity with high quality and strong  connections– the Magnum became in 2000 the largest single ice cream  brand in Europe.

This success would not have been possible if the twenty-plus Magnums were sold by different companies, under different brands. We would see a situation similar to the one before the Magnum arrived: many over-complicated ice creams, among which it is difficult to make a choice. The stronger the connections between the elements of a system, the bigger the possible success. No connections between the elements, no success.

Simon shows that Magnum's evolution towards complexity was not just a potential syndrome, but a powerful strategy –the Magnum Strategy:

Start simple. Learn from the environment. Grow complex, maximising internal collaboration.

A few references:

  • Maljers, F. (1992). "Inside Unilever: The Evolving Transnational Company". Harvard Business Review, September 1992.
  • Berners-Lee, T., & Fischetti, M. (2000). Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by Its Inventor. HarperInformation.
  • Dorogovtsev, S., & Mendes, J. (2003). Evolution of Networks.
  • Clarke, C. (2012). The Science of Ice Cream. Royal Society of Chemistry.

Can a lion jump 36 feet?

Using data from Rory Young's answer: yes, a lion definitely can jump 10+ meters.

The horizontal distance D of a projectile with speed V, launched at an angle \theta is:

 D = V^2 \cdot sin(2\theta) / g

Which gives, for a lion running at 54 km/h, i.e. V = 15 m/s, jumping at an angle of 45 degrees, and assuming g \approx 10 m/s^2:

D = 22.5 m

Considering that running at maximum speed and jumping at 45 degrees is very hard, even for a lion, this is an over-estimate. But even jumping at a shallow 15 degrees would allow the lion to fly over a distance D:

 D \approx 11 m = 36ft
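A two-line numerical check of the figures above (my own sketch):

import math

def jump_range(v: float, angle_deg: float, g: float = 9.81) -> float:
    # horizontal range of a projectile launched at speed v (m/s) at the given angle
    return v ** 2 * math.sin(2 * math.radians(angle_deg)) / g

print(jump_range(15, 45))   # ~22.9 m (22.5 m with g = 10)
print(jump_range(15, 15))   # ~11.5 m, a bit over 36 ft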

In addition, if a dog can jump 29 feet (9m), clearly a lion can do better....

View Answer on Quora


How can entropy both decrease predictability and promote even distributions?

Answer by Mario Alemi:

Consider this image:

There is no uniformity, and the level of predictability is high. You immediately see that the person on the left is rich, the one on the right is poor. You know who will have a good meal tonight, who has the higher life expectancy, and so on.

Consider this now:

High uniformity, low predictability. Prison uniforms, well, uniformise. From the picture you can't get much information, apart from the fact that they are in prison.

Therefore: the first picture carries a lot of information about the people. The second doesn't. High entropy for the first, low for the second.

An unequal distribution of income produces the first picture. A uniform distribution produces the second one.

Interestingly, this is what happens to societies, organisations, and organisms: they start with low entropy (uniformity, equality) and develop towards higher complexity (i.e. specialisation, inequality).

(You can have a look at one of the many ways to define entropy on a graph, and see that for networks it's the same: Mario Alemi's answer to Entropy (information theory): What is an entropy of Graph? Is it related to concept of entropy in Information Theory?)

View Answer on Quora


What is an entropy of Graph? Is it related to concept of entropy in Information Theory?

First of all, are entropy and information theory related? Yes –and since they are, they are related in graph theory too. Let von Neumann explain the relationship: "Entropy is the difference between the information provided by the macroscopic description and the microscopic one."

Von Neumann says that the entropy of a system measures how much information our model of the system does not provide. By microscopic description we mean a description which coincides with the best possible one –the one which allows us to identify each element and to know everything about it. If the system (in our case the graph) provides us with the same amount of information, its entropy is zero. If not, the entropy is bigger than zero.

Before going to graphs, let us consider a relatively simple system, like a perfect gas (point-like particles interacting elastically).

The best microscopic description of a gas is the one which gives, for each labelled particle, its exact position and speed. If a little demon could provide us with a table of the positions and speeds of all the particles, we would have zero entropy. (Boltzmann himself uses the imaginary capability of identifying each particle in his ergodic hypothesis, eventually arriving at a statistical definition of entropy, and Maxwell introduces the paradox of the knowledgeable demon.)

As found by Maxwell, a perfect gas is potentially a zero-entropy system: given its boundary conditions (position and speed of each particle), we could uniquely identify each particle, follow its evolution, and extract all the desired energy from it.

Something similar can be applied to graphs. Only, it gets a bit more complex. While a gas is perfectly defined by the properties of its constituents –the pointlike particles– a graph is not.

Graphs are defined by the relationships between their constituents. In fact, we use graphs to describe systems which show an emergent property: the information stored in the whole structure is bigger than the sum of the information stored in each element. This means that we need to look at the whole structure, not at its elements one by one, as we do with a gas!

To describe a node in a graph, we could use its number of connections, its centrality, its average distance from all other nodes, the clusterisation of its connections, et cetera. But these properties will not necessarily identify that node univocally. How can we understand how well a node of a graph is defined by the structure of the graph (aka its topology)?

To understand that, let us start by having a look at the most trivial graph: a triangle.

However precisely one describes a node using the properties of the graph, it is impossible to provide enough information to identify it. All nodes are the same –each has two neighbouring nodes, and those neighbours are connected to each other by a single link.

More formally, we say that any of the six permutations of the nodes will produce the same graph. Even more formally, there are six adjacency-preserving bijections, and the set of these bijections forms the automorphism group of the graph (the transformations of the graph onto itself). Graphically, this means that if someone swaps the labels A and B and "fixes" the graph –in this case by flipping the triangle around its height– we would not notice that A and B are now different nodes than before. The graph is symmetric under that transformation.

We can represent that through matrices. The adjacency matrix of the triangle is:

    A  B  C
A   0  1  1
B   1  0  1
C   1  1  0

It's clear that if you swap the first two columns and the first two rows, i.e. swap node A with node B, you get the same matrix:

    B  A  C
B   0  1  1
A   1  0  1
C   1  1  0
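That swap can be checked in a couple of lines (a sketch of mine): permute the rows and columns of the adjacency matrix and compare with the original.

import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]])          # triangle, nodes ordered A, B, C
perm = [1, 0, 2]                   # swap nodes A and B
print(np.array_equal(A[perm][:, perm], A))   # True: the topology cannot tell A from B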

In practice, this is a "high entropy" graph, because it does not provide much information about its constituents –we had better add labels to them, as the graph's topology does not help!

Let us now take a funnier, and more complex, graph: one with directed connections and loops, which says "C talked to itself, and A talked twice to B". Graphically:

The adjacency matrix of this graph is the following:

    A  B  C
A   0  2  0
B   0  0  0
C   0  0  1

It is evident that no permutation other than the identity would leave the matrix unchanged. In this case descriptions like "the node with two outgoing connections", "the node with two incoming connections", or "the node in a loop" would leave no doubt about which node we are talking about. The macroscopic description, through the graph, is the same as the microscopic one –with the labels. This is a zero-entropy graph!

How well we can identify nodes is therefore related to entropy: in this sense, complex graphs, where the connections are particularly "telling", are very informative of their constituents, and store more information than simple ones.

We can now compute the entropy according to von Neumann. The uncertainty of a microscopic description (the labelled one) is zero. We therefore only need to compute the uncertainty on the macroscopic one.

In our simple non-directed triangular graph, we have six possible permutations of the nodes, therefore:

S_{full\ triangle} = - log(1/6) = log(6)

We could also look at it in a different way: the probability of guessing the identity of the first node is 1/3; once we know it, the probability for the second is 1/2, and the identity of the third node is then determined –therefore S_{full\ triangle} = -log(1/3) - log(1/2) - log(1).
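The same count can be done by brute force (a sketch of mine, feasible only for tiny graphs): enumerate all node permutations, keep those that preserve the adjacency matrix, and take the logarithm of their number.

import math
from itertools import permutations

def permutation_entropy(adjacency) -> float:
    # log of the number of adjacency-preserving permutations (automorphisms)
    n = len(adjacency)
    count = 0
    for perm in permutations(range(n)):
        if all(adjacency[perm[i]][perm[j]] == adjacency[i][j]
               for i in range(n) for j in range(n)):
            count += 1
    return math.log(count)

triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
directed = [[0, 2, 0], [0, 0, 0], [0, 0, 1]]   # "A talked twice to B, C talked to itself"
print(permutation_entropy(triangle))   # log(6): all six permutations preserve the triangle
print(permutation_entropy(directed))   # log(1) = 0: only the identity survives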

Nonetheless, we are normally not interested in identifying each node in the graph, and would be happy with less. If you are interested in a certain characteristic of the nodes, and not in identifying them, you can simply refer to the entropy of the probability density function of that characteristic. Say, for instance, you are only interested in knowing the number of connections of each node. As often happens, you don't know the number of connections of each individual node, but you know that the probability of a node having k connections is p_k. In this case, the entropy will be:

S_{connections} =\sum_k -p_k\cdot log (p_k)
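For example (again a sketch of mine), if the degree sequence of a small graph is known, the p_k and the corresponding entropy can be computed directly:

import math
from collections import Counter

def degree_entropy(degrees) -> float:
    # S = -sum_k p_k * log(p_k), with p_k the fraction of nodes having k connections
    counts = Counter(degrees)
    total = len(degrees)
    return sum(-(c / total) * math.log(c / total) for c in counts.values())

print(degree_entropy([2, 2, 2]))      # triangle: p_2 = 1, so the entropy is 0
print(degree_entropy([3, 1, 1, 1]))   # a star: p_3 = 1/4, p_1 = 3/4, entropy > 0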


Why is the future and the past so different?

The "directionality" of time can be easily explained in terms of entropy, as you point out --and not vice-versa. It's also easy to understand, if you consider the statistical mechanics' definition of entropy.

Slightly formal: systems evolve from the past, where they find themselves in an improbable state, to the future, where they find themselves in a more probable state than the previous one. If you see steam entering a manhole in NYC, you know you are watching a reversed movie.

Less formal: imagine you have a bookshop, with all books ordered alphabetically. Take a picture in the morning. Then customers arrive, leave a few books around, put them back in the wrong order, etc. At the end of the day you take one more picture. If there is no one tidying the bookshop, you can immediately identify which picture was taken in the past, and which in the future. Boltzmann (the physicist who formalised the statistical definition of entropy) would say that the entropy in the morning was lower than the entropy in the evening.

The entropy of a system in a certain state is a measure of the probability of that system being in that state. From that, it follows that a system will naturally evolve from a state of low entropy to a state of high entropy, because (again, this is how entropy was defined by Boltzmann) the latter is more probable. Clearly, if you put energy into the system, you can push it towards a state of lower entropy, but if you took a bigger picture, you would see other systems going towards states of even higher entropy (your body while tidying up the bookshop, for instance).

In this sense we say that time is a dimension with a direction. It is different from space, and this is why, to be technical, in space-time, time enters the metric with a different sign from space.

There are two interesting points following from that. One is that you can imagine a universe where time is like space: non-directional. This is what Augustine of Hippo said some 1,600 years ago to answer the question: why did God create the universe, considering that he is perfect and should not need a universe to feel even more perfect? Answer: there was no time before the universe was created, because it is the universe which defines time –as Boltzmann would explain much later. A similar argument is in "The Large Scale Structure of Space-Time" by Hawking and Ellis, when they say that there is a singularity in the past which "constitutes, in some sense, a beginning to the universe".

PS Note that things like a person jumping from the ground floor to the 10th floor of a building are not time-defining. It appears to go against the past-to-future flow, but only because we know it is difficult for someone to have that strength. If the person is Wonder Woman, we don't notice that the movie is actually reversed, as the trajectory makes perfect sense.

View Answer on Quora


Mobile apps are a temporary step

(from Quora -- What does Marc Andreessen mean when he says "Mobile apps on platforms like iOS and Android are a temporary step along the way toward the full mobile web"?)

Let's have a look at the past, and see if we can understand the present and, within limits, forecast the future. This is, I believe, the only way to understand Andreessen.

Language -- creation and break-up

Apps today are distributed through many incompatible application stores. This reminds me of the Tower of Babel: great enthusiasm at the invention of language, but then people start speaking different languages. If you look at any information revolution, not just the creation of language, the pattern has always been the same: humanity creates a tool for exchanging information, and then breaks this tool into many tools, similar yet incompatible with each other. It is neither good nor bad –it is just inevitable, and for good reasons (you need language to be more specific within your community) which we shouldn't analyse here. The question is: does it sound familiar? Well, yes!

How Apple saved the publishers

Apple, and all the other corporations who followed it, gave new life to the web. Five years ago, newspaper, movie and music publishers were lamenting: "How will we survive if no one wants to pay for reading an article?"

A few years later, with the introduction of the App Store, everyone got happy. Users can access information from their mobiles, and publishers can easily charge for it. Soon, similar but incompatible application stores appeared. Apple, rightfully, wants to defend its dominant position, and refuses to host other application stores on the iPhone. Apple would like (it seems) to sell an iPhone to every single human being, so we can all buy on the App Store. This might be a bit too much, even for Apple.

Why applications have to change

If I were an entrepreneur or a developer, I would not want Apple to impose its technology on me, even if it is a good technology. This is why I am waiting for HTML5 to be good enough to be used for serious stuff. Once I have my app developed in HTML5, I can sell it on any possible app store –be it Google Play, the Amazon Store or "the" App Store. Maybe the free version of my app will be available on the web.

In addition to that, as a user I don't want Apple to decide which apps I can have and which I cannot. Government censorship is more than enough. This is more important than it may seem to Apple: without free exchange of information, humanity will go backward, not forward.

Apple and Manuzio, evolution of the revolution

And we will go forward. We are living in the middle of the biggest information revolution since the one of movable type, when the modern press was invented. The revolution we are living through will still deliver high quality, free information to humanity, exactly like books and newspapers did from Gutenberg on. If Netscape was Gutenberg, Apple is the new Aldus Manuzio, the Renaissance publisher who set up a definitive scheme of book design and introduced small, pocket editions (mobile editions...) of books. Manuzio became rich, and many followed in his steps, like many have followed in Apple's.

This revolution will also be remunerative! Following the Apple business model, some young publishers are already able to leverage the new tools for exchanging information (books, movies, magazines, music, etc.). As publishers have to change their vision, many of the old ones, AKA the dinosaurs, will die.

Last, we need someone pointing us to the right app or website when we are looking for one, exactly like Google did 15 years ago. At the moment, apps are more like monads which hardly interact with each other. And in the absence of a network, it's hard to understand who the leader is... but we are already seeing huge efforts in this direction, and it will soon happen.

Like saying...

In conclusion, I think we can interpret Andreessen as saying: "The web is alive, and is doing pretty well. Apps are good, but they are closed. This means they will soon change, and we will have application-like websites, able to pay salaries to journalists and musicians, but also accessible to everyone with a pocket-size computer" –as Manuzio would have called smartphones.

PS The Tower of Babel, from Genesis:

And the whole earth [Internet] was of one language [http]. And they [the users] said: "Go to, let us build us a city and a tower, whose top may reach unto heaven; and ... lest we be scattered abroad upon the face of the whole earth. [Lest we call this the "World Wide Web"]

And the Lord [Corporations] said: "Behold, the people is one, and they have all one language; and this they begin to do: and now nothing will be restrained from them, which they have imagined to do." [or: we thought the web was a huge supermarket, but people started using it for exchanging movies and read news and books for free. How will we survive? And honestly, shouldn't people pay when using the product of the work of some one else?]

"Go to, let us go down, and there confound their language, that they may not understand one another's speech."

[And the application stores were created.]

