A common question is if entropy and information theory are related. The answer is yes. How? And how are they related in graph theory?
Von Neumann explains the relationship between entropy and information in a crystalline definition: "Entropy is the difference between the information provided by the macroscopic description and the microscopic one". In layman terms, entropy, he says, is the difference between what we know when look from afar and when we look close.
But what about physics? Physics is all about making mathematical models, and Von Neumann says the entropy of a system measures how much information the modelisation of the system does not provide. With microscopic description we mean a description of the system which coincides with the best possible one –the one which allows us to identify each element, and to know everything about it. If the system (in our case a graph) provides us with the same amount of information, its entropy is zero. If not, the entropy is bigger than zero.
Before going to graphs, let us consider a relatively simple system, like a perfect gas (point-like particles interacting elastically).
The best microscopic description of a gas is the one which gives, for each labelled particle, the exact position and speed. If a little demon could provide us a table with position and speed of all particles, we would have zero entropy. (Boltzmann himself uses the imaginary capability of identifying each particle in the ergodic principle, to eventually give a statistical definition of entropy, and soon after Maxwell introduces the knowledgeable demon's paradox).
As found by Maxwell, a perfect gas is potentially a zero-entropy system. Given its boundary conditions (position and speed of each particle), we could uniquely identify each particle and follow their evolution, and extract all desired energy from it.
Something similar can be applied to graphs. Only, it gets a bit more complex. While a gas is perfectly defined by the properties of its constituents –the pointlike particles– a graph is not.
Graphs are defined by the relationships between its constituents. In fact, we use graphs to describe systems which show an emergent properties: the information stored in the whole structure is bigger than the sum of the information stored in each element. This means that we need to look at the whole structure, not at its elements one by one, as we do with gas!
To describe a node in a graph, we could use its number of connections, its centrality, its average distance from all nodes, clusterization of its connections et cetera. But not necessarily these properties will univocally describe that node. How can we understand how well a node of a graph is defined by the structure of the graph (aka the topology)?
To understand that, let us start having a look at the most trivial graph: a triangle.
However precisely one describes a node using the properties of the graph, it is impossible to provide enough information to identify it. All nodes are the same –they have two connected nodes, and those connected nodes are connected between themselves by a single link.
More formally, we say that any of the six permutations of the nodes will produce the same graph. Even more formally, there are six adjacency preserving bijections, and the set of these bijections form the automorphism group (a transformation onto itself) of the graph. Graphically, this means that if someone swaps the labels A and B, and "fixes" the graph, in this case rotating the triangle around the high, we would not notice that A and B are now different nodes than earlier. The graph is symmetric under that transformation.
We can represent that through matrices. The adjacency matrix of the triangle is:
A B C
A 0 1 1
B 1 0 1
C 1 1 0
It's clear that if you swap the first two columns and the first two rows, i.e. swap node A with node B, you get the same matrix:
B A C
B 0 1 1
A 1 0 1
C 1 1 0
In practice, this is a "high entropy" graph, because does not provide much information about its constituents –we better add a label to them, as the graph's topology does not help!
Let us take now a more funny, and complex, graph. One with directed connections, and loops, which says "C talked to itself, and A talked twice to B". Graphically:
The adjacency graph of the graph is the following:
A B C
A 0 2 0
B 0 0 0
C 0 0 1
It is evident that there are no permutations which would leave the matrix unchanged. In this case descriptions like "the node with two incoming connections", or "two outcoming", or "in a loop", would leave no doubt about which node we are talking about. The macroscopic description, through the graph, is the same as the microscopic one –with the label. This is a zero-entropy graph!
How well we can identify nodes is therefore related to entropy: in this sense, complex graphs, where the connections are particularly "telling", are very informative of their constituents, and store more information than simple ones.
We can now compute the entropy according to von Neumann. The uncertainty of a microscopic description (the labelled one) is zero. We therefore only need to compute the uncertainty on the macroscopic one.
In our simple non-directed triangular graph, we have six possible permutations of the nodes, therefore:
We could also see at it in a different way. The uncertainty on the first node is ?. Once we know the identity of the first node, the uncertainty on the second is ½, and then the identity of the third node is determined –therefore .
Nonetheless, we are normally not interested in identifying each node in the graph, but would be happy with less. If you are interested in a certain characteristic of the nodes, and not in identifying them, you can simply refer to the entropy of the probability density function of that characteristic. Say, for instance, you are only interested in knowing the number of connections for each node. As often happen, you don't know the number of connections of each node, but you know that the probability of a node having connections is . In this case, the entropy will be: