Wikipedia’s article on the Gini index says that it satisfies the transfer principle, i.e. that transferring income from a richer person to a poorer person will only decrease the index. I was wondering about a slightly different question – whether calculating income inequality between the averages of groups will necessarily result in a lower index than calculating it for all the people in all the groups together (the answer is yes).
The question is whether the Gini index satisfies a condition stronger than both the transfer principle and my question. Suppose that data set X has a Gini index a1, and that Y is a subset of X with Gini index b1. Suppose also that Y’ has the same number of data points and the same average as Y, and a Gini index b2, and that X’, which consists of X\Y and Y’ (i.e. X, with Y replaced with Y’), has a Gini index a2. Does b2 > b1 imply a2 > a1?
From that, it would follow (by swapping the roles of Y and Y’) that b2 < b1 implies a2 < a1, whence both weaker conditions follow – for the transfer principle, let Y consist of the richer and poorer person, and for my question, let Y consist of one group, make it entirely equal, and then continue by letting Y be another group.
I know how to prove both weaker conditions. For those, we’ll need a lemma that provides an equivalent definition of the Gini index. The standard one is based on ordering all data points in ascending order, and then constructing the Lorenz curve, defined as the share of total income held by the first t% of the data points. Each data point corresponds to a straight line segment of the Lorenz curve of horizontal width 1/n, where n is the size of the data set, and of slope x(i)/m, where m is the mean income and x(i) is the income of the data point.
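As a rough sketch of this construction (the function name `gini` and the sample incomes are my own, and exact fractions are used only to avoid rounding), the index can be computed by summing the trapezoids between the perfect equality line and the Lorenz curve:

```python
from fractions import Fraction

def gini(incomes):
    """Gini index from the Lorenz curve of the incomes sorted in ascending order."""
    xs = sorted(incomes)
    n, total = len(xs), sum(xs)
    area = Fraction(0)      # area between the equality line and the Lorenz curve
    cum = 0                 # running sum x(1) + ... + x(i)
    prev_gap = Fraction(0)  # (i-1)/n - L(i-1)
    for i, x in enumerate(xs, start=1):
        cum += x
        gap = Fraction(i, n) - Fraction(cum, total)  # i/n - L(i)
        area += (prev_gap + gap) / (2 * n)           # one trapezoid of width 1/n
        prev_gap = gap
    return area / Fraction(1, 2)  # as a fraction of the area under the equality line

print(gini([1, 2, 3, 4]))  # 1/4
```

A data set in which everyone has the same income gives 0, since the Lorenz curve then coincides with the perfect equality line.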
Now, we can construct a Lorenz curve for any ordering of the data set X. If X is sorted in decreasing order, then the Lorenz curve will just be a 180 degree rotation of the normal Lorenz curve, and the Gini index, i.e. the signed area under the perfect equality line and above the Lorenz curve expressed as a fraction of the area under the perfect equality line, will be the negative of the normal Gini index.
Similarly, a random ordering can be expected to have no pattern of income distribution, so its Lorenz curve will straddle the perfect equality line, and its Gini index will be close to 0 (and exactly 0 on average over all orderings, since reversing an ordering flips the sign).
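To make the ordering-dependent index concrete, here is a minimal sketch; `gini_of_ordering` and the four sample incomes are hypothetical names and values chosen for illustration. It computes the index for the ascending and descending orderings, and the average over every ordering of a small data set.

```python
from fractions import Fraction
from itertools import permutations

def gini_of_ordering(xs):
    """Gini index of xs with respect to the order in which xs is given."""
    n, total = len(xs), sum(xs)
    cum, gaps = 0, Fraction(0)
    for i, x in enumerate(xs, start=1):
        cum += x
        gaps += Fraction(i, n) - Fraction(cum, total)  # i/n - L(i)
    # The trapezoids sum to gaps/n because the gap is 0 at both ends of the
    # curve; the index is that area divided by 1/2.
    return 2 * gaps / n

data = [1, 2, 3, 5]
ascending = gini_of_ordering(sorted(data))
descending = gini_of_ordering(sorted(data, reverse=True))
all_orderings = [gini_of_ordering(p) for p in permutations(data)]
mean_over_orderings = sum(all_orderings) / len(all_orderings)

print(ascending, descending, mean_over_orderings)  # 13/44 -13/44 0
```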
In other words, there are many Gini indices of the same data set X, one for each ordering. But in fact, the normal Gini index is equal to the highest possible Gini index of any ordering of X. To see why, suppose that x(i) > x(i+1), and examine the ith and (i+1)st line segments of the Lorenz curve; the ith line segment has the higher slope. If L(i-1) is the height of the left edge of the ith segment, i.e. L(i-1) = (x1 + x2 + … + x(i-1))/mn, then the ith line segment contributes ((i-1)/n – L(i-1) + i/n – L(i))/2n to the area between the perfect equality line and the Lorenz curve, which is half the Gini index, and the (i+1)st contributes (i/n – L(i) + (i+1)/n – L(i+1))/2n. Also, note that L(i) = L(i-1) + x(i)/mn.
So interchanging x(i) and x(i+1) without making any other change leaves every term of the Gini index unchanged except L(i), which goes from L(i-1) + x(i)/mn to L(i-1) + x(i+1)/mn, a lower quantity since x(i) > x(i+1); L(i+1) is unchanged because it includes both incomes either way. This in turn will make ((i-1)/n – L(i-1) + i/n – L(i))/2n and (i/n – L(i) + (i+1)/n – L(i+1))/2n bigger, which will increase the Gini index.
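The size of the increase can be read off the two expressions above. This is my own arithmetic rather than part of the original argument: L(i) drops by (x(i) - x(i+1))/mn, each of the two trapezoid terms grows by that amount divided by 2n, so the area grows by (x(i) - x(i+1))/(mn·n), and the index, being the area divided by 1/2, grows by twice that. A small check, with `gini_of_ordering` and the sample list as hypothetical names and values:

```python
from fractions import Fraction

def gini_of_ordering(xs):
    """Gini index of xs with respect to the order in which xs is given."""
    n, total = len(xs), sum(xs)
    cum, gaps = 0, Fraction(0)
    for i, x in enumerate(xs, start=1):
        cum += x
        gaps += Fraction(i, n) - Fraction(cum, total)  # i/n - L(i)
    return 2 * gaps / n  # trapezoid area gaps/n, divided by 1/2

before = [1, 3, 2, 4]  # x(2) = 3 > x(3) = 2
after = [1, 2, 3, 4]   # the same list with x(2) and x(3) interchanged
n, total = len(before), sum(before)  # total is m * n

gain = gini_of_ordering(after) - gini_of_ordering(before)
predicted = Fraction(2 * (3 - 2), total * n)  # 2 (x(i) - x(i+1)) / (mn * n)

print(gain, predicted)  # 1/20 1/20
```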
To sort X in ascending order, all that’s needed is to compare pairs of adjacent points and interchange them if the latter one is smaller. There’s a sorting algorithm based on just that, Bubblesort: first compare x1 and x2 and interchange if needed, then x2 and x3, then x3 and x4… then x(n-1) and x(n). That will guarantee that the largest number is in the nth position, so the next pass should start at x1-x2 and end at x(n-2)-x(n-1), putting the second largest number in the (n-1)st position. After at most n-1 passes, the list is sorted. Since every interchange along the way increases the Gini index, the ascending ordering has a Gini index at least as high as that of the ordering we started from, which proves the lemma.
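The argument can be traced in code. The following is only a sketch (the helper names and the starting list are my own): it runs Bubblesort and records the ordering-dependent Gini index after every interchange, so the values can be seen to rise until the list is sorted.

```python
from fractions import Fraction

def gini_of_ordering(xs):
    """Gini index of xs with respect to the order in which xs is given."""
    n, total = len(xs), sum(xs)
    cum, gaps = 0, Fraction(0)
    for i, x in enumerate(xs, start=1):
        cum += x
        gaps += Fraction(i, n) - Fraction(cum, total)  # i/n - L(i)
    return 2 * gaps / n  # trapezoid area gaps/n, divided by 1/2

def bubble_sort_gini_trace(xs):
    """Bubblesort a copy of xs, recording the Gini index after each interchange."""
    xs = list(xs)
    trace = [gini_of_ordering(xs)]
    for end in range(len(xs) - 1, 0, -1):  # n-1 passes, each one shorter
        for i in range(end):
            if xs[i] > xs[i + 1]:          # the latter one is smaller: interchange
                xs[i], xs[i + 1] = xs[i + 1], xs[i]
                trace.append(gini_of_ordering(xs))
    return xs, trace

sorted_xs, trace = bubble_sort_gini_trace([4, 1, 3, 2])
print(sorted_xs)
print([str(g) for g in trace])
```

Each entry of the trace is larger than the one before it, and the last entry is the normal Gini index of the data set.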
With the lemma proved, suppose that X is sorted in ascending order, and that Y is a subset of X that has an equal distribution, so that its members sit next to each other in the sorting. Then making the distribution of Y unequal, with the new values placed in ascending order in the positions Y occupied, will kink the Lorenz curve of X’ with respect to the sorting inherited from X downward; the curve will be unchanged to the left and to the right of Y, but in Y it will move down, ensuring that the Gini index of X’ with respect to that ordering is higher than a1. By the lemma, a2 is at least as high as the Gini index of X’ with respect to any ordering, so it immediately follows that a2 > a1.
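A small illustration of the group-average question, assuming two made-up groups of equal size (the function name `gini` and the incomes are mine): giving every person their group’s average income yields a lower index than the one computed over the individual incomes.

```python
from fractions import Fraction

def gini(incomes):
    """Normal Gini index: Lorenz curve of the incomes sorted in ascending order."""
    xs = sorted(incomes)
    n, total = len(xs), sum(xs)
    cum, gaps = 0, Fraction(0)
    for i, x in enumerate(xs, start=1):
        cum += x
        gaps += Fraction(i, n) - Fraction(cum, total)  # i/n - L(i)
    return 2 * gaps / n  # trapezoid area gaps/n, divided by 1/2

groups = [[1, 5], [2, 10]]
everyone = [x for g in groups for x in g]
# Each person is assigned the average income of their own group.
averaged = [Fraction(sum(g), len(g)) for g in groups for _ in g]

print(gini(everyone), gini(averaged))  # 5/12 1/6
```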
For the transfer principle, transferring money from a poorer person i to a richer person j without changing the ordering will increase the contribution of i and j to the Gini index slightly, since L(i) and L(j-1) will both decrease. It will similarly increase the contribution of every point between i and j, since in fact every L with index between i and j-1 inclusive will decrease.
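As a quick check of this with made-up numbers (the helper names are hypothetical), moving one unit from the poorest of four people to the richest lowers L(1), L(2) and L(3) by the same amount and raises the index:

```python
from fractions import Fraction

def lorenz_heights(xs):
    """Heights L(1), ..., L(n) of the Lorenz curve for xs in the given order."""
    total, cum, heights = sum(xs), 0, []
    for x in xs:
        cum += x
        heights.append(Fraction(cum, total))
    return heights

def gini_of_ordering(xs):
    """Gini index of xs with respect to the order in which xs is given."""
    n = len(xs)
    gaps = sum(Fraction(i, n) - h for i, h in enumerate(lorenz_heights(xs), start=1))
    return 2 * gaps / n  # trapezoid area gaps/n, divided by 1/2

before = [1, 2, 3, 4]
after = [0, 2, 3, 5]  # one unit moved from person 1 to person 4, order unchanged

drops = [b - a for b, a in zip(lorenz_heights(before), lorenz_heights(after))]
print([str(d) for d in drops])                            # ['1/10', '1/10', '1/10', '0']
print(gini_of_ordering(before), gini_of_ordering(after))  # 1/4 2/5
```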
The more general theorem/conjecture doesn’t follow from the transfer principle, despite appearances. Not every increase in inequality can be generated by transferring money from the poor to the rich. If the top 1% all have incomes at 5m and the rest have incomes at 95m/99, then the Gini index is 0.04, but if the top 50% all have incomes at 2m and the bottom 50% have no income, a situation that involves not only upward but also downward redistribution of income, then Gini rises to 0.5.
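These two distributions can be worked out exactly. The sketch below assumes 100 people and picks the mean so that all incomes are whole numbers; the helper names are my own. It also prints the Lorenz heights at the 50th and 99th person, which shows that the two curves cross, so neither distribution can be reached from the other by upward transfers alone.

```python
from fractions import Fraction

def lorenz_heights(xs):
    """Heights L(1), ..., L(n) of the Lorenz curve for xs in the given order."""
    total, cum, heights = sum(xs), 0, []
    for x in xs:
        cum += x
        heights.append(Fraction(cum, total))
    return heights

def gini_of_ordering(xs):
    """Gini index of xs with respect to the order in which xs is given."""
    n = len(xs)
    gaps = sum(Fraction(i, n) - h for i, h in enumerate(lorenz_heights(xs), start=1))
    return 2 * gaps / n  # trapezoid area gaps/n, divided by 1/2

# 100 people, mean m = 99: the top 1% has 5m = 495, everyone else 95m/99 = 95.
top_heavy = [95] * 99 + [495]
# 100 people, mean m = 1: the bottom half has nothing, the top half has 2m = 2.
half_and_half = [0] * 50 + [2] * 50

print(gini_of_ordering(top_heavy), gini_of_ordering(half_and_half))      # 1/25 1/2
print(lorenz_heights(top_heavy)[49], lorenz_heights(half_and_half)[49])  # 95/198 0
print(lorenz_heights(top_heavy)[98], lorenz_heights(half_and_half)[98])  # 19/20 49/50
```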
However, if Y is contiguous – i.e. if Y spans all data points between i and j inclusive with respect to the ascending ordering – then by the same argument that answers my original question, any increase in the Gini of Y will increase the Gini of X. It doesn’t immediately follow that any decrease in the Gini of Y will decrease the Gini of X, though, since the decrease may be accompanied by an increase of income at the top of Y or a decrease at the bottom of Y, which will require a change in sorting that might make X’ more unequal than X.
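Here is a small numerical sketch of both situations, with illustrative incomes of my own choosing and hypothetical helper names. In the first case Y = {2, 3, 4} is contiguous, and raising its Gini index raises that of X. In the second, Y = {0, 12, 12} is split by three incomes of 3, and for this particular example b2 > b1 while a2 < a1, which suggests the general implication needs some condition such as contiguity.

```python
from fractions import Fraction

def gini(incomes):
    """Normal Gini index: Lorenz curve of the incomes sorted in ascending order."""
    xs = sorted(incomes)
    n, total = len(xs), sum(xs)
    cum, gaps = 0, Fraction(0)
    for i, x in enumerate(xs, start=1):
        cum += x
        gaps += Fraction(i, n) - Fraction(cum, total)  # i/n - L(i)
    return 2 * gaps / n  # trapezoid area gaps/n, divided by 1/2

def replace_case(rest, y, y_new):
    """Return (b1, b2, a1, a2) for X = rest + y and X' = rest + y_new."""
    # Y' must have the same number of data points and the same average as Y.
    assert len(y) == len(y_new) and sum(y) == sum(y_new)
    return gini(y), gini(y_new), gini(rest + y), gini(rest + y_new)

# Y occupies positions 2 to 4 of X = [0, 2, 3, 4, 12, 12].
contiguous = replace_case([0, 12, 12], [2, 3, 4], [1, 3, 5])
# Y is interleaved with the rest of X = [0, 3, 3, 3, 12, 12].
scattered = replace_case([3, 3, 3], [0, 12, 12], [2, 4, 18])

print(*contiguous)  # 4/27 8/27 91/198 95/198
print(*scattered)   # 1/3 4/9 29/66 83/198
```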