Using the sklearn package, I would like to find the Gini coefficient of each feature on the paths of a class, for example in the iris data. Something like: Iris-virginica, petal length Gini: 0.4, petal width Gini: 0.4.
-
Can you post the data on which you want to compute the Gini? – seralouk Jul 14 '17 at 06:46
-
You can use this code to download the data: `from sklearn import datasets; iris = datasets.load_iris()` – Ming Jul 14 '17 at 10:41
-
Don't confuse Gini coefficient and Gini impurity. This [article](https://www.learndatasci.com/glossary/gini-impurity/) shows a very comprehensive python implementation of the latter. – Woodly0 Jan 16 '23 at 08:31
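To make the distinction in the comment above concrete, here is a minimal sketch of Gini impurity, the quantity decision trees use on class labels (as opposed to the Gini coefficient, which measures inequality among numeric values). The function name `gini_impurity` is only illustrative and is not part of sklearn.

```python
import numpy as np


def gini_impurity(labels):
    """Gini impurity of a label collection: 1 - sum(p_k ** 2)."""
    # Count how often each distinct label occurs.
    _, counts = np.unique(labels, return_counts=True)
    # Convert the counts to class proportions p_k.
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


# Two equally frequent classes give an impurity of 0.5;
# a single class gives 0.0.
balanced = gini_impurity([0, 0, 1, 1])
pure = gini_impurity(["a", "a", "a"])
```

Impurity depends only on the label proportions, so it cannot be computed per feature without first choosing a split on that feature.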
1 Answer
You can calculate the Gini coefficient with Python and NumPy like this:
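from itertools import combinations
from typing import List

import numpy as np


def gini(x: List[float]) -> float:
    """Gini coefficient: mean absolute difference divided by twice the mean."""
    x = np.array(x, dtype=np.float32)
    n = len(x)
    # combinations() yields each unordered pair once, so this is half of the
    # sum over all ordered pairs (i, j) used in the textbook formula.
    diffs = sum(abs(i - j) for i, j in combinations(x, r=2))
    # The textbook divisor 2 * n**2 * mean therefore reduces to n**2 * mean.
    return diffs / (n**2 * x.mean())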
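To get one value per feature and per class, as the question asks, the coefficient can be applied to each feature column restricted to the rows of one class. The following is only a sketch: `gini_per_class` is a hypothetical helper, not an sklearn function, and it assumes the labels in `y` are integer indices 0..K-1 into `class_names` (which is how `load_iris()` encodes `target`). The pairwise sum here runs over all ordered pairs via broadcasting.

```python
import numpy as np


def gini(x):
    """Gini coefficient: sum of |x_i - x_j| over all ordered pairs,
    divided by 2 * n**2 * mean(x)."""
    x = np.asarray(x, dtype=np.float64)
    # Broadcasting builds the full n x n matrix of pairwise differences.
    diffs = np.abs(x[:, None] - x[None, :]).sum()
    return diffs / (2 * len(x) ** 2 * x.mean())


def gini_per_class(X, y, feature_names, class_names):
    """Hypothetical helper: Gini coefficient of every feature within each class.

    Assumes y holds integer labels that index into class_names.
    """
    X = np.asarray(X, dtype=np.float64)
    y = np.asarray(y)
    result = {}
    for k, class_name in enumerate(class_names):
        rows = X[y == k]  # samples belonging to class k
        result[class_name] = {
            name: gini(rows[:, j]) for j, name in enumerate(feature_names)
        }
    return result
```

With the data loaded as in the comment above, `gini_per_class(iris.data, iris.target, iris.feature_names, iris.target_names)` returns a nested dict keyed by class name and then by feature name.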

Martin Thoma
-
This is one of the best Gini implementations in Python that I've seen :-D. I love it because there are a lot of alternative formulas out there, but if you look around this is the most agreed upon and consistent Gini formula you'll see in literature. The issue is that it's hard to implement this formula, and yet here it is in just 4 lines of code. Well done!! A+ – yeamusic21 Sep 30 '20 at 20:54
-
I might have spoken too soon. I was comparing this to some other work (https://stackoverflow.com/questions/39512260/calculating-gini-coefficient-in-python-numpy) and I wonder if you're overestimating n here. We want the mean absolute difference, and your n is greater than the number of absolute differences that you calculate (from what I can see). – yeamusic21 Sep 30 '20 at 21:36
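The concern in this comment can be checked numerically. The sketch below compares a pairwise sum that counts each unordered pair once (which is what `combinations` produces) and divides by 2 * n**2 * mean, against the closed form on sorted values. The function names are illustrative only.

```python
from itertools import combinations

import numpy as np


def gini_unordered(x):
    """Each unordered pair counted once, divided by 2 * n**2 * mean."""
    x = np.asarray(x, dtype=np.float64)
    n = len(x)
    diffs = sum(abs(a - b) for a, b in combinations(x, 2))
    return diffs / (2 * n**2 * x.mean())


def gini_sorted(x):
    """Closed form on ascending values:
    2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n, with i = 1..n."""
    x = np.sort(np.asarray(x, dtype=np.float64))
    n = len(x)
    ranks = np.arange(1, n + 1)
    return 2 * np.sum(ranks * x) / (n * x.sum()) - (n + 1) / n


values = [1.0, 2.0, 3.0, 4.0]
# The closed form is exactly twice the once-per-pair version.
ratio = gini_sorted(values) / gini_unordered(values)
```

The ratio comes out as 2 because the textbook double sum visits every pair twice, as (i, j) and as (j, i), while `combinations` visits it once. Dividing the once-per-pair sum by n**2 * mean instead of 2 * n**2 * mean makes the two versions agree.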