dataFrame.columns.stat — giniImpurity()
Description
The giniImpurity() method of the stat object computes the Gini impurity of the two selected categorical columns.
Signature
dataFrame.columns(...columnNames).stat.giniImpurity()
Argument
...columnNames (string[]) - The names of the columns from which to compute the Gini impurity.
Returns
impurity (number) - The computed Gini impurity.
Notes
- The method requires exactly two selected columns.
- The method computes the probability of incorrectly classifying a randomly selected observation when labels are assigned according to the observed class distribution.
- The Gini impurity ranges from 0 to 1.
- A value of 0 indicates that all observations belong to the same class.
- Larger values indicate greater class heterogeneity.
- The maximum value depends on the number of distinct classes: for k equally likely classes it is 1 - 1/k, so it stays below 1.
- The measure is commonly used in decision tree algorithms.
- For a binary distribution, the maximum Gini impurity is 0.5.
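The notes above can be illustrated with a minimal sketch of the standard Gini impurity formula, 1 minus the sum of squared class proportions, applied to a single array of class labels. The function giniImpuritySketch and its sample inputs are hypothetical and are not part of the dataFrame API; how giniImpurity() itself combines the two selected columns is not described here, so this is not a statement of the library's implementation.

```javascript
// Hypothetical helper, not part of the dataFrame API.
// Computes 1 - sum(p_i^2) over the observed class proportions.
function giniImpuritySketch(labels) {
  if (labels.length === 0) {
    throw new Error('labels must not be empty');
  }
  // Count how many times each class label occurs.
  const counts = new Map();
  for (const label of labels) {
    counts.set(label, (counts.get(label) || 0) + 1);
  }
  // Sum the squared class proportions.
  let sumOfSquares = 0;
  for (const count of counts.values()) {
    const p = count / labels.length;
    sumOfSquares += p * p;
  }
  return 1 - sumOfSquares;
}

// A single class: no heterogeneity.
const pure = giniImpuritySketch(['M', 'M', 'M', 'M']);
// Two equally likely classes: the binary maximum.
const balanced = giniImpuritySketch(['M', 'F', 'M', 'F']);
// Three equally likely classes: the maximum rises with the class count.
const threeWay = giniImpuritySketch(['a', 'b', 'c']);
```

Under these assumptions, pure evaluates to 0, balanced to 0.5, and threeWay to roughly 0.667, which matches the notes on the minimum, the binary maximum, and the dependence of the maximum on the number of classes.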
Example
// calculate the Gini impurity for the values of 2 categorical columns of the dataFrame
var impurity = dataFrame.columns('Gender', 'Speed_of_Impact').stat.giniImpurity();
// log the impurity
notebook.log(impurity);