`dataFrame.columns.stat — giniImpurity()`

Description

The giniImpurity() method of the stat object computes the Gini impurity of the two selected categorical columns.

Signature

dataFrame.columns(...columnNames).stat.giniImpurity()

Scope: columns
Family: stat
Returns: number

Argument

...columnNames (string[]): The names of the columns from which to compute the Gini impurity.

Returns

impurity (number): The computed Gini impurity.

Notes

The method requires exactly two selected columns.
The method computes the probability of incorrectly classifying a randomly selected observation when labels are assigned according to the observed class distribution.
The Gini impurity is at least 0 and always less than 1.
A value of 0 indicates that all observations belong to the same class.
Larger values indicate greater class heterogeneity.
The maximum value depends on the number of distinct classes: with k classes it is 1 - 1/k, reached when all classes are equally frequent.
For a binary distribution, the maximum Gini impurity is therefore 0.5.
The measure is commonly used in decision tree algorithms.
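
The quantity described in these notes can be illustrated with a minimal sketch that computes 1 minus the sum of squared class proportions for a single list of class labels. The helper name giniImpurityOf is hypothetical and is not part of the dataFrame API. How giniImpurity() combines the two selected columns is not shown here, so this sketch only illustrates the general formula, not the method's actual implementation.

```javascript
// Hypothetical helper, not part of the dataFrame API:
// returns 1 - sum(p_i^2), where p_i is the proportion of class i in labels.
function giniImpurityOf(labels) {
  if (labels.length === 0) {
    return 0; // assumption: an empty list is treated as pure
  }

  // count the observations in each class
  var counts = new Map();
  labels.forEach(function (label) {
    counts.set(label, (counts.get(label) || 0) + 1);
  });

  // sum the squared class proportions
  var sumOfSquares = 0;
  counts.forEach(function (count) {
    var p = count / labels.length;
    sumOfSquares += p * p;
  });

  return 1 - sumOfSquares;
}

console.log(giniImpurityOf(['a', 'a', 'a', 'a'])); // 0 (all observations in one class)
console.log(giniImpurityOf(['a', 'b', 'a', 'b'])); // 0.5 (binary maximum)
```

The two calls reproduce the boundary cases from the notes: a single class gives 0, and two equally frequent classes give the binary maximum of 0.5.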

Example

// calculate the Gini impurity for the values of 2 categorical columns of the dataFrame
var impurity = dataFrame.columns('Gender', 'Speed_of_Impact').stat.giniImpurity();

// log the impurity
notebook.log(impurity);