dataFrame.columns.stat — giniImpurity()

Description

The giniImpurity() method of the stat object computes the Gini impurity of the two selected categorical columns.

Signature

dataFrame.columns(...columnNames).stat.giniImpurity()

Scope: columns
Family: stat
Returns: number

Argument

...columnNames ( string[] )
The names of the two columns from which to compute the Gini impurity.

Returns

impurity (number)
The computed Gini impurity.

Notes

  • The method requires exactly two selected columns.
  • The Gini impurity is the probability of misclassifying a randomly selected observation when its label is drawn at random according to the observed class distribution.
  • The Gini impurity is at least 0 and strictly less than 1.
  • A value of 0 indicates that all observations belong to the same class.
  • Larger values indicate greater class heterogeneity.
  • The maximum value depends on the number of distinct classes: for k equally frequent classes it is 1 - 1/k.
  • The measure is commonly used in decision tree algorithms.
  • For a binary distribution, the maximum Gini impurity is 0.5.
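
The properties listed above can be checked with a small standalone sketch. It assumes the standard definition G = 1 - sum(p_i^2), where p_i is the proportion of observations in class i, and applies it to a single array of labels. The helper name giniImpurityOf is hypothetical and is not part of the dataFrame API; how giniImpurity() combines the two selected columns is not described here, so this is not a reimplementation of the method.

```javascript
// Hypothetical helper (not part of the dataFrame API):
// Gini impurity of a single array of class labels, G = 1 - sum(p_i^2).
function giniImpurityOf(labels) {
  if (labels.length === 0) {
    throw new Error('labels must not be empty');
  }

  // Count how often each class occurs.
  var counts = new Map();
  labels.forEach(function (label) {
    counts.set(label, (counts.get(label) || 0) + 1);
  });

  // Sum the squared class proportions.
  var sumOfSquares = 0;
  counts.forEach(function (count) {
    var p = count / labels.length;
    sumOfSquares += p * p;
  });

  return 1 - sumOfSquares;
}

// A single class gives 0.
console.log(giniImpurityOf(['M', 'M', 'M'])); // 0

// Two equally frequent classes give the binary maximum, 0.5.
console.log(giniImpurityOf(['M', 'F'])); // 0.5

// Four equally frequent classes give 1 - 1/4 = 0.75.
console.log(giniImpurityOf(['a', 'b', 'c', 'd'])); // 0.75
```

The values for one, two, and four balanced classes match the notes: 0 for a pure distribution, 0.5 as the binary maximum, and a ceiling that rises with the number of classes without reaching 1.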

Example

// calculate the Gini impurity of two categorical columns of the dataFrame
var impurity = dataFrame.columns('Gender', 'Speed_of_Impact').stat.giniImpurity();

// log the impurity
notebook.log(impurity);