`dataFrame.columns.stat — classSeparation()`

Description

The classSeparation() method of the stat object evaluates how well one or more feature columns separate the observed classes.

Signature

dataFrame.columns(...columnNames).stat.classSeparation()

Scope: columns
Family: stat
Returns: object

Arguments

...columnNames ( string[] ): The names of the columns from which to compute the class separation metrics.

Returns

stat (object)

A statistic object containing class separation metrics.

SSW (number): The within-class sum of squares.
SSB (number): The between-class sum of squares.
SST (number): The total sum of squares.
R2 (number): The proportion of variance explained by the class structure.
Fisher (number): The Fisher discriminant statistic.
F_norm (number): The normalized Fisher statistic.
score (number): The overall class separation score.

Notes

The method requires at least two selected columns.
For a univariate analysis (exactly two selected columns), the first selected column is interpreted as the feature and the second selected column as the class label.
For a multivariate analysis, the last selected column is interpreted as the class label and all preceding columns are interpreted as features.
The method evaluates how effectively the selected features distinguish the observed classes.
SSW measures the variability within classes.
SSB measures the variability between classes.
SST measures the total variability and satisfies SST = SSW + SSB.
R2 represents the proportion of variance explained by the class structure.
Fisher measures the ratio of between-class variability to within-class variability.
F_norm is computed as Fisher / (Fisher + 1) and ranges from 0 to 1.
score is computed as R2 × F_norm and ranges from 0 to 1.
Higher scores indicate stronger separation between classes.
A score close to 0 indicates little or no class separation.
A score close to 1 indicates strong class separation.
The reported statistics are generalized to support multivariate feature spaces. They are computed from the class structure defined by the label column and may differ from the values returned by ANOVA-based methods, where groups are defined by the selected columns.
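
The univariate relationships described in these notes can be sketched in plain JavaScript. This is an illustration only, not the library's implementation: the function name classSeparationSketch and its two array arguments are hypothetical, and Fisher is assumed here to be the plain ratio SSB / SSW, which the notes do not confirm. Under that assumption F_norm is algebraically equal to R2; the library's generalized Fisher statistic may be defined differently, so its values may not match this sketch.

```javascript
// Illustrative sketch only (hypothetical helper, not part of the library).
// values: numeric feature column, labels: class label column (same length).
function classSeparationSketch(values, labels) {
  var n = values.length;
  var sum = function (arr) {
    return arr.reduce(function (a, b) { return a + b; }, 0);
  };
  var grandMean = sum(values) / n;

  // group the feature values by class label
  var groups = Object.create(null);
  for (var i = 0; i < n; i++) {
    var key = String(labels[i]);
    if (!groups[key]) groups[key] = [];
    groups[key].push(values[i]);
  }

  var SSW = 0; // within-class sum of squares
  var SSB = 0; // between-class sum of squares
  Object.keys(groups).forEach(function (key) {
    var g = groups[key];
    var mean = sum(g) / g.length;
    g.forEach(function (v) { SSW += (v - mean) * (v - mean); });
    SSB += g.length * (mean - grandMean) * (mean - grandMean);
  });

  var SST = SSW + SSB; // total sum of squares
  var R2 = SST > 0 ? SSB / SST : 0;

  // ASSUMPTION: Fisher is taken as the plain ratio SSB / SSW.
  var Fisher = SSW > 0 ? SSB / SSW : (SSB > 0 ? Infinity : 0);
  var F_norm = isFinite(Fisher) ? Fisher / (Fisher + 1) : 1;
  var score = R2 * F_norm;

  return { SSW: SSW, SSB: SSB, SST: SST, R2: R2, Fisher: Fisher, F_norm: F_norm, score: score };
}

// two well-separated classes: SSW = 4, SSB = 54, SST = 58
var sketch = classSeparationSketch([1, 2, 3, 7, 8, 9], ['a', 'a', 'a', 'b', 'b', 'b']);
console.log(sketch);
```

In this sample the between-class term dominates, so the score is close to 1. If every class had the same mean, SSB would be 0 and the score would be 0.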

Example

// computes the class separation metrics from 2 columns of a dataFrame:
// 'visitors' is the feature and 'conversions' is the class label
var stat = dataFrame.columns('visitors', 'conversions').stat.classSeparation();

// log the stat details
notebook.log(stat);