dataFrame.columns.stat — classSeparation()
Description
The classSeparation() method of the stat object evaluates how well one or more feature columns separate the observed classes.
Signature
dataFrame.columns(...columnNames).stat.classSeparation()
Arguments
...columnNames (string[]) - The names of the columns from which to compute the class separation metrics.
Returns
stat (object) - A statistic object containing class separation metrics.
  SSW (number) - The within-class sum of squares.
  SSB (number) - The between-class sum of squares.
  SST (number) - The total sum of squares.
  R2 (number) - The proportion of variance explained by the class structure.
  Fisher (number) - The Fisher discriminant statistic.
  F_norm (number) - The normalized Fisher statistic.
  score (number) - The overall class separation score.
Notes
- The method requires at least two selected columns.
- For a univariate analysis, the first selected column is interpreted as the feature and the second selected column as the class label.
- For a multivariate analysis, the last selected column is interpreted as the class label and all preceding columns are interpreted as features.
- The method evaluates how effectively the selected features distinguish the observed classes.
  - SSW measures the variability within classes.
  - SSB measures the variability between classes.
  - SST measures the total variability and satisfies SST = SSW + SSB.
  - R2 represents the proportion of variance explained by the class structure.
  - Fisher measures the ratio of between-class variability to within-class variability.
  - F_norm is computed as Fisher / (Fisher + 1) and ranges from 0 to 1.
  - score is computed as R2 × F_norm and ranges from 0 to 1.
- Higher scores indicate stronger separation between classes.
- A score close to 0 indicates little or no class separation.
- A score close to 1 indicates strong class separation.
- The reported statistics are generalized to support multivariate feature spaces. They are computed from the class structure defined by the label column and may differ from the values returned by ANOVA-based methods, where groups are defined by the selected columns.
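The relationships between the statistics listed in these notes can be illustrated with a small standalone sketch for the univariate case. It is not the library's implementation: classSeparationSketch is a hypothetical helper, and it assumes R2 = SSB / SST and Fisher = SSB / SSW, which this page does not state explicitly. The multivariate generalization is not covered.

```javascript
// Hypothetical helper, not part of the documented API.
// Assumptions (not confirmed by this page): R2 = SSB / SST, Fisher = SSB / SSW.
function classSeparationSketch(values, labels) {
  if (values.length === 0 || values.length !== labels.length) {
    throw new Error('values and labels must be non-empty and of equal length');
  }
  var n = values.length;

  // grand mean of the feature column
  var grandMean = values.reduce(function (a, b) { return a + b; }, 0) / n;

  // accumulate the count and sum of the feature for each class label
  var groups = Object.create(null);
  values.forEach(function (v, i) {
    var key = String(labels[i]);
    if (!groups[key]) groups[key] = { count: 0, sum: 0, mean: 0 };
    groups[key].count += 1;
    groups[key].sum += v;
  });

  // SSB: squared distance of each class mean to the grand mean, weighted by class size
  var SSB = 0;
  Object.keys(groups).forEach(function (key) {
    var g = groups[key];
    g.mean = g.sum / g.count;
    SSB += g.count * Math.pow(g.mean - grandMean, 2);
  });

  // SSW: squared distance of each value to the mean of its own class
  var SSW = 0;
  values.forEach(function (v, i) {
    SSW += Math.pow(v - groups[String(labels[i])].mean, 2);
  });

  var SST = SSW + SSB;
  var R2 = SST === 0 ? 0 : SSB / SST;

  // guard the degenerate case where there is no within-class variability
  var Fisher = SSW === 0 ? (SSB === 0 ? 0 : Infinity) : SSB / SSW;
  var F_norm = Fisher === Infinity ? 1 : Fisher / (Fisher + 1);
  var score = R2 * F_norm;

  return { SSW: SSW, SSB: SSB, SST: SST, R2: R2, Fisher: Fisher, F_norm: F_norm, score: score };
}

// two well-separated classes: 'a' has mean 2, 'b' has mean 8
var sketch = classSeparationSketch([1, 2, 3, 7, 8, 9], ['a', 'a', 'a', 'b', 'b', 'b']);
console.log(sketch); // SSW = 4, SSB = 54, SST = 58, Fisher = 13.5, score ≈ 0.867
```

Under the assumed definition Fisher = SSB / SSW, F_norm reduces algebraically to SSB / SST. The values returned by classSeparation() may therefore differ from this sketch if the library defines Fisher differently, for example with degrees-of-freedom scaling.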
Example
// compute the class separation metrics from 2 columns of a dataFrame:
// 'visitors' is the feature and 'conversions' is the class label
var stat = dataFrame.columns('visitors', 'conversions').stat.classSeparation();
// log the stat details
notebook.log(stat);