dataFrame.columns.stat — distance()

Description

The distance() method of the stat object computes, for each row, the distance between the values of the selected columns and a reference vector.

Signature

dataFrame.columns(...columnNames).stat.distance(values, { method: 'euclidean' })
Scope
columns
Family
stat
Returns
mask

Arguments

...columnNames (string[])
The names of the columns from which to compute distances.
values (array)
Reference vector: one value per selected column, in the same order as columnNames.
options (object, optional)
Distance computation options.

Option

method (string)
The distance metric used for the computation.
  • euclidean (default)
  • hamming
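The two metrics can be sketched as plain functions that compare one row, given as an array, with the reference vector. This is a minimal illustration of the standard definitions, not the library's implementation; the function names euclideanDistance and hammingDistance are hypothetical.

```javascript
// Hypothetical helpers illustrating the two values of the `method` option.
// Each compares one row (an array of column values) with the reference vector.

// Euclidean distance: square root of the sum of squared differences.
function euclideanDistance(row, reference) {
  let sum = 0;
  for (let i = 0; i < row.length; i++) {
    const d = row[i] - reference[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Hamming distance: number of positions whose values differ.
function hammingDistance(row, reference) {
  let count = 0;
  for (let i = 0; i < row.length; i++) {
    if (row[i] !== reference[i]) count++;
  }
  return count;
}

console.log(euclideanDistance([38, 50004], [35, 50000])); // 5
console.log(hammingDistance(['red', 'S', 'cotton'], ['red', 'M', 'wool'])); // 2
```

Euclidean distance needs numeric values, while Hamming distance only tests equality, which is why it also works for text or category labels.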

Returns

mask (number[])
An array containing the computed distance for each row.

Notes

  • The method computes one distance value per row.
  • The number of supplied values must match the number of selected columns.
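  • Euclidean distance is typically used for numerical data; Hamming distance is better suited to categorical data.
  • Hamming distance counts the number of selected columns in which the row's value differs from the corresponding reference value.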
  • Smaller values indicate observations closer to the reference vector.
  • A distance of 0 indicates an exact match.
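The points above about smaller distances and exact matches can be illustrated with a short sketch that works on any array of per-row distances, such as the one returned by stat.distance(). The helper names (nearestRow, exactMatches, rankRows) and the sample distances are hypothetical.

```javascript
// Hypothetical per-row distances, for illustration only.
const distances = [12.5, 0, 3.2, 0, 7.1];

// Index of the row closest to the reference vector (smallest distance).
function nearestRow(distances) {
  let best = -1;
  for (let i = 0; i < distances.length; i++) {
    if (best === -1 || distances[i] < distances[best]) best = i;
  }
  return best;
}

// Indices of rows that match the reference vector exactly (distance 0).
function exactMatches(distances) {
  const matches = [];
  distances.forEach((d, i) => {
    if (d === 0) matches.push(i);
  });
  return matches;
}

// Row indices ordered from closest to farthest (Array.prototype.sort is stable).
function rankRows(distances) {
  return distances
    .map((_, i) => i)
    .sort((a, b) => distances[a] - distances[b]);
}

console.log(nearestRow(distances));   // 1
console.log(exactMatches(distances)); // [ 1, 3 ]
console.log(rankRows(distances));     // [ 1, 3, 2, 4, 0 ]
```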

Example

// compute the distance between each row's 'age' and 'income' values and the reference vector [35, 50000]
var mask = dataFrame.columns('age', 'income').stat.distance([35, 50000]);

// store the distances in the 'distance' column of the dataFrame
dataFrame.column('distance').set(mask);
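To make the example's arithmetic concrete, here is a plain-JavaScript sketch that assumes the data frame is represented as an array of row objects and uses the Euclidean metric. rowDistances is a hypothetical helper, not part of the library, and the sample rows are invented for illustration.

```javascript
// Hypothetical helper: Euclidean distance between each row and a reference
// vector, restricted to the named columns. Not the library's implementation.
function rowDistances(rows, columnNames, values) {
  // One reference value is required per selected column.
  if (values.length !== columnNames.length) {
    throw new Error(`Expected ${columnNames.length} values, got ${values.length}`);
  }
  return rows.map((row) =>
    Math.sqrt(
      columnNames.reduce((sum, name, i) => {
        const d = row[name] - values[i];
        return sum + d * d;
      }, 0)
    )
  );
}

// Invented sample rows.
const rows = [
  { age: 35, income: 50000 },
  { age: 38, income: 50004 },
  { age: 60, income: 50000 },
];

const distances = rowDistances(rows, ['age', 'income'], [35, 50000]);

// Attach each distance to its row, mirroring the 'distance' column above.
rows.forEach((row, i) => {
  row.distance = distances[i];
});

console.log(distances); // [ 0, 5, 25 ]
```

Because Euclidean distance works in raw units, a difference of one unit of income counts as much as a difference of one year of age, so a wide-ranging column such as income tends to dominate the result; you may want to rescale the columns before comparing rows.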