Data preprocessing

Data preprocessing typically involves some or all of the following steps:

  1. Data cleaning
  2. Feature scaling
    1. Normalization
    2. Standardization
  3. Decorrelation and whitening
  4. Dimensionality reduction
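
As an illustration of the two feature-scaling variants listed above, here is a minimal Python sketch using only the standard library. The function names `normalize` and `standardize` are hypothetical and are not part of stats++; "normalization" is taken here to mean min-max rescaling to [0, 1], which is one common convention, and the population standard deviation is assumed for standardization.

```python
from statistics import fmean, pstdev

def normalize(values):
    """Min-max normalization: rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map it to 0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values):
    """Standardization (z-scores): subtract the mean, divide by the standard deviation."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption of this sketch)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

feature = [2.0, 4.0, 6.0, 8.0]
scaled = normalize(feature)     # smallest value maps to 0, largest to 1
zscores = standardize(feature)  # result has mean 0 and standard deviation 1
```

Normalization fixes the range of a feature, whereas standardization fixes its mean and spread; which one is appropriate depends on the downstream model.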

The "Efficient BackProp" paper by LeCun et al. describes the following strategy for preprocessing data:

  1. Shift the data so that each input has a mean of 0
  2. Decorrelate the inputs
  3. Scale the data so that each input has a variance of approximately 1
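
The three steps above can be sketched for the special case of two input variables, where the decorrelating rotation has a closed form. This is an illustrative Python sketch, not the stats++ implementation and not code from the paper; the function name `preprocess_2d` is hypothetical, and population (divide by n) statistics are assumed.

```python
import math

def preprocess_2d(points):
    """Center, decorrelate, and rescale a list of (x, y) pairs."""
    n = len(points)

    # Step 1: shift each input so that its mean is 0.
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]

    # Entries of the (population) covariance matrix of the centered data.
    sxx = sum(x * x for x, _ in centered) / n
    syy = sum(y * y for _, y in centered) / n
    sxy = sum(x * y for x, y in centered) / n

    # Step 2: rotate onto the principal axes of the covariance matrix.
    # For a 2x2 covariance this angle makes the covariance of the rotated inputs zero.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    rotated = [(c * x + s * y, -s * x + c * y) for x, y in centered]

    # Step 3: divide each rotated input by its standard deviation (variance becomes 1).
    su = math.sqrt(sum(u * u for u, _ in rotated) / n)
    sv = math.sqrt(sum(v * v for _, v in rotated) / n)
    return [(u / su if su > 0 else 0.0, v / sv if sv > 0 else 0.0)
            for u, v in rotated]

points = [(2.0, 1.0), (3.0, 5.0), (5.0, 4.0), (7.0, 8.0), (8.0, 6.0)]
whitened = preprocess_2d(points)
```

Applying the steps in this order (center, rotate, rescale) amounts to PCA whitening. For more than two inputs, the rotation would come from an eigendecomposition of the covariance matrix rather than from a single angle.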

stats++

In stats++, data preprocessing is handled through a Preprocessor object:

Preprocessor