Feature Selection

Data Type

Time Sequence Data

The candidate features should be of the same type as the target feature. For example, if the features are the temperature at time 1, time 2, and time 3, the target feature should be the temperature at the next time point rather than some other type of data.
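
A minimal sketch of this setup: the snippet below turns one series into (lagged values, next value) pairs, so that the features and the target are of the same type. The function name make_lagged_pairs, the window length of 3, and the temperature readings are illustrative assumptions, not taken from any library or dataset.

```python
def make_lagged_pairs(series, window=3):
    """Build (features, target) pairs from one time series.

    Each feature list holds `window` consecutive values, and the target
    is the value that immediately follows them.
    """
    pairs = []
    for start in range(len(series) - window):
        features = series[start:start + window]
        target = series[start + window]
        pairs.append((features, target))
    return pairs


# Hypothetical temperature readings at consecutive time points.
temperatures = [20.1, 20.5, 21.0, 21.4, 21.9]
pairs = make_lagged_pairs(temperatures, window=3)
for features, target in pairs:
    print(features, "->", target)
```

With five readings and a window of 3, this yields two training pairs.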

Many rows (including multiple features) mapping to a single target

In this case, the data in each feature should be aggregated across the rows: by sum, average, max, or min for numerical data, and by a partition-by (window) function for categorical or nominal data (e.g., SELECT name, hair_colour, COUNT(hair_colour) OVER (PARTITION BY name) FROM MyTable GROUP BY name, hair_colour;).
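
The same idea can be sketched outside SQL. The example below collapses several rows per key into one feature dictionary, using sum, mean, max, and min for a numerical column and per-category counts for a categorical column. The rows, the column names, and the helper aggregate_rows are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical rows: several rows per customer, one target per customer.
rows = [
    {"customer": "a", "amount": 10.0, "channel": "web"},
    {"customer": "a", "amount": 30.0, "channel": "web"},
    {"customer": "a", "amount": 20.0, "channel": "store"},
    {"customer": "b", "amount": 5.0, "channel": "store"},
]


def aggregate_rows(rows, key, numeric, categorical):
    """Collapse all rows sharing `key` into one feature dict per key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    result = {}
    for group_key, members in groups.items():
        values = [m[numeric] for m in members]
        counts = Counter(m[categorical] for m in members)
        features = {
            f"{numeric}_sum": sum(values),
            f"{numeric}_mean": mean(values),
            f"{numeric}_max": max(values),
            f"{numeric}_min": min(values),
        }
        # One count entry per category value seen in this group.
        for category, count in counts.items():
            features[f"{categorical}_{category}_count"] = count
        result[group_key] = features
    return result


aggregated = aggregate_rows(rows, "customer", "amount", "channel")
print(aggregated["a"])
```

Note that a category that never occurs for a key gets no entry at all in this sketch; a real pipeline would probably fill such gaps with 0 so that every key has the same columns.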

Process

Classification and Regression

Choose the algorithms that perform best on the classification or regression task.

Variable Selection

Forward
Backward
Stepwise 
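
The three strategies listed above are greedy searches over feature subsets. As a minimal sketch of the forward variant, the code below starts from the empty set and keeps adding the single feature that most improves a score. The score callback is an assumption of this sketch (in practice it might be a cross-validated accuracy), and the toy usefulness values are made up. Backward selection would instead start from the full set and remove features, and stepwise selection alternates between adding and removing.

```python
def forward_selection(candidates, score, max_features=None):
    """Greedy forward selection.

    Starts from the empty set and repeatedly adds the candidate that most
    improves `score(selected)`; stops when no candidate improves it.
    `score` is a caller-supplied function (an assumption of this sketch);
    higher means better.
    """
    selected = []
    remaining = list(candidates)
    best_score = score(selected)
    while remaining and (max_features is None or len(selected) < max_features):
        # Score every one-feature extension of the current subset.
        trial_scores = [(score(selected + [c]), c) for c in remaining]
        trial_best, best_candidate = max(trial_scores, key=lambda t: t[0])
        if trial_best <= best_score:
            break  # no candidate improves the score
        selected.append(best_candidate)
        remaining.remove(best_candidate)
        best_score = trial_best
    return selected, best_score


# Toy score, purely illustrative: each feature has a made-up usefulness,
# and every selected feature costs a fixed complexity penalty.
usefulness = {"age": 0.30, "income": 0.25, "zip_code": 0.02, "noise": 0.0}


def toy_score(subset):
    return sum(usefulness[f] for f in subset) - 0.05 * len(subset)


chosen, final_score = forward_selection(list(usefulness), toy_score)
print(chosen, round(final_score, 2))
```

In this toy run the search keeps "age" and "income" and stops, because adding either remaining feature costs more in penalty than it gains.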

Feature Transformations

We now list several common techniques for feature transformations. Usually, it is helpful to combine some of these transformations (e.g., centering + scaling). In the following, we denote by $f = (f_1, \ldots, f_m) \in \mathbb{R}^m$ the value of the feature over the $m$ training examples. Also, we denote by $\bar{f} = \frac{1}{m} \sum_{i=1}^{m} f_i$ the empirical mean of the feature over all examples.

Centering

This transformation makes the feature have zero mean, by subtracting the empirical mean $\bar{f}$ from each value: $f_i \leftarrow f_i - \bar{f}$.
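
A minimal sketch of centering on a plain list of numbers, where the mean of the values is subtracted from every value. The function name center and the sample values are illustrative.

```python
def center(values):
    """Subtract the empirical mean so the result has zero mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


feature = [2.0, 4.0, 6.0, 8.0]
centered = center(feature)
print(centered)  # [-3.0, -1.0, 1.0, 3.0]
```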

Unit Range

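This transformation makes the range of each feature be $[0, 1]$. Formally, let $f_{\max} = \max_i f_i$ and $f_{\min} = \min_i f_i$. Then, we set $f_i \leftarrow \frac{f_i - f_{\min}}{f_{\max} - f_{\min}}$.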
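
A minimal sketch of unit-range scaling, which maps the smallest value to 0 and the largest to 1. How to treat a constant feature is not covered by the text; returning zeros in that case is an assumption of this sketch.

```python
def unit_range(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: the formula would divide by zero. Mapping to 0.0
        # is a choice made for this sketch, not something the text specifies.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = unit_range([2.0, 4.0, 6.0, 10.0])
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```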

Standardization

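This transformation makes all features have zero mean and unit variance. Formally, let $\nu = \frac{1}{m} \sum_{i=1}^{m} (f_i - \bar{f})^2$ be the empirical variance of the feature, where $\bar{f}$ is its empirical mean. Then, we set $f_i \leftarrow \frac{f_i - \bar{f}}{\sqrt{\nu}}$.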
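
A minimal sketch of standardization. It assumes the empirical variance divides by the number of examples (not by that number minus one), and it returns zeros for a constant feature; both choices are assumptions of this sketch.

```python
import math


def standardize(values):
    """Shift to zero mean and rescale to unit (empirical) variance."""
    m = len(values)
    mean = sum(values) / m
    variance = sum((v - mean) ** 2 for v in values) / m
    if variance == 0.0:
        # Constant feature: avoid dividing by zero (sketch-level choice).
        return [0.0 for _ in values]
    std = math.sqrt(variance)
    return [(v - mean) / std for v in values]


standardized = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(standardized)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

The sample values have mean 5 and variance 4, so each value is shifted by 5 and divided by 2.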

Clipping

This transformation clips high or low values of the feature. For example, $f_i \leftarrow \operatorname{sign}(f_i) \min(b, |f_i|)$, where $b$ is a user-specified parameter.
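
A minimal sketch of symmetric clipping, which limits the magnitude of each value to a threshold while keeping its sign. The threshold name b, the requirement that it be non-negative, and the sample values are assumptions of this sketch.

```python
import math


def clip(values, b):
    """Limit each value's magnitude to b while keeping its sign."""
    if b < 0:
        raise ValueError("b must be non-negative")
    return [math.copysign(min(b, abs(v)), v) for v in values]


clipped = clip([-5.0, -0.5, 0.0, 0.7, 12.0], b=1.0)
print(clipped)  # [-1.0, -0.5, 0.0, 0.7, 1.0]
```

Values already inside the interval from -b to b pass through unchanged; only the outliers are moved.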

Sigmoidal Transformation

As its name indicates, this transformation applies a sigmoid function on the feature. For example, $f_i \leftarrow \frac{1}{1 + \exp(b f_i)}$, where $b$ is a user-specified parameter. This transformation can be thought of as a “soft” version of clipping: It has a small effect on values close to zero and behaves similarly to clipping on values far away from zero.
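
A minimal sketch of a sigmoidal transformation, assuming the form 1 / (1 + exp(b * f)) with a user-chosen parameter b. The sign of b sets the direction: a negative b gives the usual increasing logistic curve. The function name and sample values are illustrative.

```python
import math


def sigmoid_transform(values, b):
    """Apply f -> 1 / (1 + exp(b * f)) to every value.

    Note: math.exp raises OverflowError when b * f is very large
    (roughly above 709); this sketch does not guard against that.
    """
    return [1.0 / (1.0 + math.exp(b * v)) for v in values]


# With b = -1.0 the curve is increasing and saturates near 0 and 1.
squashed = sigmoid_transform([-10.0, -1.0, 0.0, 1.0, 10.0], b=-1.0)
print([round(v, 4) for v in squashed])
```

Inputs far from zero end up close to 0 or 1, which is the saturating behaviour that makes this resemble clipping.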

References

  • For the TeX math notation, refer to https://en.wikibooks.org/wiki/LaTeX/Mathematics.
  • For “Feature Transformations”, refer to the book “Understanding Machine Learning: From Theory to Algorithms”.