Data Type
Time Sequence Data
The candidate features should be of the same type as the target feature. For example, if the features are the temperature at time 1, the temperature at time 2, and the temperature at time 3, the target feature should be the temperature at the next time point rather than some other type of data.
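As a minimal sketch of this layout, the function below slides a window over one series so that each example uses consecutive readings as features and the next reading as the target. The function name, the `n_lags` parameter, and the temperature values are illustrative assumptions, not part of the original notes.

```python
def make_lagged_examples(series, n_lags):
    """Turn one series into (features, target) pairs.

    Each example uses n_lags consecutive values as features and the
    value that immediately follows them as the target, so features and
    target are always the same kind of measurement.
    """
    examples = []
    for start in range(len(series) - n_lags):
        features = series[start:start + n_lags]
        target = series[start + n_lags]
        examples.append((features, target))
    return examples


# Hypothetical temperature readings at consecutive time points.
temperatures = [20.1, 20.4, 21.0, 21.3, 20.9]
examples = make_lagged_examples(temperatures, n_lags=3)
```

With five readings and three lags, this yields two examples; a series shorter than or equal to `n_lags` yields none.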
Many rows (including multiple features) mapping to a single target
In this case, the rows that map to the same target should be aggregated feature by feature: use sum, average, max, or min for numerical data, and a window function with PARTITION BY for categorical or nominal data (e.g.: SELECT name, hair_colour, COUNT(hair_colour) OVER (PARTITION BY name) FROM MyTable GROUP BY name, hair_colour; ).
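The same idea can be sketched outside SQL. The example below groups rows by a key, summarizes one numerical field with sum, mean, max, and min, and counts the values of one categorical field. The function name, the field names, and the sample rows are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean


def aggregate_rows(rows, key, numeric_field, categorical_field):
    """Collapse the rows sharing a key into one row of aggregates."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    aggregated = {}
    for group_key, members in groups.items():
        values = [m[numeric_field] for m in members]
        categories = Counter(m[categorical_field] for m in members)
        aggregated[group_key] = {
            # Numerical data: standard summary statistics.
            "sum": sum(values),
            "mean": mean(values),
            "max": max(values),
            "min": min(values),
            # Categorical data: how often each category occurs.
            "category_counts": dict(categories),
        }
    return aggregated


# Hypothetical input: several rows per name.
rows = [
    {"name": "ann", "amount": 10, "hair_colour": "red"},
    {"name": "ann", "amount": 30, "hair_colour": "red"},
    {"name": "bob", "amount": 5, "hair_colour": "black"},
]
features = aggregate_rows(rows, "name", "amount", "hair_colour")
```

Each key ends up with exactly one aggregated row, which can then be paired with its single target value.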
Process
Classification and Regression
Choose the algorithms that perform best on the classification or regression task at hand.
Variable Selection
Forward
Backward
Stepwise
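To make the first of these strategies concrete, here is a minimal sketch of greedy forward selection: start with no variables and repeatedly add the one that improves a score the most, stopping when nothing helps. The `score` callable, the `GAINS` table, and `toy_score` are assumptions made for illustration; in practice the score would typically be a validation metric of a model trained on the chosen variables. Backward elimination runs the same loop in reverse (start with all variables and remove one at a time), and stepwise selection alternates between adding and removing.

```python
def forward_selection(candidates, score, max_features=None):
    """Greedy forward selection.

    candidates: iterable of feature names.
    score: callable taking a list of feature names and returning a
           number, where higher is better.
    """
    remaining = list(candidates)
    selected = []
    best_score = score(selected)
    limit = len(remaining) if max_features is None else max_features

    while remaining and len(selected) < limit:
        # Try adding each remaining feature and keep the best one.
        trials = [(score(selected + [f]), f) for f in remaining]
        trial_best, feature = max(trials, key=lambda pair: pair[0])
        if trial_best <= best_score:
            break  # No remaining feature improves the score.
        selected.append(feature)
        remaining.remove(feature)
        best_score = trial_best
    return selected, best_score


# Toy score, purely illustrative: each feature has a fixed gain and
# every selected feature costs a small penalty.
GAINS = {"a": 0.5, "b": 0.3, "c": 0.01}


def toy_score(features):
    return sum(GAINS[f] for f in features) - 0.05 * len(features)


chosen, final_score = forward_selection(GAINS, toy_score)
```

Here the feature `c` is never added, because its gain is smaller than the penalty for including it.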
Feature Transformations
Below are several common techniques for feature transformations. Usually, it is helpful to combine some of these transformations (e.g., centering + scaling). In the following, we denote by $f = (f_1, \ldots, f_m) \in \mathbb{R}^m$ the value of the feature $f$ over the $m$ training examples. Also, we denote by $\bar{f} = \frac{1}{m} \sum_{i=1}^{m} f_i$ the empirical mean of the feature over all examples.
Centering
This transformation makes the feature have zero mean, by setting $f_i \leftarrow f_i - \bar{f}$.
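A minimal sketch of centering in plain Python, assuming the feature is given as a list of numbers (the function name and sample values are illustrative):

```python
def center(values):
    """Subtract the empirical mean so the result has zero mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


centered = center([1.0, 2.0, 3.0, 6.0])
```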
Unit Range
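This transformation makes the range of each feature be $[0, 1]$. Formally, let $f_{\max} = \max_i f_i$ and $f_{\min} = \min_i f_i$. Then, we set $f_i \leftarrow \frac{f_i - f_{\min}}{f_{\max} - f_{\min}}$.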
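A minimal sketch of unit-range scaling to the interval from 0 to 1. Raising an error for a constant feature, where the range is zero, is an assumption of this example and not something the notes specify.

```python
def unit_range(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    f_min, f_max = min(values), max(values)
    if f_max == f_min:
        # Assumption of this sketch: a constant feature cannot be rescaled.
        raise ValueError("constant feature: range is zero")
    return [(v - f_min) / (f_max - f_min) for v in values]


scaled = unit_range([2.0, 4.0, 10.0])
```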
Standardization
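This transformation makes all features have a zero mean and unit variance. Formally, let $\nu = \frac{1}{m} \sum_{i=1}^{m} (f_i - \bar{f})^2$ be the empirical variance of the feature. Then, we set $f_i \leftarrow \frac{f_i - \bar{f}}{\sqrt{\nu}}$.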
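A minimal sketch of standardization, using the empirical variance that divides by the number of examples (not by one less). The zero-variance check is an assumption added for this example.

```python
import math


def standardize(values):
    """Shift to zero mean and scale to unit (empirical) variance."""
    m = len(values)
    mean = sum(values) / m
    variance = sum((v - mean) ** 2 for v in values) / m  # divides by m
    if variance == 0:
        # Assumption of this sketch: a constant feature is rejected.
        raise ValueError("constant feature: variance is zero")
    std = math.sqrt(variance)
    return [(v - mean) / std for v in values]


z = standardize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

For this sample the mean is 5 and the empirical variance is 4, so every value is shifted by 5 and divided by 2.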
Clipping
This transformation clips high or low values of the feature. For example, $f_i \leftarrow \operatorname{sign}(f_i) \min\{b, |f_i|\}$, where $b$ is a user-specified parameter.
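A minimal sketch of symmetric clipping, assuming the intent is to limit every value to the interval from -b to b while keeping its sign. The function name and the check on `b` are assumptions of this example.

```python
import math


def clip(values, b):
    """Limit each value to [-b, b], keeping its sign."""
    if b < 0:
        raise ValueError("b must be non-negative")
    # min(b, |v|) caps the magnitude; copysign restores the sign of v.
    return [math.copysign(min(b, abs(v)), v) for v in values]


clipped = clip([-5.0, -0.5, 0.0, 0.7, 3.0], b=1.0)
```

Values whose magnitude is already below `b` pass through unchanged; only the outliers are moved.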
Sigmoidal Transformation
As its name indicates, this transformation applies a sigmoid function to the feature. For example, $f_i \leftarrow \frac{1}{1 + \exp(b f_i)}$, where $b$ is a user-specified parameter. This transformation can be thought of as a “soft” version of clipping: it has a small effect on values close to zero and behaves similarly to clipping on values far away from zero.
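A minimal sketch of a sigmoidal transformation, assuming the form 1 / (1 + exp(b * f)) with a user-chosen `b`. With this form a positive `b` gives a decreasing curve and a negative `b` an increasing one. The branch on the sign of the exponent is an implementation choice of this example, made so that very large inputs do not overflow.

```python
import math


def sigmoid_transform(values, b):
    """Apply f -> 1 / (1 + exp(b * f)) to every value."""
    result = []
    for v in values:
        x = b * v
        if x >= 0:
            # Equivalent form exp(-x) / (1 + exp(-x)); avoids overflow.
            e = math.exp(-x)
            result.append(e / (1.0 + e))
        else:
            result.append(1.0 / (1.0 + math.exp(x)))
    return result


squashed = sigmoid_transform([-100.0, 0.0, 100.0], b=1.0)
```

All outputs lie between 0 and 1, and inputs far from zero saturate near one of the two ends, which is the soft-clipping behavior described above.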
References
- For TeX math notation, refer to https://en.wikibooks.org/wiki/LaTeX/Mathematics.
- For “Feature Transformations”, refer to the book “Understanding Machine Learning: From Theory to Algorithms”.