How can I select top n features of time series using tsfresh? Can I decide the number of top features I want to extract?
-
Did you already solve it? If not, what did you try so far? – flyingdutchman Dec 21 '20 at 19:30
-
@flyingdutchman My approach was to calculate the relevance table using the tsfresh.feature_selection.relevance module. It gave a list of relevant features selected with the Benjamini-Hochberg procedure, a multiple testing procedure that decides which features to keep and which to cut off based solely on their p-values. I took the features at the top as the most relevant ones. You can refer to the following link: https://tsfresh.readthedocs.io/en/latest/api/tsfresh.feature_selection.html# – Chaitra Dec 23 '20 at 09:49
-
Ah, good to know; this is the same thing I did for my problem. I will post it as an answer so that this question has one. – flyingdutchman Dec 23 '20 at 11:01
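The comment above says the Benjamini-Hochberg procedure decides which features to keep based only on their p-values. A minimal textbook sketch in plain Python may make that concrete: with m p-values and a false discovery rate q, it finds the largest rank k where the k-th smallest p-value is at most k/m · q and keeps every feature ranked at or below k. The function name and the p-values are hypothetical. tsfresh's own implementation has its own defaults (for example its `fdr_level` parameter), so its selection may differ from this sketch.

```python
def benjamini_hochberg(p_values, fdr_level=0.05):
    """Textbook Benjamini-Hochberg step-up procedure (illustrative sketch).

    Returns one boolean per p-value: True means the hypothesis is rejected,
    i.e. the corresponding feature would be kept as relevant.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order of p-value
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank k (1-based) whose p-value satisfies p_(k) <= k / m * fdr_level
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr_level:
            max_rank = rank

    # Keep every feature ranked at or below max_rank, even if it failed its own threshold
    keep = [False] * m
    for rank, idx in enumerate(order, start=1):
        keep[idx] = rank <= max_rank
    return keep


# Hypothetical p-values, not real tsfresh output
print(benjamini_hochberg([0.001, 0.02, 0.035, 0.3, 0.008]))  # [True, True, True, False, True]
print(benjamini_hochberg([0.01, 0.04, 0.045]))  # [True, True, True]
```

The second call shows the step-up behaviour: 0.04 fails its own threshold (2/3 · 0.05 ≈ 0.033), but it is still kept because the higher-ranked 0.045 passes its threshold of 0.05.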
1 Answer
Based on the comment above from @Chaitra and this answer, here is an answer.
Yes, you can decide the number of top features by using the tsfresh relevance table described in the documentation: sort the table by p-value and take the top n features.
Example code printing the top 11 features:
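from tsfresh import extract_features
from tsfresh.feature_selection.relevance import calculate_relevance_table
from tsfresh.utilities.dataframe_functions import impute

# X: long-format DataFrame with "id", "kind" and "value" columns
# y: pandas Series with one target per id, indexed by id
extracted_features = extract_features(
    X,
    column_id="id",
    column_kind="kind",
    column_value="value",
)
impute(extracted_features)  # replace NaN/inf values from some feature calculators, in place

relevance_table = calculate_relevance_table(extracted_features, y)
# Keep only the features flagged as relevant, smallest p-value first
relevance_table = relevance_table[relevance_table.relevant]
relevance_table = relevance_table.sort_values("p_value")
print(relevance_table["feature"].head(11))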
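To make the choice of n explicit and reusable, the filtering, sorting and slicing can be wrapped in a small helper. This is a minimal sketch: the function name `top_n_relevant_features` is hypothetical, and it only assumes the relevance table has the `feature`, `p_value` and `relevant` columns used in the code above. It uses pandas `nsmallest`, which gives the same result here as sorting by p-value and taking the first n rows. The toy table is made up for illustration and is not real tsfresh output.

```python
import pandas as pd


def top_n_relevant_features(relevance_table: pd.DataFrame, n: int) -> list:
    """Return the names of the n relevant features with the smallest p-values.

    Assumes a table shaped like a tsfresh relevance table: a "feature" column,
    a "p_value" column and a boolean "relevant" column.
    """
    relevant = relevance_table[relevance_table["relevant"]]
    # nsmallest orders by p-value and keeps at most n rows (fewer if not enough are relevant)
    return relevant.nsmallest(n, "p_value")["feature"].tolist()


# Toy table for illustration only (made-up names and p-values)
toy = pd.DataFrame({
    "feature": ["value__mean", "value__maximum", "value__variance", "value__length"],
    "p_value": [0.003, 0.0001, 0.02, 0.6],
    "relevant": [True, True, True, False],
})
print(top_n_relevant_features(toy, 2))  # ['value__maximum', 'value__mean']
```

If fewer than n features are flagged as relevant, the helper returns all of them. The returned names can then be used to subset the feature matrix, e.g. `extracted_features[top_n_relevant_features(relevance_table, 11)]`.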

Catalin

flyingdutchman