3

How can I select the top n features of a time series using tsfresh? Can I choose how many top features to extract?

Chaitra
  • Did you already solve it? If not, what did you try so far? – flyingdutchman Dec 21 '20 at 19:30
  • 1
    @flyingdutchman my approach to this was to calculate the relevance table using the tsfresh.feature_selection.relevance module. It gave a list of relevant features, selected with the Benjamini-Hochberg procedure, a multiple testing procedure that decides which features to keep and which to cut off (based solely on the p-values). I took the features at the top as the most relevant features. You can refer to the following link: https://tsfresh.readthedocs.io/en/latest/api/tsfresh.feature_selection.html# – Chaitra Dec 23 '20 at 09:49
  • Ah, good to know, this is the same as what I did for my problem. I will post this as an answer so that the question is answered. – flyingdutchman Dec 23 '20 at 11:01

1 Answer

3

Based on the comment above from @Chaitra and this answer, here is an answer.

You can decide the number of top features by using the tsfresh relevance table described in the documentation. Sort the table by p-value and take the top n features.

Example code printing the top 11 features:

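from tsfresh import extract_features
from tsfresh.feature_selection.relevance import calculate_relevance_table
from tsfresh.utilities.dataframe_functions import impute

# X is the long-format time series DataFrame, y the target Series indexed by id.
extracted_features = extract_features(
    X,
    column_id="id",
    column_kind="kind",
    column_value="value",
)
# Replace NaN/inf values in place; the relevance tests cannot handle them.
impute(extracted_features)

relevance_table = calculate_relevance_table(extracted_features, y)
# Keep only the features marked as relevant, most significant first.
relevance_table = relevance_table[relevance_table.relevant].sort_values("p_value")
n = 11
print(relevance_table["feature"][:n])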
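To make the cut-off step mentioned in the comments more concrete, here is a minimal pure-Python sketch of a plain Benjamini-Hochberg selection followed by a top-n pick. It is only an illustration, not tsfresh's own implementation, which may apply a different correction depending on its settings. The function names, the `fdr_level` default of 0.05, and the p-values are assumptions made up for this example.

```python
def benjamini_hochberg(p_values, fdr_level=0.05):
    """Return the set of feature names kept by a plain Benjamini-Hochberg cut-off.

    p_values: dict mapping a feature name to its p-value (illustrative input).
    """
    ranked = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ranked)
    # Find the largest 1-based rank k whose p-value satisfies p_(k) <= k / m * fdr_level.
    cutoff = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= rank / m * fdr_level:
            cutoff = rank
    # Every feature up to that rank is kept.
    return {name for name, _ in ranked[:cutoff]}


def top_n_relevant(p_values, n, fdr_level=0.05):
    """Return at most n relevant feature names, smallest p-value first."""
    relevant = benjamini_hochberg(p_values, fdr_level)
    return sorted(relevant, key=lambda name: p_values[name])[:n]


# Made-up p-values for five hypothetical features.
p_values = {
    "value__mean": 0.001,
    "value__maximum": 0.008,
    "value__median": 0.039,
    "value__minimum": 0.041,
    "value__variance": 0.6,
}
print(top_n_relevant(p_values, n=2))
```

Note that asking for more features than survive the cut-off simply returns all surviving features, so n is an upper bound rather than a guarantee.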
Catalin
flyingdutchman