
For a university project, I am working on a model that forecasts a country's electricity load 24 hours ahead.

Using open-source data, we have already built a proper dataset of timestamps with the corresponding load, plus some other parameters.

We were advised to use the darts library to predict the load from this dataset, but we are a bit lost in the large number of forecasting features it offers.

We are looking for a way to evaluate and compare how useful these features are for our project, because our machine learning knowledge is at a beginner level.

If someone with experience in darts, or in energy forecast modeling in general, could give me some insight into how to select the right features, I would be very grateful!

We have read the darts documentation. It elaborates on each of the models, but it is hard to really compare them.

1 Answer


I would suggest approaching the project with these techniques, in order:

  • Start with a correlation analysis, using Pearson's or Spearman's coefficient depending on what your data looks like (Spearman's is rank-based, so it also picks up monotonic but nonlinear relationships). This gives you a sense of which features carry signal and which do not. It is also worth including lagged versions of the features when examining the correlations, to find features that have a delayed effect on what you are predicting.

  • Apply the same lagged-versus-current comparison with Granger causality tests. This takes the analysis one step further and lets you grade the potential quality of your features, that is, whether a feature's past values help predict the target beyond what the target's own past values already explain.

  • From there, you might want to iterate over various combinations of feature sets and store the accuracy metrics for each, to see whether certain features actually forecast better than others in practice.

  • There are many ways to do this, but to keep it manageable, you might want to categorize features into groups based on a combination of domain knowledge and the preliminary results from the analysis above. That way you avoid an exhaustive search that would take forever to complete and would yield very little reward from a production standpoint. You also identify, within each group, the features that have the most distinct effect on your target variable, and you avoid multicollinearity, which could degrade the quality of your forecast (especially if many of your features are similar to each other and some are better than others).

  • Once you have a search space of a more reasonable size, you could do feature selection in two ways:

  1. Use a library like Optuna and treat the features as categorical choices: Optuna records each trial's choices along with its accuracy metrics, so you can see whether certain choices are better than others.

  2. Transform your dataset into one designed for supervised learning and use a feature selection method from a library like scikit-learn.
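To make the lagged-correlation step concrete, here is a minimal sketch using pandas. The column names (`load`, `temperature`), the synthetic data, and the helper `lagged_correlations` are all assumptions made up for illustration; swap in your own DataFrame. It uses Pearson's coefficient by default; passing `method="spearman"` should also work, though pandas may need SciPy installed for that.

```python
import numpy as np
import pandas as pd


def lagged_correlations(df, target, feature, lags, method="pearson"):
    """Correlate the target at time t with the feature at time t - lag."""
    return pd.Series(
        {lag: df[target].corr(df[feature].shift(lag), method=method) for lag in lags}
    )


# Synthetic hourly data (illustrative only): load reacts to the
# temperature observed 3 hours earlier.
rng = np.random.default_rng(0)
idx = pd.date_range("2023-01-01", periods=24 * 60, freq="h")
hours = np.arange(len(idx))
temperature = pd.Series(
    10 + 5 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1, len(idx)),
    index=idx,
)
load = 100 - 2 * temperature.shift(3) + rng.normal(0, 1, len(idx))
df = pd.DataFrame({"load": load, "temperature": temperature})

# One correlation per lag; the lag with the largest absolute value is
# the strongest candidate for a lagged feature.
corrs = lagged_correlations(df, "load", "temperature", range(0, 25))
best_lag = corrs.abs().idxmax()
print(corrs.round(2))
print("strongest lag:", best_lag)
```

Running the same helper over every candidate feature gives you a simple table of "which feature, at which lag" to carry into the later steps. Note that with strongly periodic data (daily cycles), lags that differ by 24 hours will tend to score similarly, so it is worth reading the whole profile and not only the maximum.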
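The group-based search over feature combinations could look roughly like the sketch below. Everything in it is an assumption for illustration: the group names, the column names, the synthetic data, and the plain least-squares model with a time-ordered holdout, which stands in for whichever forecasting model you end up using.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def holdout_mae(X, y, train_frac=0.8):
    """Fit least squares on the earliest rows, return MAE on the latest rows."""
    split = int(len(X) * train_frac)
    A = np.column_stack([np.ones(len(X)), X.to_numpy()])
    target = y.to_numpy()
    coef, *_ = np.linalg.lstsq(A[:split], target[:split], rcond=None)
    return float(np.mean(np.abs(target[split:] - A[split:] @ coef)))


def score_group_combinations(X, y, groups):
    """Score every non-empty combination of feature groups and store the metric."""
    rows = []
    for r in range(1, len(groups) + 1):
        for combo in combinations(groups, r):
            cols = [col for name in combo for col in groups[name]]
            rows.append(
                {
                    "groups": " + ".join(combo),
                    "n_features": len(cols),
                    "mae": holdout_mae(X[cols], y),
                }
            )
    return pd.DataFrame(rows).sort_values("mae").reset_index(drop=True)


# Synthetic data (illustrative only): load depends on temperature and weekends.
rng = np.random.default_rng(1)
n = 2000
t = np.arange(n)
X = pd.DataFrame(
    {
        "temperature": rng.normal(15, 5, n),
        "humidity": rng.normal(60, 10, n),
        "hour": (t % 24).astype(float),
        "is_weekend": ((t // 24) % 7 >= 5).astype(float),
        "random": rng.normal(0, 1, n),
    }
)
y = 100 + 2 * X["temperature"] + 10 * X["is_weekend"] + pd.Series(rng.normal(0, 1, n))

# Hypothetical grouping based on domain knowledge.
groups = {
    "weather": ["temperature", "humidity"],
    "calendar": ["hour", "is_weekend"],
    "noise": ["random"],
}
results = score_group_combinations(X, y, groups)
print(results)
```

With three groups there are only 7 combinations to try, whereas searching over the five individual columns would already mean 31. The split is by position, not random, so the model is always evaluated on data that comes after what it was trained on.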
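For option 2, a rough sketch of the supervised-learning reframing followed by a scikit-learn selector is shown here. The helper `make_supervised`, the column names, and the synthetic data are assumptions for illustration. `SelectKBest` with `f_regression` is just one of several scikit-learn selectors and is used only as an example.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression


def make_supervised(df, target, lags, horizon=24):
    """Build X from lagged columns and y as the target `horizon` steps ahead."""
    X = pd.DataFrame(
        {f"{col}_lag{lag}": df[col].shift(lag) for col in df.columns for lag in lags}
    )
    y = df[target].shift(-horizon)
    keep = X.notna().all(axis=1) & y.notna()  # drop rows made incomplete by shifting
    return X[keep], y[keep]


# Synthetic hourly data (illustrative only): load follows the temperature
# from 24 hours earlier; "unrelated" is pure noise.
rng = np.random.default_rng(0)
n = 24 * 90
hours = np.arange(n)
df = pd.DataFrame(
    {
        "temperature": 10 + 5 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1, n),
        "unrelated": rng.normal(0, 1, n),
    }
)
df["load"] = 50 + 3 * df["temperature"].shift(24) + rng.normal(0, 1, n)

X, y = make_supervised(df, target="load", lags=[0, 24], horizon=24)

# Keep the 3 features with the highest univariate F-score against y.
selector = SelectKBest(score_func=f_regression, k=3).fit(X, y)
selected = list(X.columns[selector.get_support()])
scores = pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False)
print(scores.round(1))
print("selected:", selected)
```

A univariate score like this rates each column on its own, so it will happily keep several lagged columns that are highly correlated with each other. That is where the grouping step helps: pick a representative from each group first, then run the selector on the reduced set.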

Adrian Mole