The paradigm I usually see suggested for creating aggregations with featuretools is to have separate entities linked by a relationship.

My case is a bit different: I have a single table that looks like this:

user_id | time_id | feature1_lag1 | feature1_lag2 | ... | feature1_lagn | feature2_lag1 | ... | featurem_lagn

Basically, I have lagged versions of the same features sitting in different columns of the same table.

Is there any way I can use featuretools in this case?

giobatta912

1 Answer

It’s possible to use Featuretools directly on the single table with transform primitives. If you set time_id as the time_index, every column in a row is treated as valid only at that row's time index. That can feel strange here, because each row packs n features observed at m different times into a single timestamp.

Restructuring your dataset lets you feed in that lag time information as well, and even make some aggregations in the process. To get that functionality, you’ll want to unpivot your data so that each lag becomes its own row, like so:

user_id  time_id      lag  feature_1  ...  feature_n
 1        2017-01-05   1    2.7             9.8
 1        2017-01-04   2    2.3       ...   9.1
 1        2016-01-01   m    0.0             0.0
 2        2017-01-05   1    18.1      ...   42.0
 .                     .                    .
 .                     .                    .
 23       2016-01-01   m    0.0       ...   0.6
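
As a rough sketch, the unpivot could be done with pandas' `wide_to_long`. The toy values, the `feature1`/`feature2` stub names and the `_lag` separator follow the column naming in the question; the assumption that one lag step equals one day is mine and isn't stated anywhere above, so adjust the offset to your real sampling interval.

```python
import pandas as pd

# Hypothetical wide table: two features with two lags each.
wide = pd.DataFrame({
    "user_id": [1, 2],
    "time_id": pd.to_datetime(["2017-01-06", "2017-01-06"]),
    "feature1_lag1": [2.7, 18.1],
    "feature1_lag2": [2.3, 17.5],
    "feature2_lag1": [9.8, 42.0],
    "feature2_lag2": [9.1, 40.0],
})

# Columns named <stub>_lag<k> become one column per stub,
# with the numeric suffix k moved into a new "lag" column.
measurements = pd.wide_to_long(
    wide,
    stubnames=["feature1", "feature2"],
    i=["user_id", "time_id"],
    j="lag",
    sep="_lag",
).reset_index()

# Assumption: one lag step is one day, so shift each row back
# to the time its values were actually observed.
measurements["time_id"] = measurements["time_id"] - pd.to_timedelta(
    measurements["lag"], unit="D"
)
measurements = measurements.sort_values(["user_id", "lag"]).reset_index(drop=True)
print(measurements)
```

Note that `wide_to_long` needs the `i` columns (here `user_id` and `time_id`) to uniquely identify each row of the wide table.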

Making an entity like this (we'll call it measurements here) lets you set a time index so that each lag has its own time. Each row's data is then used only from the time it was actually observed, which is representative of reality.

Furthermore, you'd then be able to use normalize_entity on measurements to make a new parent entity from the user_id. That new entity, users, would then be the target entity for Deep Feature Synthesis if you want to make predictions by user.

Seth Rothschild