I'm attempting to use Featuretools to normalize a table for feature synthesis. My table is similar to the one in Max Kanter's answer to "How to apply Deep Feature Synthesis to a single table". I'm hitting an exception and would appreciate some help working around it.

The exception originates in featuretools.entityset.entity.entityset_convert_variable_type, which doesn't seem to handle time types.

What is the nature of the exception, and can I work around it?

The Table, df:

PatientId | AppointmentID | Gender | ScheduledDay | AppointmentDay | Age | Neighbourhood | Scholarship | Hipertension | Diabetes | Alcoholism | Handcap | SMS_received | No-show
12345     | 5642903       | F     | 2016-04-29    | 2016-04-29     | 62  | JARDIM DA     | 0           | 1            | 0        | 0          | 0       | 0            | No
67890     | 3902943       | M     | 2016-03-18    | 2016-04-29     | 44  | Other Nbh     | 1           | 1            | 0        | 0          | 0       | 0            | Yes
...

My Code:

import featuretools as ft

# df is the appointments table shown above
appointment_entity_set = ft.EntitySet('appointments')
appointment_entity_set.entity_from_dataframe(
    dataframe=df, entity_id='appointments',
    index='AppointmentID', time_index='AppointmentDay')

# error generated here
appointment_entity_set.normalize_entity(base_entity_id='appointments',
    new_entity_id='patients',
    index='PatientId')

ScheduledDay and AppointmentDay are type pandas._libs.tslib.Timestamp as is the case in Max-Kanter's response.

The Exception:

~/.virtualenvs/trane/lib/python3.6/site-packages/featuretools/entityset/entity.py in entityset_convert_variable_type(self, column_id, new_type, **kwargs)
    474         df = self.df
--> 475         if df[column_id].empty:
    476             return
    477         if new_type == vtypes.Numeric:

Exception: Cannot convert column first_appointments_time to <class 'featuretools.variable_types.variable.DatetimeTimeIndex'>

featuretools==0.1.21

This dataset is from the Kaggle Show or No Show competition

alacarter

1 Answer

The error seems to come from the way pandas is reading the AppointmentDay variable. We actually have an example Kaggle kernel with that dataset. There, we needed to use pandas.read_csv with parse_dates:

data = pd.read_csv("data/KaggleV2-May-2016.csv", parse_dates=['AppointmentDay', 'ScheduledDay'])

That returns a DataFrame in which AppointmentDay and ScheduledDay are datetime columns (values of type numpy.datetime64) rather than plain strings. This should load in fine to Featuretools.
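As a rough illustration of the difference parse_dates makes, here is a minimal sketch. It uses a two-row in-memory CSV with made-up values in place of the real Kaggle file, and the variable names (csv_text, raw, parsed, fixed) are only for this example. It also shows pd.to_datetime as one possible way to convert a frame that has already been loaded without parse_dates.

```python
import io

import pandas as pd

# Made-up rows shaped like a few columns of the Kaggle file (illustrative only).
csv_text = (
    "PatientId,AppointmentID,ScheduledDay,AppointmentDay\n"
    "12345,5642903,2016-04-29,2016-04-29\n"
    "67890,3902943,2016-03-18,2016-04-29\n"
)
date_cols = ["ScheduledDay", "AppointmentDay"]

# Without parse_dates the date columns are read as plain strings.
raw = pd.read_csv(io.StringIO(csv_text))

# With parse_dates they are converted to datetime columns on load.
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=date_cols)

# For a frame that is already loaded, convert the columns explicitly.
fixed = raw.copy()
for col in date_cols:
    fixed[col] = pd.to_datetime(fixed[col])

for name, frame in [("raw", raw), ("parsed", parsed), ("fixed", fixed)]:
    is_dt = pd.api.types.is_datetime64_any_dtype(frame["AppointmentDay"])
    print(name, frame["AppointmentDay"].dtype, "datetime column:", is_dt)
```

Checking df.dtypes before calling entity_from_dataframe is a quick way to confirm that the time index column really is a datetime column.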

Also, can you make sure you have the latest version of Featuretools from pip? There is a set_trace call in that stack trace that isn’t in the latest release.

Max Kanter
    Thank you, @max-kanter. That was perfect. Also, thanks for pointing out the example. I had no idea. I deleted the `pdb` line. Sorry about that. I'd introduced it locally to see what was going on. – alacarter Jun 15 '18 at 14:10