I have a dataset with unique names. Another dataset contains several rows with the same names as in the first dataset.
I want to add a column of unique ids to the first dataset, and a column to the second dataset that gives each row the id of its matching name in the first dataset.
For example:
Dataframe 1:
player_id  Name
1          John Dosh
2          Michael Deesh
3          Julia Roberts
Dataframe 2:
player_id  Name
1          John Dosh
1          John Dosh
2          Michael Deesh
2          Michael Deesh
2          Michael Deesh
3          Julia Roberts
3          Julia Roberts
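A minimal sketch of one way the ids might be assigned with pandas, assuming both frames share a column called Name and that names in the first frame are unique. The frames players_set and season_stats below are hypothetical sample data mirroring the example above, and season_stats_id is an assumed surrogate index, not something taken from real data:

```python
import pandas as pd

# Hypothetical sample data mirroring the example tables above.
players_set = pd.DataFrame(
    {"Name": ["John Dosh", "Michael Deesh", "Julia Roberts"]}
)
season_stats = pd.DataFrame(
    {
        "Name": [
            "John Dosh", "John Dosh",
            "Michael Deesh", "Michael Deesh", "Michael Deesh",
            "Julia Roberts", "Julia Roberts",
        ]
    }
)

# Names are unique in the first frame, so a running counter is a valid id.
players_set["player_id"] = range(1, len(players_set) + 1)

# Build a Name -> player_id lookup and apply it to every row of the second frame.
name_to_id = players_set.set_index("Name")["player_id"]
season_stats["player_id"] = season_stats["Name"].map(name_to_id)

# Assumed surrogate key so the second frame has a unique index of its own.
season_stats["season_stats_id"] = range(len(season_stats))
```

Names in the second frame that do not appear in the first would get NaN from map, so it may be worth checking season_stats["player_id"].isna().any() before building relationships.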
I want to use both data frames to run deep feature synthesis with featuretools, so that I can do something like this:
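import featuretools as ft

entity_set = ft.EntitySet("basketball_players")
entity_set.add_dataframe(
    dataframe_name="players_set",
    dataframe=players_set,
    index="player_id",  # the relationship below joins on player_id, so it has to be the index here
)
entity_set.add_dataframe(
    dataframe_name="season_stats",
    dataframe=season_stats,
    index="season_stats_id",
)
# Parent (players_set.player_id) -> child (season_stats.player_id)
entity_set.add_relationship("players_set", "player_id", "season_stats", "player_id")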