How can I flatten a pandas data frame with several rows with the same id

Question

How can I flatten a pandas dataframe like the following:

id               date            var1
058a219119825  2015-01-01      0.9             
058a219119825  2015-02-01      0.3         
058a219119825  2015-03-01      0.1
....
058a219119826  2015-01-01      0.1
058a219119826  2015-02-01      0.5
058a219119826  2015-03-01      0.4

Some info about the data frame: There is the following number of unique entries (id's) with dates:

  date       number of unique id's
2015-01-01    16070
2015-02-01    16082
2015-03-01    16074
2015-04-01    16079
2015-05-01    16080
2015-06-01    16085
2015-07-01    16090
2015-08-01    16094
2015-09-01    16082
2015-10-01    16085
2015-11-01    16087
2015-12-01    16094

I want something similar as this command does with json files:

flattened = (flatten(entry) for entry in json_data)

The thing is that now, I have the data in dataframes. An idea that I have is to create a new column with the var and date, and then delete date column. For instance:

id           var1_2015-01-01 var1_2015-02-01 var1_2015-03-01
058a219119825        0.9             0.3           0.1

Besides, as some of the ids will not have all the 12 different dates (1 per month) I was thinking to add a "missing value" string for those missing var1 values in non-exist dates. How can I do that with Pandas?

What should be the value of date and var1 in the new dataframe? — codeape, Sep 20 '18 at 20:39
Are you looking to [pivot a DataFrame](https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe)? — ALollz, Sep 20 '18 at 20:44

score 3 · Accepted Answer · answered Sep 20 '18 at 21:01

3

You can pivot tables in Pandas.

For example:

entry.pivot(index='id', columns='date', values='var1')

answered Sep 20 '18 at 21:01

Ricky Kim

1,992
1
9
18

Super clever! Thanks a lot! :D – Javiss Sep 20 '18 at 21:04
any idea how can I keep the 'id' as a column? @RickyKim – Javiss Sep 20 '18 at 21:34
I got it: entry.pivot(index='id', columns='date', values='var1').reset_index('id') – Javiss Sep 20 '18 at 21:44

How can I flatten a pandas data frame with several rows with the same id

1 Answers1