As quite a newbie to pandas, I'm struggling with a data arrangement issue.
I've got a huge pile of data from a log file in a pandas dataframe with a structure like this:
day user measure1 measure2 ...
1 u1 xxxxx yyyyy ...
1 u2 xxxxx yyyyy ...
1 u3 xxxxx yyyyy ...
2 u2 xxxxx yyyyy ...
2 u4 xxxxx yyyyy ...
2 u3 xxxxx yyyyy ...
3 u1 xxxxx yyyyy ...
3 u3 xxxxx yyyyy ...
... ... ... ... ...
Hence, not every user appears at each day, while the data is neither sorted by day nor by user. However, if an entry occurs, is has all the measures.
Now I need to rearrange this data to obtain a 2D table "every user" vs. "every day" for each measure and fill the gaps with zeros e.g.
For measure1: For measure2:
u1 u2 u3 u4 u1 u2 u3 u4
1 xxxxx xxxxx xxxxx 0 1 yyyyy yyyyy yyyyy 0
2 0 xxxxx xxxxx xxxxx 2 0 yyyyy yyyyy yyyyy
3 xxxxx 0 xxxxx 0 3 yyyyy 0 yyyyy 0
How can I do this with pandas in python3? I'm also open to alternative solutions e.g. using numpy instead of pandas.
So far I managed to extract arrays of all occurring users and days in the dataset but have no clue how to smartly assign the measured data.
I'm grateful for any help on this matter.