I have a pandas DataFrame like this:
cid code max date
1 A 32 date1
1 B 9 date2
1 C 25 date3
2 A 33 date4
2 B 11 date5
For every cid there can be N entries, and N varies per cid: for some it is 1 or 2, for others 3 or more. I want to concatenate all rows that share the same cid into a single row. Some columns will end up empty for cids whose N is lower than the largest N, and I want to fill those empty columns with -1.
I ran the following to index the DataFrame by "cid" and by each row's position within its cid group:
maxscoredf = maxscoredf.set_index(['cid',maxscoredf.groupby('cid').cumcount().add(1)])
When I then try to unstack, I get a MemoryError (it reports needing 221 GB of RAM):
maxscoredf = maxscoredf.unstack(fill_value=-1)  # MemoryError: requires 221 GB of RAM
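For scale: the unstacked frame has one row per cid and (largest N) x (number of value columns) columns, so a single cid with very many rows widens the result for every cid. Below is a rough sketch for checking this before unstacking. The helper name estimate_wide_shape and the 8-bytes-per-cell figure are assumptions for illustration, and this has not been confirmed as the cause of the error here.

```python
import pandas as pd


def estimate_wide_shape(df, key="cid", bytes_per_cell=8):
    """Hypothetical helper: predict the shape of the unstacked frame."""
    sizes = df.groupby(key).size()                  # N for each cid
    n_rows = len(sizes)                             # one output row per cid
    n_cols = int(sizes.max()) * (df.shape[1] - 1)   # largest N times value columns
    approx_bytes = n_rows * n_cols * bytes_per_cell # crude lower bound
    return n_rows, n_cols, approx_bytes


# Toy data copied from the example at the top of the question.
toy = pd.DataFrame({
    "cid": [1, 1, 1, 2, 2],
    "code": ["A", "B", "C", "A", "B"],
    "max": [32, 9, 25, 33, 11],
    "date": ["date1", "date2", "date3", "date4", "date5"],
})

n_rows, n_cols, approx_bytes = estimate_wide_shape(toy)
print(n_rows, n_cols, approx_bytes)
```

If the largest N turns out to be far above the typical N, the few cids responsible probably account for most of the requested memory.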
How can I get around this memory error? The goal is to have all values for the same cid in one row, like this:
id code1 mean1 count1 code2 mean2 count2 code3 mean3 count3
1 A 32 22 B 9 56 C 25 78
2 A 33 35 B 11 66 -1 -1 -1
Any missing values should be replaced by -1.
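A minimal reproducible sketch of the intended reshaping, using the column names from the input sample at the top (code, max, date) rather than the ones in the expected output. The names df, pos, ordered and wide are placeholders for illustration; this runs fine on the toy data, and the memory problem only appears on the full dataset.

```python
import pandas as pd

# Toy data from the question; object dtype lets the -1 filler sit next to strings.
df = pd.DataFrame({
    "cid": [1, 1, 1, 2, 2],
    "code": pd.Series(["A", "B", "C", "A", "B"], dtype=object),
    "max": [32, 9, 25, 33, 11],
    "date": pd.Series(["date1", "date2", "date3", "date4", "date5"], dtype=object),
})

# Position of each row within its cid group: 1, 2, 3, ...
pos = df.groupby("cid").cumcount().add(1)

# One row per cid, one block of columns per position, gaps filled with -1.
wide = df.set_index(["cid", pos]).unstack(fill_value=-1)

# Reorder so that columns are grouped by position instead of by name.
n_max = int(pos.max())
ordered = pd.MultiIndex.from_tuples(
    [(name, i) for i in range(1, n_max + 1) for name in ["code", "max", "date"]]
)
wide = wide.reindex(columns=ordered)

# Flatten ("code", 1) into "code1", and so on, then restore cid as a column.
wide.columns = ["{}{}".format(name, i) for name, i in wide.columns]
wide = wide.reset_index()
print(wide)
```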
I am using the code from this answer: https://stackoverflow.com/a/66009708/6916919
Pandas version: 0.21. I am using this specific version because of https://stackoverflow.com/a/61757908/6916919
Please ask if any additional information is needed.