I have a melted DataFrame that contains measurements for different sample IDs and experimental conditions:
        expcond  variable     value
0             0   Sample1  0.001620
1             1   Sample1 -0.351960
2             2   Sample1 -0.002644
3             3   Sample1  0.000633
4             4   Sample1  0.011253
...         ...       ...       ...
293933       54  Sample99  0.006976
293934       55  Sample99 -0.002270
293935       56  Sample99 -0.498353
293936       57  Sample99 -0.006603
293937       58  Sample99  0.003283
I also have access to this data in non-melted (wide) form if it would be easier to handle that way, but I doubt it.
Each sample is a member of a group. I have the group assignments stored in a separate file, which for the moment I read in and keep as a dictionary. I would like to add a "group" column to my DataFrame based on this information. At the moment I am doing it row by row, but that is quite slow given the ~300 000 entries:
# Placeholder value, then overwrite each row's group one at a time
final_ref_melt["group"] = ["XXX"] * len(final_ref_melt)
for i in range(len(final_ref_melt)):
    final_ref_melt.loc[i, "group"] = ID_group[final_ref_melt.loc[i, "variable"]]
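For reference, ID_group maps each sample ID to its group label. Here is a minimal sketch of the setup, with made-up group names (the real names come from my groups file):

import pandas as pd

# Tiny stand-in for my real data, just to show the shapes involved
final_ref_melt = pd.DataFrame({
    "expcond": [0, 1, 0, 1],
    "variable": ["Sample1", "Sample1", "Sample99", "Sample99"],
    "value": [0.001620, -0.351960, 0.006976, -0.002270],
})

# sample ID -> group label; the group names here are invented for illustration
ID_group = {"Sample1": "groupA", "Sample99": "groupB"}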
The end goal is to split the data into one DataFrame per group and then run statistical calculations on each of them. With my current setup, I would do it like so:
final_ref_groups = {}
for mygroup in group_IDs.keys():
    final_ref_groups[mygroup] = final_ref_melt[final_ref_melt["group"] == mygroup]
(Yes, I have the group information stored as two different dictionaries. I know.)
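To make the end goal concrete, the per-group DataFrames feed into something along these lines; the statistics shown are placeholders, not my actual calculations:

for mygroup, group_df in final_ref_groups.items():
    # Placeholder summary statistics on the measurement column
    print(mygroup, group_df["value"].mean(), group_df["value"].std())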
How can I do this more efficiently?