
I have a melted DataFrame that contains measurements for different sampleIDs and experimental conditions:

            expcond  variable      value
    0             0   Sample1   0.001620
    1             1   Sample1  -0.351960
    2             2   Sample1  -0.002644
    3             3   Sample1   0.000633
    4             4   Sample1   0.011253
    ...         ...       ...        ...
    293933       54  Sample99   0.006976
    293934       55  Sample99  -0.002270
    293935       56  Sample99  -0.498353
    293936       57  Sample99  -0.006603
    293937       58  Sample99   0.003283

I also have access to this data in non-melted (wide) form if that would be easier to work with, but I doubt it.

Each sample is a member of a group. I have the group assignments stored in a separate file, which for the moment I am reading in and storing as a dictionary. I would like to add a "group" column to my DataFrame based on this information. Right now I am doing it row by row, but that is quite slow given the ~300,000 entries:

final_ref_melt["group"] = ["XXX"] * len(final_ref_melt)
for i in range(len(final_ref_melt)):
    final_ref_melt.loc[i, "group"] = ID_group[final_ref_melt.loc[i, "variable"]]
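
For context, `ID_group` maps each sample ID to its group name, roughly like this (the group labels here are hypothetical, just to illustrate the shape):

    # Hypothetical shape of ID_group (sample ID -> group name);
    # the real labels come from the separate group file.
    ID_group = {
        "Sample1": "groupA",
        "Sample2": "groupA",
        "Sample99": "groupB",
    }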

The end goal is then to separate the data into one DataFrame per group and perform statistical calculations on each of them. With my current setup, I would do it like so:

    final_ref_groups = {}

    for mygroup in group_IDs.keys():
        final_ref_groups[mygroup] = final_ref_melt[final_ref_melt["group"] == mygroup]
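
Here `group_IDs` is keyed by group name; its values are assumed to be the lists of sample IDs in each group (hypothetical labels again):

    # Hypothetical shape of group_IDs (group name -> list of sample IDs).
    group_IDs = {
        "groupA": ["Sample1", "Sample2"],
        "groupB": ["Sample99"],
    }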

(Yes, I have the group information stored as two different dictionaries. I know.)

How can I do this more efficiently?

Comments:

- Looks like a `merge`/`map`; please provide the exact format of your dictionary if the duplicate doesn't suit your needs (make sure to read it in detail). – mozway May 03 '23 at 13:02
- @mozway yep, needed to juggle the dictionary around a bit but got there in the end. TY – Whitehot May 03 '23 at 14:09
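
For reference, a minimal sketch of the `map`-based approach suggested in the comment, assuming `ID_group` maps sample IDs to group names as illustrated above:

    # Vectorized alternative to the row-by-row loop:
    # map the sample IDs in "variable" through ID_group in one pass.
    final_ref_melt["group"] = final_ref_melt["variable"].map(ID_group)

    # One DataFrame per group, without a second hand-built dictionary.
    final_ref_groups = {name: sub for name, sub in final_ref_melt.groupby("group")}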

0 Answers