I have a pandas DataFrame df that looks like:
df =
sample col1 data_value time_stamp
A 1 15 0.5
A 1 45 0.5
A 1 32 0.5
A 2 3 1
A 2 57 1
A 2 89 1
B 1 10 0.5
B 1 20 0.5
B 1 30 0.5
B 2 12 1
B 2 24 1
B 2 36 1
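To make this reproducible, the frame above can be built as follows. This is a minimal sketch; the dtypes (integers for col1 and data_value, floats for time_stamp) are assumed from the values shown.

```python
import pandas as pd

# Rebuild the example frame shown above (dtypes inferred from the printed values)
df = pd.DataFrame({
    "sample": ["A"] * 6 + ["B"] * 6,
    "col1": [1, 1, 1, 2, 2, 2] * 2,
    "data_value": [15, 45, 32, 3, 57, 89, 10, 20, 30, 12, 24, 36],
    "time_stamp": [0.5, 0.5, 0.5, 1.0, 1.0, 1.0] * 2,
})
```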
For each combination of sample and col1, I am trying to condense all data_value entries into a numpy array in a new column, merged_data, so that the result looks like:
sample col1 merged_data time_stamp
A 1 [15, 45, 32] 0.5
A 2 [3, 57, 89] 1
B 1 [10, 20, 30] 0.5
B 2 [12, 24, 36] 1
I've tried df['merged_data'] = df.to_numpy() and df['merged_data'] = np.array(df.iloc[0:2, :].to_numpy()), but neither works. All elements in the merged_data column need to be numpy arrays or lists (I can easily convert between the two).
Lastly, I need to retain the time_stamp column for each combination of sample and col1. How can I include this with the groupby?
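One direction that might work, sketched here under assumptions rather than as a confirmed solution: in the example data time_stamp is constant within each (sample, col1) pair, so it could be added to the groupby keys and would then survive the aggregation. The name merged is hypothetical, and the block rebuilds the example frame so it runs on its own.

```python
import pandas as pd

# Example frame from the question, rebuilt so this block is self-contained
df = pd.DataFrame({
    "sample": ["A"] * 6 + ["B"] * 6,
    "col1": [1, 1, 1, 2, 2, 2] * 2,
    "data_value": [15, 45, 32, 3, 57, 89, 10, 20, 30, 12, 24, 36],
    "time_stamp": [0.5, 0.5, 0.5, 1.0, 1.0, 1.0] * 2,
})

# Assumption: time_stamp is constant within each (sample, col1) group,
# so using it as a third key does not split any group further.
merged = (
    df.groupby(["sample", "col1", "time_stamp"], sort=False)["data_value"]
      .agg(list)                          # collect each group's values into a list
      .reset_index(name="merged_data")    # turn the keys back into columns
)

# Put the columns in the order of the desired output
merged = merged[["sample", "col1", "merged_data", "time_stamp"]]
print(merged)
```

If time_stamp could vary within a group, this would produce extra rows, and aggregating it separately (for example taking the first value per group) would be needed instead.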
Any help or thoughts would be greatly appreciated!