Best way to take the values of a column that are all related to multiple rows with the same index

Question

I'm currently try to take accelerometer data samples and combine them into a single dataframe for use with machine learning. There are 100 entries per sample saved to csv's and each gets loaded in through a for loop and the merged through a for loop. After that my results look something like this:

ID	X	Y	Z	Class
ID 1	4	2	6	Action 1
ID 1	4	2	5	Action 1
ID 2	3	4	4	Action 2
ID 2	2	2	1	Action 2
ID 3	1	4	5	Action 3
ID 3	2	3	2	Action 3
ID 4	2	1	5	Action 1
ID 4	2	2	5	Action 1

I'm wanting to get a wide format table to make the data easier to use with machine learning. I'm trying to get it to look more along the lines of this second table:

ID	X1	Y1	Z1	X2	Y2	Z2	Class
ID 1	4	2	6	4	2	5	Action 1
ID 2	3	4	4	2	2	1	Action 2
ID 3	1	4	5	2	3	2	Action 3
ID 4	2	1	5	2	2	5	Action 1

I've looked into melts, stacking and unstacking, groupby options, and for loops where I try to move everything over. Nothing has gotten me close to what I've been trying to accomplish. Just looking for advice on how to proceed or if there is a better method.

score 1 · Accepted Answer · answered Jul 11 '23 at 19:45

1

Try:

df['tmp'] = df.groupby('ID').cumcount() + 1

df = df.pivot(index=['ID', 'Class'], columns=['tmp'])
df.columns = [f'{a}{b}' for a, b in df.columns]
df = df.reset_index()

print(df)

Prints:

     ID     Class  X1  X2  Y1  Y2  Z1  Z2
0  ID 1  Action 1   4   4   2   2   6   5
1  ID 2  Action 2   3   2   4   2   4   1
2  ID 3  Action 3   1   2   4   3   5   2
3  ID 4  Action 1   2   2   1   2   5   5

answered Jul 11 '23 at 19:45

Andrej Kesely

168,389
15
48
91

1

Went through and replaced variables as necessary, reset and ran, worked perfect. Thank you greatly. – Grimsbear Jul 11 '23 at 20:11

score 0 · Answer 2 · answered Jul 11 '23 at 19:46

If df1 and df2 have the same structure and you want merge into large wide df you could give the columns unique name each time a df1, df2... dfn is generated... like this...

Lets say a temporary df is created in the loop each time but in the end you want a large one which contains all data from the others...

merged_df = pd.DataFrame()

# go into loop
for file in os.listdir(directory_path):
   temp_df = pd.read_csv(directory_path + '/' + file)
   temp_df.add_suffix('_unique_string_from_loop')
   merged_df = pd.concat([merged_df, temp_df], axis=1)

print(merged_df)

Alternatively, you could try to create a multi-index header for each df in the loop rather than appending the column names. But I would still create the temporary df's and do the concat to merge them.

Best way to take the values of a column that are all related to multiple rows with the same index

2 Answers2