1

I'm currently try to take accelerometer data samples and combine them into a single dataframe for use with machine learning. There are 100 entries per sample saved to csv's and each gets loaded in through a for loop and the merged through a for loop. After that my results look something like this:

ID X Y Z Class
ID 1 4 2 6 Action 1
ID 1 4 2 5 Action 1
ID 2 3 4 4 Action 2
ID 2 2 2 1 Action 2
ID 3 1 4 5 Action 3
ID 3 2 3 2 Action 3
ID 4 2 1 5 Action 1
ID 4 2 2 5 Action 1

I'm wanting to get a wide format table to make the data easier to use with machine learning. I'm trying to get it to look more along the lines of this second table:

ID X1 Y1 Z1 X2 Y2 Z2 Class
ID 1 4 2 6 4 2 5 Action 1
ID 2 3 4 4 2 2 1 Action 2
ID 3 1 4 5 2 3 2 Action 3
ID 4 2 1 5 2 2 5 Action 1

I've looked into melts, stacking and unstacking, groupby options, and for loops where I try to move everything over. Nothing has gotten me close to what I've been trying to accomplish. Just looking for advice on how to proceed or if there is a better method.

SNygard
  • 916
  • 1
  • 9
  • 21
Grimsbear
  • 13
  • 3

2 Answers2

1

Try:

df['tmp'] = df.groupby('ID').cumcount() + 1

df = df.pivot(index=['ID', 'Class'], columns=['tmp'])
df.columns = [f'{a}{b}' for a, b in df.columns]
df = df.reset_index()

print(df)

Prints:

     ID     Class  X1  X2  Y1  Y2  Z1  Z2
0  ID 1  Action 1   4   4   2   2   6   5
1  ID 2  Action 2   3   2   4   2   4   1
2  ID 3  Action 3   1   2   4   3   5   2
3  ID 4  Action 1   2   2   1   2   5   5
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91
0

If df1 and df2 have the same structure and you want merge into large wide df you could give the columns unique name each time a df1, df2... dfn is generated... like this...

Lets say a temporary df is created in the loop each time but in the end you want a large one which contains all data from the others...

merged_df = pd.DataFrame()

# go into loop
for file in os.listdir(directory_path):
   temp_df = pd.read_csv(directory_path + '/' + file)
   temp_df.add_suffix('_unique_string_from_loop')
   merged_df = pd.concat([merged_df, temp_df], axis=1)

print(merged_df)

Alternatively, you could try to create a multi-index header for each df in the loop rather than appending the column names. But I would still create the temporary df's and do the concat to merge them.