
The first part of this question has been asked many times and the best answer I found was here: Import multiple csv files into pandas and concatenate into one DataFrame.

What I want to do is add another variable to each dataframe that holds the participant number, so that when the files are concatenated I still have participant identifiers.

The files are named like this: [screenshot of the file names not preserved]

So perhaps I could just add a column containing ucsd1, etc., to identify each participant?
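One way to get an identifier like `ucsd1` is to take the file name without its directory or extension. Here is a minimal sketch using `pathlib`. It assumes the files follow a hypothetical `ucsd1.xlsx`, `ucsd2.xlsx`, ... pattern, because the actual names in the screenshot are not preserved:

```python
from pathlib import Path


def participant_id(filename: str) -> str:
    """Return the file name without directories or extension, e.g. 'ucsd1'."""
    # Path.stem drops the parent directories and the final suffix (.xlsx).
    return Path(filename).stem


# The file name ucsd1.xlsx is hypothetical; the directory is the one from the question.
print(participant_id("/Users/jamesades/desktop/Watch_data_1/Re__Personalized_MH_data_call/ucsd1.xlsx"))
```

Inside the loop, `df['participant'] = participant_id(filename)` would then tag each frame with its string ID.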

Here's code that I've gotten to work for Excel files:

import glob

import pandas as pd

path = r"/Users/jamesades/desktop/Watch_data_1/Re__Personalized_MH_data_call"
all_files = glob.glob(path + "/*.xlsx")

li = []

for filename in all_files:
    df = pd.read_excel(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
James
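As an alternative to adding a column to each frame, `pd.concat` can label the pieces itself through its `keys` argument. This adds an outer index level, which `reset_index` can then turn into an ordinary column. The sketch below uses two small in-memory frames as stand-ins for the Excel files, and the participant IDs and the `heart_rate` column are made up:

```python
import pandas as pd

# Stand-ins for the frames read from each file; IDs and columns are hypothetical.
frames = {
    "ucsd1": pd.DataFrame({"heart_rate": [70, 72]}),
    "ucsd2": pd.DataFrame({"heart_rate": [65]}),
}

# keys= labels each piece in an outer index level; names= names the index levels.
frame = pd.concat(
    list(frames.values()),
    keys=list(frames.keys()),
    names=["participant", "row"],
)

# Move the participant label out of the index into a column, then renumber rows.
frame = frame.reset_index(level="participant").reset_index(drop=True)
print(frame)
```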

1 Answer


If I understand you correctly, it's simple:

import glob
import re # <-------------- Add this line

import pandas as pd

path = r"/Users/jamesades/desktop/Watch_data_1/Re__Personalized_MH_data_call"
all_files = glob.glob(path + "/*.xlsx")

li = []

for filename in all_files:
    df = pd.read_excel(filename, index_col=None, header=0)
    participant_number = int(re.search(r'(\d+)', filename).group(1)) # <-------------- Add this line
    df['participant_number'] = participant_number  # <-------------- Add this line
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

That way, each dataframe loaded from an Excel file will have a column called participant_number. In every row of that dataframe, the value will be the first number that re.search finds in filename, which is the full path returned by glob for the file the dataframe was loaded from.
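`re.search` returns the *first* run of digits anywhere in the string it is given, and `glob.glob` yields full paths. The quick check below uses the question's directory and a hypothetical file name `ucsd3.xlsx` to show how searching the whole path differs from searching only the base name:

```python
import os
import re

# Directory from the question; the file name ucsd3.xlsx is hypothetical.
path = "/Users/jamesades/desktop/Watch_data_1/Re__Personalized_MH_data_call/ucsd3.xlsx"


def first_number(text: str) -> int:
    # First run of one or more digits anywhere in the text.
    return int(re.search(r"(\d+)", text).group(1))


print(first_number(path))                    # 1 (from "Watch_data_1")
print(first_number(os.path.basename(path)))  # 3 (from "ucsd3.xlsx")
```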

  • It looks like all participants are just being named "1" – James Nov 17 '21 at 20:40
  • Hmm. Well, I can't tell. Can you `print(participant_number)` before it's assigned to the df? –  Nov 17 '21 at 21:56
  • Yeah, I think I figured out what's going on. It was taking the first number in the file path, which is the 1 from "Watch_data_1". If I use the relative file path, everything seems to work well. Thanks! – James Nov 17 '21 at 22:41
  • Aha! Funny. Okay, I'm glad you got it working. –  Nov 17 '21 at 23:35
  • Thanks for your help! – James Nov 18 '21 at 23:45
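Instead of switching to a relative path, you can apply the regex to `os.path.basename(filename)` so that directory names never enter the search. Below is a self-contained sketch of the answer's loop with that change. To run without Excel files, it writes a few hypothetical CSV files (`ucsd1.csv`, `ucsd2.csv`, `ucsd10.csv`) into a temporary `Watch_data_1` directory and reads them with `pd.read_csv`. With the real data you would keep the `*.xlsx` pattern and `pd.read_excel`:

```python
import glob
import os
import re
import tempfile

import pandas as pd

# Hypothetical demo data: a directory whose own name contains a digit,
# mimicking "Watch_data_1" in the question.
tmp = tempfile.mkdtemp()
path = os.path.join(tmp, "Watch_data_1")
os.makedirs(path)
for n in (1, 2, 10):
    pd.DataFrame({"value": [n, n * 2]}).to_csv(
        os.path.join(path, f"ucsd{n}.csv"), index=False
    )

li = []
for filename in sorted(glob.glob(os.path.join(path, "*.csv"))):
    df = pd.read_csv(filename)
    # Search only the file name, so digits in parent directories are ignored.
    # Assumes every file name contains a number; re.search returns None otherwise.
    match = re.search(r"(\d+)", os.path.basename(filename))
    df["participant_number"] = int(match.group(1))
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
print(frame)
```

Because `sorted` orders the paths as strings, `ucsd10.csv` comes before `ucsd2.csv`. The participant numbers are still correct because they come from each file's own name.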