
I am trying to combine multiple data files, all with the same number of columns, into one continuous set of data. The x values are time, and each subsequent data file starts at the time where the previous one finished. So in theory, if I run some code like this:

    data = pd.read_csv(r"/PATH/out.txt", sep="\t")
    data2 = pd.read_csv(r"/PATH/out2.txt", sep="\t")
    data3 = pd.read_csv(r"/PATH/out3.txt", sep="\t")
    data4 = pd.read_csv(r"/PATH/out4.txt", sep="\t")
    data5 = pd.read_csv(r"/PATH/out5.txt", sep="\t")
    data6 = pd.read_csv(r"/PATH/out6.txt", sep="\t")
    data7 = pd.read_csv(r"/PATH/out7.txt", sep="\t")
    data8 = pd.read_csv(r"/PATH/out8.txt", sep="\t")

    print(data)

    data.append(data2, ignore_index=True)
    data.append(data3, ignore_index=True)
    data.append(data4, ignore_index=True)
    data.append(data5, ignore_index=True)
    data.append(data6, ignore_index=True)
    data.append(data7, ignore_index=True)
    data.append(data8, ignore_index=True)

    print(data)

    arr = data.to_numpy()

The output of the print statements before and after should be different, right? But the other data files don't appear to be appended to the first one when I try. I must be missing something obvious; can anyone help with this?

The data files are in a 2 column format and look something like this (leftmost column is just the pandas indexing):

            Time(s)  CMASS(1,1,53)
    0     97.000229       0.999999
    1     98.000183       0.999999
    2     98.001122       0.999999
    3     98.200874       0.999999
    4     98.400703       0.999999
    ..          ...            ...
    209  119.700410       0.999999
    210  119.800410       0.999999
    211  119.900410       0.999999
    212  120.000400       0.999999
    213  120.000400       0.999999
zwolfgang

2 Answers

2

DataFrame.append returns a new object; it doesn't modify the DataFrame it is called on. You would have to do:

    data = data.append(data2)

Or you could just do:

    data = pd.concat(pd.read_csv(path, sep="\t") for path in list_of_paths_to_csv)
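To see the behaviour without touching any files, here is a minimal sketch. The two small frames are made-up stand-ins for the question's files, read from in-memory strings, and the name `combined` is just for illustration. Like append, pd.concat returns a new DataFrame and leaves its inputs untouched, so the result has to be assigned. Also note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions pd.concat is the form that works.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two consecutive tab-separated output files.
part1 = io.StringIO("Time(s)\tCMASS(1,1,53)\n97.0\t0.999999\n98.0\t0.999999\n")
part2 = io.StringIO("Time(s)\tCMASS(1,1,53)\n99.0\t0.999999\n100.0\t0.999999\n")

data = pd.read_csv(part1, sep="\t")
data2 = pd.read_csv(part2, sep="\t")

# concat returns a new DataFrame; neither input is modified.
combined = pd.concat([data, data2], ignore_index=True)

print(len(data))      # the original frame still has its own rows only
print(len(combined))  # the new frame holds the rows of both inputs

arr = combined.to_numpy()
```

With ignore_index=True the combined frame gets a fresh 0..n-1 index instead of repeating each file's own row numbers.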
Oyono
1

The pandas.DataFrame.append method does not work in place but rather returns a new DataFrame. Thus, you need to assign the return value back to your variable:

    data = data.append(data2, ignore_index=True)

However, it would be more efficient in your case to use the pandas.concat function:

    import glob
    # sorted() keeps the files in name order; glob returns them in arbitrary order
    files = sorted(glob.glob('/PATH/out*.txt'))
    data = pd.concat([pd.read_csv(f, sep="\t") for f in files], ignore_index=True)
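One thing to watch with the glob approach: glob.glob returns paths in arbitrary order, and a plain alphabetical sort would put out10.txt before out2.txt. Since each file continues the time axis of the previous one, the order matters. The sketch below writes three tiny made-up files to a temporary directory and sorts them by their numeric suffix. The helper `file_number` is hypothetical, and it assumes the naming pattern from the question (out.txt, out2.txt, ...), with plain out.txt treated as file 1.

```python
import glob
import os
import re
import tempfile

import pandas as pd


def file_number(path):
    """Hypothetical helper: numeric suffix of 'outN.txt', 1 for plain 'out.txt'."""
    match = re.search(r"out(\d*)\.txt$", os.path.basename(path))
    return int(match.group(1) or 1)


with tempfile.TemporaryDirectory() as tmp:
    # Write three tiny made-up files whose time ranges follow one another.
    for name, start in [("out.txt", 0.0), ("out2.txt", 2.0), ("out10.txt", 4.0)]:
        with open(os.path.join(tmp, name), "w") as fh:
            fh.write("Time(s)\tCMASS(1,1,53)\n")
            fh.write(f"{start}\t0.999999\n{start + 1.0}\t0.999999\n")

    # Sort by the number in the file name, then read and concatenate in that order.
    paths = sorted(glob.glob(os.path.join(tmp, "out*.txt")), key=file_number)
    data = pd.concat([pd.read_csv(p, sep="\t") for p in paths], ignore_index=True)

# Check that the time column runs forward across the file boundaries.
print(data["Time(s)"].is_monotonic_increasing)
```

If the file names do not encode the order, sorting the combined frame with data.sort_values("Time(s)") after concatenation is another option.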
mozway