0

I am working on a script that reads a text file into a pandas DataFrame that can contain a variety of columns and rows. Then, some operations are made on the data, and it needs to sum it all up into a single DataFrame for output to an excel document.

My code works for a single file but now I need to iterate over all of the files.

This seems like it should be very easy to do but I've tried all of the pandas functions I can find to accomplish this but nothing works.

Here is the basic structure:

import glob
import pandas as pd
# ...
inputFiles = glob.glob('*.rep')

for filename in inputFiles:
    df = pd.read_csv(filename, sep = ' ')
    # DF MODIFICATIONS...
    # Need to send a new df here to avoid overwriting on loop

Example of inputs/desired output:

#file1.rep:
columnA columnB columnC
val1 val2 val3
#file2.rep:
columnA columnB columnX
val4 val5 val6

#resulting dataframe:
columnA columnB columnC columnX
val1    val2    val3    NaN
val4    val5    NaN     val6

I tried append, add, combine, join, concat, and none of them have worked. Am I just using one of these improperly?

Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
ebob
  • 1
  • 1
  • 1
    This is fairly well covered by [Import multiple csv files into pandas and concatenate into one DataFrame](https://stackoverflow.com/q/20906474/15497888). You've not shown the actual concat/join part of your code so it's hard to know where you went wrong. – Henry Ecker Dec 16 '21 at 00:38
  • append all to a list and use `pd.concat`? –  Dec 16 '21 at 00:40
  • @HenryEcker thank you! did not find this in my searching – ebob Dec 16 '21 at 00:54

2 Answers2

1

Try appending all the dataframes to a list and then using pd.concat (with axis=0, which is the default) to combine them:

import glob
import pandas as pd
# ...
inputFiles = glob.glob('*.rep')

dfs = []
for filename in inputFiles:
    df = pd.read_csv(filename, sep = ' ')
    # DF MODIFICATIONS...
    dfs.append(df)

full_df = pd.concat(dfs)
0

Consider generalizing your process in a defined method. Then run pandas.concat on output of a list comprehension:

def process_df(filename):
    df = pd.read_csv(filename, sep = ' ') 

    # DF MODIFICATIONS...

    return df

final_df = pd.concat(
    [process_df(f) for f in inputFiles]
)
Parfait
  • 104,375
  • 17
  • 94
  • 125