I am working on a script that reads text files into pandas DataFrames; each file can contain a different set of columns and rows. After some operations on the data, the script needs to combine everything into a single DataFrame for output to an Excel document.
My code works for a single file, but now I need to iterate over all of the files.
This seems like it should be very easy, but none of the pandas functions I have tried accomplish it.
Here is the basic structure:
import glob
import pandas as pd
# ...
inputFiles = glob.glob('*.rep')
for filename in inputFiles:
    df = pd.read_csv(filename, sep=' ')
    # DF MODIFICATIONS...
    # Need to send a new df here to avoid overwriting on loop
Example of inputs/desired output:
#file1.rep:
columnA columnB columnC
val1 val2 val3
#file2.rep:
columnA columnB columnX
val4 val5 val6
#resulting dataframe:
columnA columnB columnC columnX
val1 val2 val3 NaN
val4 val5 NaN val6
I tried append, add, combine, join, and concat, and none of them worked. Am I just using one of these improperly?
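For reference, here is a minimal sketch of what I would expect to work: collect each per-file DataFrame in a list and call pd.concat once after the loop. To keep it self-contained, the two example files are replaced by in-memory strings (the `samples` dict and the `frames` list are names made up for this illustration). With real files, the read would be the pd.read_csv(filename, sep=' ') call from the loop above.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the contents of file1.rep and file2.rep.
samples = {
    "file1.rep": "columnA columnB columnC\nval1 val2 val3\n",
    "file2.rep": "columnA columnB columnX\nval4 val5 val6\n",
}

frames = []  # one DataFrame per input file
for name, text in samples.items():
    # With real files: df = pd.read_csv(filename, sep=' ')
    df = pd.read_csv(io.StringIO(text), sep=" ")
    # DF MODIFICATIONS...
    frames.append(df)  # keep each df instead of overwriting it

# concat aligns on column names; columns missing from a file are filled with NaN.
result = pd.concat(frames, ignore_index=True, sort=False)
print(result)
```

Calling pd.concat once on the list, rather than growing a DataFrame inside the loop, avoids copying the accumulated data on every iteration.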