I have about 10 CSV files that I'd like to append into one file. My thought was to assign the file names to numbered data_file variables and then append them in a while loop, but I'm having trouble moving on to the next numbered data_file in my loop. I keep getting errors such as "data_file does not exist" and "cannot concatenate 'str' and 'int' objects". I'm not even sure whether this is a realistic approach to my problem. Any help would be appreciated.

import pandas as pd

path = '//pathname'
data_file1= path + 'filename1.csv'
data_file2= path + 'filename2.csv'
data_file3= path + 'filename3.csv'
data_file4= path + 'filename4.csv'
data_file5= path + 'filename5.csv'
data_file6= path + 'filename6.csv'
data_file7= path + 'filename7.csv'

df = pd.read_csv(data_file1)

x = 2
while x < 8:
     data_file = 'data file' + str(x)
     tmdDF = pd.read_csv(data_file)
     df = df.append(tmpDF)
     x += x + 1
Grace Rich
  • Why do you want to use pandas for this? If you're just concatenating the files [there are more efficient ways](http://stackoverflow.com/questions/13613336/python-concatenate-text-files). – Paulo Almeida Aug 28 '15 at 20:02
  • By the way, you want either `x += 1` or `x = x + 1`. Your last line adds `x + 1` to x, so x goes 2, 5, 11 and the loop would visit only two files. You are also trying to open files literally named 'data file2', 'data file3', etc., which is why it is failing. You can't build variable names like that; you might want to use a dictionary instead. And rather than repeating the same line seven times almost verbatim, you could use a loop for that. – Paulo Almeida Aug 28 '15 at 20:07
  • @PauloAlmeida The files all have the same header row, and I'm attempting to capture only the information below the header. A colleague suggested using pandas dataframes for this. Knowing that, does it make sense to you to use pandas for this? I'm very open to using something other than pandas, it was just an initial suggestion by someone else. I know very little of python. – Grace Rich Aug 31 '15 at 13:55
  • If you want to remove the header, it's a little more complicated than the methods in my link, but pandas is still unnecessary, since you don't process the data. I posted an answer that uses the fileinput module. – Paulo Almeida Aug 31 '15 at 14:30

2 Answers

I'm not quite sure what you're doing when you construct the string data_file inside the loop. You can't refer to a variable through a string containing its name, so pd.read_csv receives the literal text 'data file2' rather than the path stored in data_file2. As Paulo noted, you're also not incrementing the index correctly. Try the following code, but note that merely concatenating CSV files certainly does not require pandas.

import pandas
filenames = ["filename1.csv", "filename2.csv", ...] # Fill in remaining files.
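# DataFrame.append was removed in pandas 2.0, so read the frames first and join them with pandas.concat.
frames = [pandas.read_csv(filename) for filename in filenames]
df = pandas.concat(frames)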
# df is now a dataframe of all the csv's in filenames appended together
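
For numbered files like the ones in the question, the list of names can be generated instead of typed out. The following is only a sketch: it assumes the files are called filename1.csv through filename7.csv, sit in one directory, and share the same header row. `numbered_paths` and `combine_csvs` are illustrative names, not pandas functions. It uses `pandas.concat`, which exists in both older and current pandas releases.

```python
import os

import pandas as pd


def numbered_paths(directory, first=1, last=7):
    """Return paths filename<first>.csv .. filename<last>.csv (assumed naming scheme)."""
    return [os.path.join(directory, "filename{}.csv".format(i))
            for i in range(first, last + 1)]


def combine_csvs(paths):
    """Read each CSV and stack the rows; the shared header is used once as column names."""
    frames = [pd.read_csv(p) for p in paths]
    # ignore_index=True renumbers the rows 0..n-1 instead of repeating each file's own index.
    return pd.concat(frames, ignore_index=True)
```

Something like `combine_csvs(numbered_paths(some_directory)).to_csv("combined.csv", index=False)` would then write the result back out with a single header row; `some_directory` and `combined.csv` are placeholders.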
Isaac Drachman

You can use fileinput for this:

import fileinput

path = '//pathname'
files = [path + 'filename' + str(i) + '.csv' for i in range(1,8)]

with open('output.csv', 'w') as output, fileinput.input(files) as fh:
    for line in fh:
        if fileinput.isfirstline() and fileinput.lineno() != 1:
            continue
        output.write(line)  
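
The same header-skipping idea can also be written with plain file objects, which makes each step explicit. This is a sketch under the assumption that every input file starts with exactly one header line; `concat_csvs` is a hypothetical helper name. It also adds a newline when a file does not end with one, so the last row of one file cannot run into the first row of the next.

```python
def concat_csvs(paths, out_path):
    """Copy all files into out_path, keeping the header line of the first file only."""
    with open(out_path, "w") as out:
        for file_index, path in enumerate(paths):
            with open(path) as src:
                for line_index, line in enumerate(src):
                    # Skip the header of every file except the first one.
                    if line_index == 0 and file_index > 0:
                        continue
                    # Guard against a file whose last line lacks a trailing newline.
                    if not line.endswith("\n"):
                        line += "\n"
                    out.write(line)
```

It would be called with the same `files` list as above, for example `concat_csvs(files, 'output.csv')`.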
Paulo Almeida