6

I'm really new to Python, so this question might be a bit basic. I have 44 csv files with the same headers and different file names. I want to combine them all into one file.

Each file is named "Votes-[member-name]-(2010-2014)-[download-time].csv"

The headers are do not include a column for the member name. I would like to add that as the first item.

This does part of what I want to do: how to merge 200 csv files in Python. I'm just not sure how to iterate through files with different names, and add those names to the csv.

Thanks!

Community
  • 1
  • 1
cassidycodes
  • 194
  • 1
  • 7

1 Answers1

8

To iterate through the filenames you can use a similar method as answered here, using glob:

import glob
import os
os.chdir("/mydir")
for files in glob.glob("*.csv"):
    print files

Then, to add the member name to the header, you can print all the csv files line by line. If the line is a header, then print the member name on the same line as the header. (This isn't real code, but you can get the point)

for files in glob.glob("*.csv"):
    for lines in files:
         if line == header:
              print member,line
         else:
              print line

To split the CSV file and only use the member name (slightly modified so to not have a hyphen)

'Votes-[member name]-(2010-2014)-[download-time].csv'.split('-')[1]

UPDATE for bash solution: You can save this text and run it from the terminal (see instructions here for Mac)

Generate CSV files (not necessary)

cat <<"EOF" > 1.csv
1,2,3
4,5,6
EOF

cat <<"EOF" > 2.csv
a,b,c
d,e,f
EOF

Parse CSV files - this script takes all CSV files and writes their file name as the first column. It also puts them into one file (note that I tested on debian linux, not mac).

rm -f all.csv
for fyle in *.csv ; do 
    echo | awk -v f=$fyle '{ print f","$0 }' $fyle >> all.csv
done
exit 0

SECOND UPDATE: If you want to remove the duplicate headers, the simplest way from the shell is to use 'grep -v', which selects all lines that don't match. You can pick a generic string that only exists in the header.

head -1 all.csv > nohead.csv # add one header line
grep -v "header string" all.csv >> nohead.csv # add rest of rows, with no headers
Community
  • 1
  • 1
philshem
  • 24,761
  • 8
  • 61
  • 127
  • Thanks, I'll try this out. I'm on a Mac, if that makes a difference. – cassidycodes Dec 19 '13 at 14:59
  • I added a solution using the bash shell. – philshem Dec 19 '13 at 15:16
  • Yeah, that's what I'm looking for! Is there a way to only include the member name and not the whole file name? – cassidycodes Dec 19 '13 at 15:24
  • Yes, you have to parse the file name as a string, then print it. See http://stackoverflow.com/a/11401139/2327328. Also, I added 2 more lines in order to not have duplicated headers. – philshem Dec 19 '13 at 15:28
  • But... to split the string I think python is the better tool. 'Votes-[member name]-(2010-2014)-[download-time].csv'.split('-')[1] Note that I removed the '-' from member name. – philshem Dec 19 '13 at 15:30
  • What about using cut in bash? http://stackoverflow.com/questions/49403/how-do-you-parse-a-filename-in-bash? The member name is spelled with an '_', my mistake. I'll look at splitting the sting in python though. – cassidycodes Dec 19 '13 at 15:50
  • I did not know about 'cut', and it's nice to learn something new. – philshem Dec 19 '13 at 20:09