How to append multiple CSV files and add an additional column indicating file name in Python?

Question

I have over 20 CSV files in a single folder. All files have the same structure, they just represent different days.

Example:

Day01.csv

Day02.csv

Day03.csv

Day04.csv (and so on...)

The files contain just two numeric columns: x and y. I would like to append all of these csv files together into one large file and add a column for the file name (day). I have explored similar examples to generate the following code but this code adds each y to a separate column (Y1, Y2, Y3, Y4...and so on). I would like to simply have this appended file as three columns: x, y, file name. How can I modify the code to do the proper append?

I have tried the code from this example: Read multiple csv files and Add filename as new column in pandas

import pandas as pd
import os
os.chdir('C:....path to my folder')
files = os.listdir()
df = pd.concat([pd.read_csv(fp).assign(New=os.path.basename(fp)) for fp in files])

However, this code does not append all Y values under one column. (all other aspects seem to work, however). Can someone help with the code so that all Y values are under a single column?

build your solutions step by step, so a) write a program that reads one file and writes a new with your added column (for example read a line, add your new column stringwise and write) b) take the part that reads and wrap that into a function that returns the lines to write. c) open your file that you want to write to d) make a loop over the files that you want to read, and then just write your new lines to the final file. all this you do easiest with vanilla python without any extra libraries — ahed87, Jan 13 '19 at 20:51
There are open-source tools supporting this functionality, depending on your circumstances/goals they may be an option. One is [csvstack from csvkit](https://csvkit.readthedocs.io/en/1.0.3/scripts/csvstack.html). Another is [tsv-append from eBay's tsv utilities](https://github.com/eBay/tsv-utils/blob/master/docs/ToolReference.md#tsv-append-reference). (Disclaimer: I'm the author.) There may be other tools supporting this functionality as well. — JonDeg, Jan 13 '19 at 21:58
@RomanPerekhrest Exactly, `cat Day*.csv > AllDays.csv`. [If the only tool you have is `pandas`, everything will look like a `DataFrame`](https://en.wikipedia.org/wiki/Law_of_the_instrument). — Jan Christoph Terasa, Jan 13 '19 at 22:18

Erfan · Accepted Answer · 2020-06-19T07:18:58.897

7

The following should work by creating the filename column before appending the dataframe to your list.

import os
import pandas as pd

file_list = []
for file in os.listdir():
    if file.endswith('.csv'):
        df = pd.read_csv(file,sep=";")
        df['filename'] = file
        file_list.append(df)

all_days = pd.concat(file_list, ignore_index=True)
all_days.to_csv("all.txt")

edited Jun 19 '20 at 07:18

answered Jan 13 '19 at 21:40

Erfan

40,971
8
66
78

Thanks! This helps! I was able to get it to work! Cheers! :-) – Lee Walston Jan 14 '19 at 21:04

score 0 · Answer 2 · answered Jan 13 '19 at 21:14

python is great at these simple task, almost too good to be true…

fake_files = lambda n: '\n'.join(('%d\t%d'%(i, i+1) for i in range(n, n+3)))

file_name = 'fake_me%s.csv'

with open('my_new.csv', 'wt') as new:
    for number in range(3): # os.listdir()
#        with open(number) as to_add:
#            rows = to_add.readlines()
            rows_fake = fake_files(number*2).split('\n')
            adjusted_rows = [file_name%number + '\t' + row for row in rows_fake]
            new.write('\n'.join(adjusted_rows) + '\n')

with adjustments to your specific io and naming, this is all you need. you can just copy the code and run it and study how it works.

How to append multiple CSV files and add an additional column indicating file name in Python?

2 Answers2