How to read multiple files with Python and combine into single dataframe with identifier for each file?

Question

I would like to read in multiple files from a directory, add an identifier for each file, and concatenate them into a single dataframe. Below is an example with two files but I am looking for a more efficient way to generalize the procedure to many files. I'd like to add an id column that identifies each file (so my question slightly differs from this)

Reprex:

# file1.csv
# file2.csv

import pandas as pd

df1 = pd.read_csv("file1.csv")
df["id"] = "file1"

df2 = pd.read_csv("file2.csv")
df2["id"] = "file2"


df_all = pd.concat[df, df2]

I would also like to use it for .txt files like so:

>>> %cat 'file1.txt'
A|B|C
1|0.5|good
2|0.2|bad
3|0.6|good
4|0.4|ok

>>> %cat 'file2.txt'
A|B|C
1|0.2|bad
2|0.5|ok
3|0.9|good
4|0.3|ok
5|0.7|bad

Corralien · Answer 1 · 2021-08-30T20:24:36.930

1

You can use a dict to collect data from csv files and pathlib to walk into directory:

import pathlib

root_dir = pathlib.Path('.')
data = {}

for pth in root_dir.glob('*.csv'):
    data[pth.stem] = pd.read_csv(pth)

df = pd.concat(data).rename_axis(['id', None]).reset_index(level='id')

Example:

>>> %cat 'data1.csv'
A,B
1,0.5
2,0.2
3,0.6
4,0.4

>>> %cat 'data2.csv'
A,B
1,0.2
2,0.5
3,0.9
4,0.3
5,0.7

# final output
>>> df
      id  A    B
0  data2  1  0.2
1  data2  2  0.5
2  data2  3  0.9
3  data2  4  0.3
4  data2  5  0.7
0  data1  1  0.5
1  data1  2  0.2
2  data1  3  0.6
3  data1  4  0.4

edited Aug 30 '21 at 20:24

answered Aug 30 '21 at 20:16

Corralien

109,409
8
28
52

I updated my answer if you kindly check, please. – Corralien Aug 30 '21 at 20:28
1

Works well for .csv but I also want to use it on .txt files (sep="|") and it does not seem to work. Added sep= argument but did not work. thanks. – tall_table Aug 30 '21 at 20:34
1

If you want to read multiple files with different extensions it's not a problem if the separator is the same. If not maybe you have to map a dict like `{'txt': '|', 'csv': ','}. – Corralien Aug 30 '21 at 20:39
Provide a sample that match your case. – Corralien Aug 30 '21 at 20:39

How to read multiple files with Python and combine into single dataframe with identifier for each file?

1 Answers1