Read multiple *.txt files into Pandas Dataframe with filename as column header

Question

I am trying to import a set of *.txt files. I need to import the files into successive columns of a Pandas DataFrame in Python.

Requirements and Background information:

Each file has one column of numbers
No headers are present in the files
Positive and negative integers are possible
The size of all the *.txt files is the same
The columns of the DataFrame must have the name of file (without extension) as the header
The number of files is not known ahead of time

Here is one sample *.txt file. All the others have the same format.

Here is my attempt:

import pandas as pd
import os
import glob

# Step 1: get a list of all *.txt files in target directory
my_dir = "C:\\Python27\\Files\\"
filelist = []
filesList = []
os.chdir( my_dir )

# Step 2: Build up list of files:
for files in glob.glob("*.txt"):
    fileName, fileExtension = os.path.splitext(files)
    filelist.append(fileName) #filename without extension
    filesList.append(files) #filename with extension

# Step 3: Build up DataFrame:
df = pd.DataFrame()
for ijk in filelist:
    frame = pd.read_csv(filesList[ijk])
    df = df.append(frame)
print df

Steps 1 and 2 work. I am having problems with step 3. I get the following error message:

Traceback (most recent call last):
  File "C:\Python27\TextFile.py", line 26, in <module>
    frame = pd.read_csv(filesList[ijk])
TypeError: list indices must be integers, not str

Question: Is there a better way to load these *.txt files into a Pandas dataframe? Why does read_csv not accept strings for file names?

instead of this `frame = pd.read_csv(filesList[ijk])` use this `frame = pd.read_csv(ijk)` in your for loop — JAbr, Apr 03 '19 at 14:08
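
The `TypeError` comes from how a Python `for` loop works: iterating over a list yields its elements, not their positions, so `ijk` is already a string and cannot be used to index `filesList`. A minimal sketch of the difference, written for Python 3 and using made-up file names in place of the lists built in Step 2:

```python
# Hypothetical file names, standing in for the two lists built in Step 2.
names_without_ext = ["val1", "val2"]
names_with_ext = ["val1.txt", "val2.txt"]

# A for loop hands back the elements themselves, so each item is a str.
for item in names_without_ext:
    assert isinstance(item, str)

# Indexing a list with one of those strings raises the TypeError from the question.
try:
    names_with_ext["val1"]
    error_raised = False
except TypeError:
    error_raised = True

# When both the column name and the path are needed, zip walks the lists in step.
pairs = list(zip(names_without_ext, names_with_ext))
print(pairs)
```

Note that the names without the extension are not valid paths, so the loop that reads the files probably needs the list that still carries `.txt`.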

score 8 · Accepted Answer · answered Oct 17 '14 at 00:18

You can read them into multiple dataframes and concat them together afterwards. Suppose you have two of those files, containing the data shown.

import pandas as pd

filelist = ['val1.txt', 'val2.txt']
# item[:-4] strips the '.txt' extension, so each column is named after its file
print(pd.concat([pd.read_csv(item, names=[item[:-4]]) for item in filelist], axis=1))
    val1  val2
0     16    16
1     54    54
2   -314  -314
3      1     1
4     15    15
5      4     4
6    153   153
7     86    86
8      4     4
9     64    64
10   373   373
11     3     3
12   434   434
13    31    31
14    93    93
15    53    53
16   873   873
17    43    43
18    11    11
19   533   533
20    46    46
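
The same idea extends to an unknown number of files if the list is built with `glob`. The sketch below is one possible way to do it, not the answerer's code. It assumes Python 3, writes two small made-up files into a temporary directory so it can run anywhere, and uses `os.path.splitext` rather than slicing for the column name. It also passes `header=None` explicitly; `names=` already implies it, so that argument is only there for readability.

```python
import glob
import os
import tempfile

import pandas as pd

# Create a throwaway directory with two made-up sample files (one integer per line).
data_dir = tempfile.mkdtemp()
samples = {"val1.txt": [16, 54, -314], "val2.txt": [1, 15, 4]}
for file_name, values in samples.items():
    with open(os.path.join(data_dir, file_name), "w") as handle:
        handle.write("\n".join(str(v) for v in values) + "\n")

# Sort the matches so the column order is reproducible.
paths = sorted(glob.glob(os.path.join(data_dir, "*.txt")))

# One single-column frame per file, named after the file without its extension.
frames = [
    pd.read_csv(
        path,
        header=None,
        names=[os.path.splitext(os.path.basename(path))[0]],
    )
    for path in paths
]

# axis=1 places the frames side by side instead of stacking them.
df = pd.concat(frames, axis=1)
print(df)
```

In a real run, `data_dir` would simply point at the folder that holds the `*.txt` files and the block that writes the sample files would be dropped.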

CT Zhu

Sorry, I forgot to mention: there are many files, maybe more than 20. I would strongly prefer to avoid reading them in manually. Also, I do not understand this part: `names=[item[:-4]]`. What is the significance of -4? – edesz Oct 17 '14 at 00:39
You can use `os.listdir(PATH)` to get a list of all files in `PATH`, so that part is easy. As for `names=[item[:-4]]`: the files end with `'.txt'`, and you don't want `'.txt'` to be part of your column name, right? – CT Zhu Oct 17 '14 at 01:19
Thanks. I tried this approach: `df = pd.DataFrame()`, then `for item in filesList:` with the loop body `df = pd.concat(pd.read_csv(item, names=[item[:-4]]), axis=1)`. But it gives an error message: "TypeError: first argument must be a list-like of pandas objects, you passed an object of type "DataFrame"". Is there some reason why this approach does not work? – edesz Oct 17 '14 at 01:54
CT Zhu's code is working, but I do not understand why my approach in the comment above is not. His method uses a list comprehension; I just used a simple for loop. Could you please let me know why my approach will not work? – edesz Oct 24 '14 at 13:12
Thank you! Note that for my case, I wanted to stack these dataframes by concatenating rows (instead of columns), so I replaced `axis=1` with `axis=0, ignore_index=True` – mikey Jan 21 '21 at 13:33
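
On the follow-up about the plain `for` loop: `pd.concat` expects a list (or other iterable) of pandas objects as its first argument, which is why passing a single DataFrame raises that `TypeError`. A loop works as long as it collects the frames first and concatenates once at the end. A sketch under those assumptions, in Python 3, with in-memory text standing in for the files (the names and values are made up):

```python
import io

import pandas as pd

# In-memory stand-ins for two files; the keys play the role of file names.
fake_files = {"val1.txt": "16\n54\n-314\n", "val2.txt": "1\n15\n4\n"}

frames = []
for file_name, text in fake_files.items():
    column = file_name[:-4]  # drop the 4-character ".txt" suffix
    frames.append(pd.read_csv(io.StringIO(text), names=[column]))

# Side by side: one column per file.
wide = pd.concat(frames, axis=1)

# Stacked rows, as in the last comment. Every frame gets the same column name
# first; otherwise concat aligns on the differing names and fills gaps with NaN.
tall = pd.concat(
    [frame.rename(columns=lambda _: "value") for frame in frames],
    axis=0,
    ignore_index=True,
)
print(wide)
print(tall)
```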

score 3 · Answer 2 · edited Nov 15 '21 at 05:05

You're very close. `ijk` is already the filename, so there is no need to index into the list:

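# Step 3: Build up DataFrame:
df = pd.DataFrame()
for ijk in filesList:  # use the list whose names still carry the .txt extension
    frame = pd.read_csv(ijk)
    df = pd.concat([df, frame])  # DataFrame.append was removed in pandas 2.0
print(df)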

In the future, please provide working code exactly as is. You import with `from pandas import *` yet then refer to pandas as `pd`, which implies the import was `import pandas as pd`.

You also want to be careful with variable names. `files` is actually a single file path, and `filelist` and `filesList` cannot be told apart by name alone. It also seems like a bad idea to keep personal documents in your Python directory.
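
One way to act on the naming advice is to keep a single list of paths and derive each column name from its path, so no second, similarly named list is needed. A sketch using `pathlib` from the Python 3 standard library; the directory and file names are hypothetical:

```python
from pathlib import PureWindowsPath

# Hypothetical paths; in practice these would come from a directory listing.
txt_paths = [
    PureWindowsPath(r"C:\data\val1.txt"),
    PureWindowsPath(r"C:\data\val2.txt"),
]

# .name keeps the extension, .stem is the file name without its final extension.
column_for_path = {path.name: path.stem for path in txt_paths}
print(column_for_path)
```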

Sorry about the confusion with the Pandas command - yes, that should be corrected. I have updated the Original post. — edesz, Oct 20 '14 at 00:31
