Is there a way to read multiple plain text files into a dataframe?

Question

I have multiple plain text files, and I want to store each one in its own row of a data frame with two columns: the file names and the texts. The code below does not raise an error, but it creates a data frame that uses the file contents as column names, all in the header row, with no data rows.

Working code (revised following the suggestions of @Code Different):

from pathlib import Path

import pandas as pd

df = []
for file in Path("/content/").glob("*.txt"):
    df.append(
        # Read each file into a new data frame
        pd.read_table(file)
        # Add a new column to store the file's name
        .assign(FileName=file.name)
    )

# Combine content from all files
df = pd.concat(df, ignore_index=True)
df
print(df)

the output:

Empty DataFrame
Columns: [                The Forgotten Tropical Ecosystem 
Index: []

[0 rows x 9712 columns]
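The output points to the cause. `pd.read_table` parses each file as a tab-separated table whose first line is the header, so, assuming each file holds its text on a single line, that line becomes a column name and no data rows remain; `concat` then takes the union of all those column names. A minimal reproduction, using `io.StringIO` as a hypothetical stand-in for the real files:

```python
import io

import pandas as pd

# Hypothetical stand-ins for two .txt files, each holding one line of prose.
first = io.StringIO("The Forgotten Tropical Ecosystem\n")
second = io.StringIO("Another file's text\n")

# read_table treats the first line as the header row (and splits on tabs),
# so the whole sentence becomes a column name and no data rows remain.
parsed = pd.read_table(first)
other = pd.read_table(second)

print(parsed.columns.tolist())  # ['The Forgotten Tropical Ecosystem']
print(len(parsed))              # 0

# Concatenating header-only frames unions their column names: still no rows.
combined = pd.concat([parsed, other], ignore_index=True)
print(combined.shape)           # (0, 2)
```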

How could the code be improved so that each text is put in its own row under the column heading 'text'?

You can read them each into their own dataframe in a loop and then use `concat()`: [Import multiple CSV files into pandas and concatenate into one DataFrame](https://stackoverflow.com/questions/20906474/import-multiple-csv-files-into-pandas-and-concatenate-into-one-dataframe) — JNevill, Sep 12 '22 at 14:55
@Devly The line shown in the output is: The Forgotten Tropical Ecosystem — Sangeun, Sep 12 '22 at 16:05
@JNevill Yes, I tried that solution in 'Import multiple CSV files...', but it returns the same problem: all the texts end up as columns, within only one row. — Sangeun, Sep 12 '22 at 16:16

score 1 · Answer 1 · answered Sep 12 '22 at 15:00


I have done this a lot at work and here's how I typically do it:

from pathlib import Path

import pandas as pd

df = []
for file in Path("/content").glob("*.txt"):
    df.append(
        # Read each file into a new data frame
        pd.read_table(file)
        # Add a new column to store the file's name
        .assign(FileName=file.name)
    )

# Combine content from all files
df = pd.concat(df, ignore_index=True)
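A minimal adaptation of this loop, assuming the goal is one row per file with the columns `FileName` and `text`: read each file whole with `Path.read_text()` instead of parsing it with `read_table`. The helper name `read_text_files` and the temporary folder standing in for `/content` are illustrative only.

```python
import tempfile
from pathlib import Path

import pandas as pd


def read_text_files(folder):
    """Hypothetical helper: one row per .txt file, with its name and full text."""
    rows = []
    # sorted() gives a stable row order; raw glob order depends on the filesystem
    for file in sorted(Path(folder).glob("*.txt")):
        rows.append(
            # read_text() returns the whole file as one string, so nothing is
            # treated as a header or split into columns
            pd.DataFrame({"FileName": [file.name], "text": [file.read_text()]})
        )
    # An empty folder gives an empty two-column frame instead of a concat error
    if not rows:
        return pd.DataFrame(columns=["FileName", "text"])
    return pd.concat(rows, ignore_index=True)


# Usage with a throwaway folder standing in for "/content"
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("The Forgotten Tropical Ecosystem\nSecond line")
    Path(tmp, "b.txt").write_text("Another text")
    df = read_text_files(tmp)

print(df)
```

Building two lists first and calling `pd.DataFrame` once would avoid creating one frame per file; the loop is kept here to mirror the one above.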

answered Sep 12 '22 at 15:00

Code Different


Thanks! But it still returns an empty data frame. What is wrong? Please help! – Sangeun Sep 12 '22 at 15:31
Either (1) the path is wrong or (2) the glob pattern is wrong. Place a breakpoint inside the `for` loop and see if it ever stops there. – Code Different Sep 12 '22 at 15:36
I don't know how to set a breakpoint. – Sangeun Sep 12 '22 at 15:50
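Without a debugger, a few `print` calls give the same information as a breakpoint: whether the folder exists and which files the glob pattern actually matches. A sketch; the helper name `report_glob` is hypothetical:

```python
from pathlib import Path


def report_glob(folder, pattern="*.txt"):
    """Hypothetical helper: print what the folder holds and what the pattern matches."""
    folder = Path(folder)
    print("Looking in:", folder.resolve())
    if not folder.is_dir():
        # If the folder is missing, glob yields nothing and the loop never runs
        print("Folder does not exist")
        return []
    matches = sorted(folder.glob(pattern))
    print("Matched:", [p.name for p in matches])
    # On Linux the match is case-sensitive: "*.txt" does not match "NOTES.TXT"
    print("All entries:", sorted(p.name for p in folder.iterdir()))
    return matches


report_glob("/content")
```

An empty "Matched" list means the loop body never runs, which points to the path or the pattern rather than to pandas.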

score 1 · Accepted Answer · answered Oct 14 '22 at 16:12

Here is one possible answer to my question, which builds the data frame from a dictionary of lists. A friend helped me with it, and it works. I'm not sure why the suggested answer did not work in my environment, but thanks anyway!

Code:

import os

import pandas as pd

# Two-column table: one row per file, {file_names: ..., file_texts: ...}
dictionary = {}
file_names = []
file_texts = []
for file_name in os.listdir('.'):
    # Only process plain text files in the current directory
    if file_name.endswith('.txt'):
        # Read the whole file as one string; the with block closes the file
        with open(file_name, "r") as f:
            text = f.read()

        file_names.append(file_name)
        file_texts.append(text)

dictionary["file_names"] = file_names
dictionary["file_texts"] = file_texts

pandas_dataframe = pd.DataFrame.from_dict(dictionary)

print(pandas_dataframe)
