
I have multiple plain-text files, and I want to store each one in its own row of a data frame. The data frame should have two columns: the file name and the file's text. The code below runs without raising an error, but the data frame it creates uses the file contents as column names, all placed in the header row.

Current code (revised following the suggestions from @Code Different):

import pandas as pd
from pathlib import Path

df = []
for file in Path("/content/").glob("*.txt"):
    df.append(
        # Read each file into a new data frame
        pd.read_table(file)
        # Add a new column to store the file's name
        .assign(FileName=file.name)
    )

# Combine content from all files
df = pd.concat(df, ignore_index=True)
print(df)
  

The output:

Empty DataFrame
Columns: [                The Forgotten Tropical Ecosystem 
Index: []

[0 rows x 9712 columns]

How can I change the code so that each file's text goes into its own row, under a column named 'text'?

Sangeun
  • What is inside your files? – Devyl Sep 12 '22 at 14:53
  • You can read them each into their own dataframe in a loop and then use `concat()`: [Import multiple CSV files into pandas and concatenate into one DataFrame](https://stackoverflow.com/questions/20906474/import-multiple-csv-files-into-pandas-and-concatenate-into-one-dataframe) – JNevill Sep 12 '22 at 14:55
  • @Devyl Text like the line shown in the output: The Forgotten Tropical Ecosystem – Sangeun Sep 12 '22 at 16:05
  • @JNevill Yes, I tried the solution in 'Import multiple CSV files...', but it has the same problem: all the text ends up in the column headers, in a single row. – Sangeun Sep 12 '22 at 16:16
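A likely cause, assuming the files are ordinary prose rather than tab-separated tables: `pd.read_table` parses its input as tab-separated data and, by default, treats the first line as the header row. A file whose first line is a title therefore contributes that title as a column name, and any remaining lines become separate rows instead of one row per file. The sketch below reproduces this with an in-memory string; the sample text is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical file contents: a title line followed by one paragraph line.
sample = "The Forgotten Tropical Ecosystem\nFirst paragraph of the article.\n"

# read_table defaults to sep="\t" and header="infer", so the first line
# is consumed as the header and each later line becomes its own row.
frame = pd.read_table(io.StringIO(sample))

print(frame.columns.tolist())  # ['The Forgotten Tropical Ecosystem']
print(len(frame))              # 1
```

Concatenating many such frames with `pd.concat` then takes the union of all those "title" columns, which is consistent with the very wide frame in the question's output.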

2 Answers


I have done this a lot at work and here's how I typically do it:

import pandas as pd
from pathlib import Path

df = []
for file in Path("/content").glob("*.txt"):
    df.append(
        # Read each file into a new data frame
        pd.read_table(file)
        # Add a new column to store the file's name
        .assign(FileName=file.name)
    )

# Combine content from all files
df = pd.concat(df, ignore_index=True)
Code Different
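A variant that avoids table parsing altogether, offered as a sketch under stated assumptions rather than as the answerer's method: read each file as one string with `Path.read_text` and build one row per file. The helper name `load_texts` and the UTF-8 encoding are assumptions; the column names `FileName` and `text` follow the answer and the question.

```python
from pathlib import Path

import pandas as pd


def load_texts(folder):
    """Hypothetical helper: return one row per .txt file in `folder`."""
    rows = []
    # sorted() gives a stable row order; glob() order is not guaranteed.
    for file in sorted(Path(folder).glob("*.txt")):
        rows.append({
            "FileName": file.name,
            # Read the whole file as one string instead of parsing it as a table.
            "text": file.read_text(encoding="utf-8"),
        })
    # A list of dicts becomes one row per dict; columns= also covers the empty case.
    return pd.DataFrame(rows, columns=["FileName", "text"])
```

Usage would be `df = load_texts("/content")`. Because nothing is parsed, line breaks inside a file stay inside that file's single `text` cell.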

Here is one possible answer to my own question, which collects the file names and texts into a dictionary of lists and passes it to `pd.DataFrame.from_dict`. A friend helped me with this, and it works. I'm not sure why the suggested answer did not work in my environment, but thanks anyway!

Code:

import os

import pandas as pd

# Table format: one row per file, columns file_names and file_texts
dictionary = {}
file_names = []
file_texts = []
for file_name in os.listdir('.'):
    # endswith avoids matching names such as "notes.txt.bak"
    if file_name.endswith('.txt'):
        # Open the text file; "with" closes it automatically
        with open(file_name, "r") as f:
            # Read the whole file into a single string
            text = f.read()

        file_names.append(file_name)
        file_texts.append(text)

dictionary["file_names"] = file_names
dictionary["file_texts"] = file_texts

pandas_dataframe = pd.DataFrame.from_dict(dictionary)

print(pandas_dataframe)
Sangeun
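The dictionary approach gives one row per file because `pd.DataFrame.from_dict`, with its default `orient="columns"`, turns each key into a column and each list position into a row, so the lists must all have the same length. A small illustration with made-up values; the key `text` is used here only to match the column name the question asked for (the answer's own code uses `file_texts`).

```python
import pandas as pd

# Made-up example data: two files and their full contents.
data = {
    "file_names": ["a.txt", "b.txt"],
    "text": ["First file's text.", "Second file's text."],
}

# Each key becomes a column; element i of every list lands in row i.
frame = pd.DataFrame.from_dict(data)
print(frame)
```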