I have imported a few thousand txt files from a folder into a pandas DataFrame. Is there any way to create a column containing a substring of each imported file's name? The goal is to identify each text file in the DataFrame by a unique name.
The text files are named 1001example.txt, 1002example.txt, 1003example.txt, and so on. I want something like this:
filename    text
1001        this is an example text
1002        this is another example text
1003        this is the last example text
....
The code I used to import the data is below. However, I do not know how to create a column from a substring of the filenames. Any help would be appreciated. Thanks.
import glob
import os
import pandas as pd

file_list = glob.glob(os.path.join(os.getcwd(), "K:\\text_all", "*.txt"))

corpus = []
for file_path in file_list:
    with open(file_path, encoding="latin-1") as f_input:
        corpus.append(f_input.read())

df = pd.DataFrame({'text': corpus})
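
For reference, one way this could be approached is sketched below. It is only a minimal sketch under assumptions: every file name is assumed to end in "example.txt", and the helper names filename_prefix and load_corpus, along with their parameters, are hypothetical and not part of the code above. The idea is to take the base name of each path, strip the shared suffix, and collect the result next to the file contents.

```python
import glob
import os

import pandas as pd


def filename_prefix(file_path, suffix="example.txt"):
    """Return the part of the file name before `suffix` (hypothetical helper)."""
    name = os.path.basename(file_path)
    if name.endswith(suffix):
        return name[: -len(suffix)]
    # Assumption: if the suffix is absent, fall back to the name without its extension.
    return os.path.splitext(name)[0]


def load_corpus(folder, pattern="*.txt", encoding="latin-1"):
    """Read every matching file into a DataFrame with 'filename' and 'text' columns."""
    rows = []
    # Sorting keeps the row order stable across runs.
    for file_path in sorted(glob.glob(os.path.join(folder, pattern))):
        with open(file_path, encoding=encoding) as f_input:
            rows.append(
                {"filename": filename_prefix(file_path), "text": f_input.read()}
            )
    return pd.DataFrame(rows, columns=["filename", "text"])
```

With the folder from the question, this would be called as df = load_corpus("K:\\text_all"). Building each row as a dictionary keeps the file name and its text paired, so the two columns cannot drift out of order.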