I have a folder named x_list with subfolders named [y1, y2 ... y10]. Each subfolder contains text files. I need to read these text files into Python together with their corresponding subfolder name, which comes from x_list.
I have the following code, which works. The only issue is that the text files lose their punctuation. I believe the error is in the append function.
import os
import pandas as pd

# x_path is the path to the x_list folder (defined elsewhere)
df = pd.DataFrame()
x_list = os.listdir(x_path)  # list with classes (subfolder names)
for i in range(0, len(x_list)):
    x_path2 = x_path + "/" + x_list[i]
    files = os.listdir(x_path2)
    # Read all the documents from the subfolder and fill the dataframe
    for j in range(0, len(files)):
        p = x_path2 + "/" + files[j]
        f = open(p, "r")
        # Note: DataFrame.append only exists in pandas versions before 2.0
        df = df.append({'text': f.read(), 'class': x_list[i]}, ignore_index=True)
        f.close()
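One way to test the suspicion about the reading and appending step is to check whether a date and a currency amount survive a round trip through a file and a list of records. The following is a minimal sketch, not the code above: the helper name `read_labeled_texts`, the temporary folder, and the sample sentence are made up for illustration, and the files are assumed to be UTF-8 encoded.

```python
import os
import tempfile


def read_labeled_texts(root, encoding="utf-8"):
    """Return one {'text', 'class'} record per file found in root's subfolders."""
    records = []
    for label in sorted(os.listdir(root)):
        folder = os.path.join(root, label)
        if not os.path.isdir(folder):
            continue  # skip stray files next to the class folders
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            if not os.path.isfile(path):
                continue  # skip nested folders
            with open(path, "r", encoding=encoding) as f:
                records.append({"text": f.read(), "class": label})
    return records


# Hypothetical sample data, written to a throwaway folder.
sample = "Welcome blabla 20-09-2017, total: $1,250.50."
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "y1"))
    with open(os.path.join(root, "y1", "a.txt"), "w", encoding="utf-8") as f:
        f.write(sample)
    records = read_labeled_texts(root)

# True means the text came back exactly as it was written.
print(records[0]["text"] == sample)
```

If the same comparison also holds on the real files, reading them is probably not what removes the punctuation, and a later step (for example, cleaning or tokenizing applied to the text column) would be the next place to look.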
The text contains dates, but in the output the dates appear as 01012017 instead of 01-01-2017. Dots, commas, and currency symbols are also lost.
How do I solve this issue so that I don't lose the punctuation?
The output should look like:
text                          class
Welcome blabla 20-09-2017     y1
Goodbye blabla 23-09-2017     y1
lorum es ti date 09-09-2017   y2
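To see whether the DataFrame itself alters the text, the stored strings can be compared with the strings that were put in. This is a sketch under assumptions: pandas is installed, the three records are copied from the expected output instead of being read from files, and the frame is built in one step with `pd.DataFrame` purely for illustration (that also avoids `DataFrame.append`, which was removed in pandas 2.0).

```python
import pandas as pd

# Sample records copied from the expected output; real code would read them from files.
records = [
    {"text": "Welcome blabla 20-09-2017", "class": "y1"},
    {"text": "Goodbye blabla 23-09-2017", "class": "y1"},
    {"text": "lorum es ti date 09-09-2017", "class": "y2"},
]

# Build the frame in one step from the list of records.
df = pd.DataFrame(records, columns=["text", "class"])

# Compare what is stored in the frame with what went in.
stored = df["text"].tolist()
expected = [rec["text"] for rec in records]
print(df.to_string(index=False))
print(stored == expected)
```

If the final comparison prints True, the hyphens are still present inside the DataFrame, which would suggest the punctuation is being removed either before the text reaches the frame or by something that processes the column afterwards.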