
This is my current code.

my_file = open("/content/txts/txt1.txt", "r")
data = my_file.read()
l1 = clean_data(data)

my_file = open("/content/txts/txt2.txt", "r") 
data = my_file.read() 
l2 = clean_data(data)

my_file = open("/content/txts/txt3.txt", "r") 
data = my_file.read() 
l3 = clean_data(data)

my_file = open("/content/txts/txt4.txt", "r") 
data = my_file.read() 
l4 = clean_data(data) 

But I don't want to apply the same functions over and over again. To create separate lists for each of my txt files, I have tried an alternative:

import os
pathToFolder = '/content/txts'
fileList = os.listdir(pathToFolder)
dataDict = {}
for i in range(len(fileList)-1):
    with open(fileList[i], "r") as f:
        data = f.read()
        dataDict['l' + str(i)] = clean_data(data)
        f.close()

But I am getting an error.

This is my txts folder:

1 Answer


Try using os.listdir() to list the files in the folder, then loop through that list. You would need to save which file you left off on so you can start in the same place next time. The following code loops from whatever index you left off at to the end of the folder and saves all the cleaned data into a dictionary, with keys named similarly to the ones in your question.

import os

pathToFolder = '/content/txts'
fileList = os.listdir(pathToFolder)
dataDict = {}
for i in range(WhereYouLeftOffAt, len(fileList)):
    # build the full path so open() can find the file; "with" closes it for you
    with open(pathToFolder + '/' + fileList[i], "r") as f:
        data = f.read()
    dataDict['l' + str(i)] = clean_data(data)
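For example, with WhereYouLeftOffAt = 0 the loop covers the whole folder: dataDict['l0'] then holds the cleaned text of the first file os.listdir() returns, dataDict['l1'] the second, and so on. Note that os.listdir() does not guarantee any particular order, so sorting fileList first keeps those keys tied to the same files between runs.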
– Mitchell Leefers
    Or keep a set of files processed and pickle it so that each time the program is run it can read the pickle and exclude those files from further processing. – wwii Feb 08 '23 at 19:52
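A minimal sketch of that pickle idea, assuming clean_data from the question is available and using a hypothetical processed.pkl file in the working directory to remember which files have already been cleaned:

import os
import pickle

pathToFolder = '/content/txts'
stateFile = 'processed.pkl'            # hypothetical location for the pickled set

# load the set of filenames handled on earlier runs, if any
if os.path.exists(stateFile):
    with open(stateFile, 'rb') as f:
        processed = pickle.load(f)
else:
    processed = set()

dataDict = {}
for i, name in enumerate(sorted(os.listdir(pathToFolder))):
    if name in processed:
        continue                       # skip files cleaned on a previous run
    with open(pathToFolder + '/' + name, "r") as f:
        dataDict['l' + str(i)] = clean_data(f.read())
    processed.add(name)

# persist the updated set for the next run
with open(stateFile, 'wb') as f:
    pickle.dump(processed, f)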