
I have a large text file (data.txt) containing 10^6 values in a single column, which I am processing in Python.

I need to read the input file and split it into 100 smaller files, then create directories and write each data set to its own directory (one file per folder).

I know how to split the data file into 100 files, and I know how to create folders.

My question is how to write each data set into its newly created folder. In other words, I have a loop that splits the big data file and creates a directory on each iteration; how do I point the output file at the directory I just created? My code is below. Please suggest a better way to write it if there is one.

import os

def createfolder(directory):
    try : 
        if not os.path.exists(directory):
            os.makedirs(directory)
    except OSError:
        print('Error: creating directory.' + directory)
    return

def splitfiles():
    input = open('data.txt','r').read().split('\n')
    i=1
    splitlength = int(len(input)/100)
    for lines in range(0,len(input),splitlength):
        print(i)
        outputdata= input[lines:lines+splitlength]
        createfolder('./Splitted files/')
        output = open('data'+str(i)+ '.txt', 'w')
        output.write('\n'.join(outputdata))
        output.close()
        i+=1

    print("Completed!")

    return
if __name__ == "__main__":
    splitfiles()
melpomene
Bahar
  • Instead of using index variables like `i`, learn how to use [enumerate](https://stackoverflow.com/questions/22171558/what-does-enumerate-mean) – Georgy Feb 24 '18 at 17:44

1 Answer

If you want 100 folders, each containing one file, why not put `i` in the folder name, just as you did with the file name?

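import os

def createfolder(directory):
    # exist_ok=True makes the call safe when the folder already exists
    try:
        os.makedirs(directory, exist_ok=True)
    except OSError:
        print('Error: creating directory. ' + directory)

def splitfiles():
    # splitlines() avoids the empty last entry left by a trailing newline
    with open('data.txt', 'r') as source:
        data = source.read().splitlines()
    # Ceiling division keeps the number of chunks at 100 or fewer
    splitlength = max(1, -(-len(data) // 100))
    for i, start in enumerate(range(0, len(data), splitlength), start=1):
        print(i)
        outputdata = data[start:start + splitlength]
        # One folder per chunk: ./Splitted files/1, ./Splitted files/2, ...
        folder = os.path.join('./Splitted files', str(i))
        createfolder(folder)
        # Open the output file inside the folder that was just created
        with open(os.path.join(folder, 'data' + str(i) + '.txt'), 'w') as output:
            output.write('\n'.join(outputdata))

    print("Completed!")

if __name__ == "__main__":
    splitfiles()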
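
If you would rather not load the whole file into memory, the same folder-per-chunk idea can be written as a streaming loop. This is only a sketch: the function name `split_streaming` and its parameters are made up for illustration, and the default of 10,000 lines per file assumes the input really has 10^6 lines to be split 100 ways.

```python
from itertools import islice
from pathlib import Path


def split_streaming(source, out_root, lines_per_file=10_000):
    """Write consecutive blocks of lines to out_root/<i>/data<i>.txt.

    Hypothetical helper: the name and defaults are assumptions, not part
    of the original code.
    """
    written = []
    with open(source) as fh:
        i = 1
        while True:
            # islice pulls at most lines_per_file lines from the open file
            block = list(islice(fh, lines_per_file))
            if not block:
                break
            folder = Path(out_root) / str(i)
            folder.mkdir(parents=True, exist_ok=True)
            target = folder / f"data{i}.txt"
            # Lines still carry their newline, so a plain join is enough
            target.write_text("".join(block))
            written.append(target)
            i += 1
    return written
```

Calling `split_streaming('data.txt', './Splitted files')` on a file of exactly 10^6 lines should give folders 1 to 100, each holding one file. The returned list of paths can be reused later instead of rebuilding the file names by hand.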
Dinesh
  • Thanks, that really helped. I have another question concerning paths. I want to calculate the mean and standard deviation of each data set and write them to one file in the directory. I defined a function for the average, but I can't pass my file (for example data1.txt) to it: `def average(textfile): with open(textfile) as fh: sum = 0 count = 0 for line in fh: count += 1 # increment the counter sum += float(line.split()[1]) average = sum / count print (average)` – Bahar Feb 24 '18 at 16:45
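
For the follow-up about paths, one possible approach is to build each file's path with `os.path.join`, the same way the folders were created, and hand that full path to the statistics function. The sketch below is an assumption-laden illustration: `file_stats`, `summarize`, and the output name `summary.txt` are hypothetical, and it assumes the `./Splitted files/<i>/data<i>.txt` layout from the answer. Note that on one-column data `line.split()[1]` raises an IndexError, because the only value sits at index 0.

```python
import os
import statistics


def file_stats(path):
    """Return (mean, sample standard deviation) of a one-column text file."""
    with open(path) as fh:
        # Skip blank lines; float() tolerates the trailing newline
        values = [float(line) for line in fh if line.strip()]
    # statistics.stdev needs at least two values and raises otherwise
    return statistics.mean(values), statistics.stdev(values)


def summarize(root, parts=100, summary_name="summary.txt"):
    """Write one 'file mean stdev' line per data file into root/summary_name.

    Hypothetical helper: names and layout are assumptions for illustration.
    """
    summary_path = os.path.join(root, summary_name)
    with open(summary_path, "w") as out:
        for i in range(1, parts + 1):
            # Same path pattern that was used when writing the chunks
            data_path = os.path.join(root, str(i), f"data{i}.txt")
            if not os.path.exists(data_path):
                continue
            mean, stdev = file_stats(data_path)
            out.write(f"data{i}.txt {mean} {stdev}\n")
    return summary_path
```

`statistics.stdev` computes the sample standard deviation; if the population value is what you need, `statistics.pstdev` is the drop-in alternative.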