
I have a program that generates a text file every few seconds, and within a few days the number of files reaches hundreds of thousands. I don't want to merge them all into a single file, because that file would be over 50 GB. What I want instead is to split the small files into groups and merge each group into its own file. For example, given 10 files, I would like to merge the first 5 into FileA.txt, the next 3 into FileB.txt, and the remaining 2 into FileC.txt.

Is it possible to do this in Python without first copying the files into a separate folder and then merging them?

Jongware
user96564
  • 2
    It's possible. There's no reason to first copy them to another folder, just read them from wherever they reside. – martineau Dec 21 '18 at 08:31
  • 2
    It is possible, but the conditions have to be defined clearly. What criteria do you want to divide by, and what criteria to merge by? As a starting point, you can use `os.path.getsize(filepath)` to determine the size of each file as a basis for dividing and merging. – ycx Dec 21 '18 at 08:31
  • @ycx I think the simplest criterion would be a fixed number of input files per merged file, with the last merged file containing however many files are left over. – user96564 Dec 21 '18 at 08:42
  • @martineau Do you happen to know of any example or library that does this? – user96564 Dec 21 '18 at 08:44
  • 1
    You don't need any special library for that. A simple Google search will give you a lot of examples, like this one: https://stackoverflow.com/questions/13613336/python-concatenate-text-files. You still need to decide which files should be merged together, though. – PunyCode Dec 21 '18 at 09:28
  • user3280146: There's no library for this specific sort of thing. All it takes is the standard built-ins / standard library... and some relatively straightforward programming. Give it a try! @Sam has described the basic process in his answer. – martineau Dec 21 '18 at 13:22

1 Answer

1

This is certainly possible: open each input file in turn, append its contents to an output file, start a new output file whenever you decide one is needed, and delete the input files once they have been read. You should be able to do this easily with the Python standard library, though there may be packages I do not know about that make the process easier.
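As a rough illustration of that process, here is a minimal sketch using only the standard library. The function name `merge_in_groups`, its parameters, and the `merged_0000.txt` output naming are all hypothetical choices made for this example, not anything from the question. It assumes every entry in `group_sizes` is a positive integer, and that any inputs left over after the listed sizes should go into one final output file.

```python
import shutil
from itertools import islice
from pathlib import Path


def merge_in_groups(input_paths, group_sizes, output_dir, delete_inputs=False):
    """Concatenate input files into one output file per group.

    group_sizes lists how many inputs go into each output file, in order.
    Inputs left over after the listed sizes go into one final output file.
    Assumes every size is a positive integer (hypothetical helper).
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    remaining = iter(input_paths)
    written = []
    index = 0
    while True:
        # Past the end of group_sizes, take everything that is left.
        size = group_sizes[index] if index < len(group_sizes) else None
        group = list(islice(remaining, size))
        if not group:
            break

        out_path = output_dir / f"merged_{index:04d}.txt"
        with open(out_path, "wb") as out_file:
            for path in group:
                # Read in place and stream in chunks, so the inputs are
                # never copied elsewhere or loaded fully into memory.
                with open(path, "rb") as in_file:
                    shutil.copyfileobj(in_file, out_file)

        # Delete inputs only after the whole group has been written.
        if delete_inputs:
            for path in group:
                Path(path).unlink()

        written.append(out_path)
        index += 1
    return written
```

With the 10 files from the question, `merge_in_groups(paths, [5, 3], "merged")` would produce three output files holding 5, 3, and 2 inputs respectively. The files are copied byte for byte, so if an input does not end with a newline, its last line will run into the first line of the next file. Sort `input_paths` yourself first (for example by name or modification time) if the merge order matters.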

Sam Hollenbach