
I am merging a number of text files into a single text document. I can read all the file names and create the new output document.

However, the output only contains the data from one file and not the rest. Overall it should be close to 1 million lines, but I am only getting the first 10k.

import os

projpath1 = 'PATH1'
projpath2 = 'PATH2'

for root, dirs, files in os.walk(f"{projpath1}", topdown=False):
    for name in files:
        if not name.startswith('.DS_Store'):
            split = name.split("/")
            title = split[0]
            filename = (os.path.join(root, name))
            inputf = os.path.expanduser(f'{projpath1}/{title}')
            updatedf = os.path.expanduser(f'{projpath2}/ENC_merged.txt')

            with open(inputf, "r") as text_file, open(updatedf, 'w') as outfile:
                for info in text_file:
                        for lines in info:
                            outfile.write(lines)

I really am stuck and can't figure it out :/

Brian
  • 2
    Every time you `open(updatedf, 'w')`, it overwrites the existing file's contents. You should open it in `a` (append) mode instead. See the fine [documentation](https://docs.python.org/3/library/functions.html#open). – martineau Oct 03 '21 at 15:46
  • 1
    Look into `open` with the "a" option (https://stackoverflow.com/questions/1466000/difference-between-modes-a-a-w-w-and-r-in-built-in-open-function) if you are going to open the file every time like that instead of opening it once outside the loop. Also consider binary mode, since treating bytes as bytes can be faster than decoding the text and encoding it again. – Abel Oct 03 '21 at 15:48
  • @martineau This was it.. so simple and something I totally overlooked! Thank you so much : ) – Brian Oct 04 '21 at 08:28
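
To illustrate what the comments describe, here is a minimal sketch of the difference between `'w'` and `'a'` when the same file is reopened on every loop iteration. The helper `write_chunks` and the file names `w_demo.txt` and `a_demo.txt` are hypothetical and only exist for this demonstration; it writes into a temporary directory so nothing on disk is touched.

```python
import os
import tempfile


def write_chunks(path, mode, chunks):
    """Reopen `path` once per chunk with the given mode, like the loop in the question."""
    for chunk in chunks:
        with open(path, mode) as f:
            f.write(chunk)
    # Read back whatever ended up in the file.
    with open(path) as f:
        return f.read()


with tempfile.TemporaryDirectory() as tmp:
    chunks = ["first\n", "second\n", "third\n"]
    # 'w' truncates the file on every open, so earlier chunks are lost.
    overwritten = write_chunks(os.path.join(tmp, "w_demo.txt"), "w", chunks)
    # 'a' keeps the existing contents and writes at the end.
    appended = write_chunks(os.path.join(tmp, "a_demo.txt"), "a", chunks)

print(repr(overwritten))  # only the last chunk survives
print(repr(appended))     # all three chunks are kept
```

Note that append mode also keeps whatever a previous run left in the file, so the merged file would need to be deleted or truncated before rerunning the script.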

2 Answers


You are supposed to create and open the output file first, and then write all the input files into it while it is open. Something like this should work for you.

import os

projpath1 = 'PATH1'
projpath2 = 'PATH2'

# Build the output path before opening it, so the name exists when open() is called.
updatedf = os.path.expanduser(f'{projpath2}/ENC_merged.txt')

# Open the output file once; every input file is then written into the same handle.
with open(updatedf, 'w') as outfile:
    for root, dirs, files in os.walk(os.path.expanduser(projpath1), topdown=False):
        for name in files:
            if not name.startswith('.DS_Store'):
                # Join with root so files in subdirectories are found too.
                inputf = os.path.join(root, name)
                with open(inputf, "r") as text_file:
                    # Copy line by line rather than character by character.
                    for line in text_file:
                        outfile.write(line)
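
A variant of the same idea, following the suggestion in the comments to use binary mode, might look like the sketch below. The function name `merge_files`, its `skip_prefix` parameter, and the demo file names are assumptions made for illustration, not part of the original code. It uses `shutil.copyfileobj` to copy each file in chunks, sorts the names so the merge order is predictable, and skips the output file in case it lives inside the source directory.

```python
import os
import shutil
import tempfile


def merge_files(src_dir, out_path, skip_prefix=".DS_Store"):
    """Concatenate every file under src_dir into out_path; return how many were merged."""
    out_abs = os.path.abspath(out_path)
    merged = 0
    # Open the output once, in binary mode, so bytes are copied without decoding.
    with open(out_path, "wb") as outfile:
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # visit subdirectories in a stable order
            for name in sorted(files):
                path = os.path.join(root, name)
                # Skip macOS metadata files and the output file itself.
                if name.startswith(skip_prefix) or os.path.abspath(path) == out_abs:
                    continue
                with open(path, "rb") as infile:
                    shutil.copyfileobj(infile, outfile)
                merged += 1
    return merged


# Small demonstration in temporary directories with made-up files.
with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
    for name, data in [("a.txt", b"one\n"), ("b.txt", b"two\n"), (".DS_Store", b"junk")]:
        with open(os.path.join(src, name), "wb") as f:
            f.write(data)
    out = os.path.join(dst, "ENC_merged.txt")
    count = merge_files(src, out)
    with open(out, "rb") as f:
        result = f.read()

print(count, result)
```

Because the files are copied as raw bytes, this assumes every input file already ends with a newline; if one does not, its last line will run into the first line of the next file.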
Assad Ali

What about doing it with bash? From inside the directory that holds the text files:

ls | xargs cat > merged_file
Sami Fakhfakh