
I am trying to concatenate multiple (10-100) large files (100 MB-1 GB each) into one file using Python. I know that cat is efficient and fast, but I want to do it in Python for repeatability, so that all the code stays in Python and no shell scripts are needed.

I tried:

import os

path_with_txt_files = os.getcwd()
print("Current working directory is:", os.getcwd())
tempfiles = [f for f in os.listdir(path_with_txt_files) if f.endswith('.txt')]
print(tempfiles)
f = open("Concatenated.txt", "w")
for tempfile in tempfiles:
    f.write(tempfile.read())

I expected the files to be concatenated, but instead I got:

Exception has occurred: AttributeError 'str' object has no attribute 'read'

I know that tempfiles is a list of strings, but how do I convert it into a list of file handles?

Tomasz

3 Answers


Instead, gather your tempfiles as a generator of file objects:

tempfiles = (open(f) for f in os.listdir(path_with_txt_files) if f.endswith('.txt'))
with open("Concatenated.txt", "w") as f_out:
    for tempfile in tempfiles:
        f_out.write(tempfile.read())
RomanPerekhrest
  • With this I get: UnicodeDecodeError: 'charmap' codec can't decode byte 0x83 in position 2567: character maps to <undefined> – Tomasz Jul 31 '19 at 09:04
  • @Tomasz, set a proper/actual encoding, see this https://stackoverflow.com/questions/9233027/unicodedecodeerror-charmap-codec-cant-decode-byte-x-in-position-y-character – RomanPerekhrest Jul 31 '19 at 09:10
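
For reference, a minimal sketch of that suggestion applied to the answer above, assuming the .txt files are UTF-8; substitute whatever encoding they actually use:

import os

path_with_txt_files = os.getcwd()
# "utf-8" is an assumption here; use whatever encoding the .txt files really have
tempfiles = (open(f, "r", encoding="utf-8") for f in os.listdir(path_with_txt_files) if f.endswith('.txt'))
with open("Concatenated.txt", "w", encoding="utf-8") as f_out:
    for tempfile in tempfiles:
        f_out.write(tempfile.read())
        tempfile.close()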

You need to open the tempfile first:

for tempfile in tempfiles:
  f.write(open(tempfile, "r").read())
Nik
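
Since the question mentions files of 100 MB to 1 GB, reading each one fully into memory with read() can be costly. A possible streamed variant (not part of this answer) copies fixed-size chunks in binary mode with shutil.copyfileobj, much like cat does, and sidesteps text-encoding issues entirely:

import os
import shutil

path_with_txt_files = os.getcwd()
# skip the output file itself in case it is left over from a previous run
tempfiles = [f for f in os.listdir(path_with_txt_files)
             if f.endswith('.txt') and f != "Concatenated.txt"]
with open("Concatenated.txt", "wb") as f_out:
    for name in tempfiles:
        with open(name, "rb") as f_in:
            # copy in chunks rather than loading each whole file into memory
            shutil.copyfileobj(f_in, f_out)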

Let me show you the issue with your code: you are calling read on the names of the files, not on the file objects themselves. Instead, you can do this:

import os

path_with_txt_files = os.getcwd()
print("Current working directory is:", os.getcwd())
tempfiles = [f for f in os.listdir(path_with_txt_files) if f.endswith('.txt')]
print(tempfiles)
f = open("Concatenated.txt", "w")
for tempfile in tempfiles:
    t = open(tempfile, 'r')
    f.write(t.read())
    t.close()
f.close()
Sayandip Dutta
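
The same fix can also be written with context managers, so every file is closed even if a write fails partway through; this is a small variant of the answer above, not the author's original code:

import os

path_with_txt_files = os.getcwd()
tempfiles = [f for f in os.listdir(path_with_txt_files) if f.endswith('.txt')]
with open("Concatenated.txt", "w") as f_out:
    for name in tempfiles:
        # each input file is closed automatically when its with-block ends
        with open(name, "r") as f_in:
            f_out.write(f_in.read())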