
I have a problem handling .txt files. The source file is encoded as UTF-8 without BOM, and I have tried adding various "encoding=" arguments, but I can't solve this.

I attached an image: on the right is the original file and on the left is the result.

This is the code.

import io
import time

result = io.open("Edificado/edificadoResultadoSinPorBlancos.txt","w")
start = time.time()
print(f"Empece en: {start}")

with io.open("Edificado/edificco.txt","r",errors="ignore") as f:
    for line in f:
        if '|' in line:
            line = line.replace("|","-")
        result.write(line)
result.close()

end = time.time()

print(f"Termine en: {end - start}")
        

(The file is 6 GB.)

Any idea how I could fix it?

This is the encoding of the file:

imnachox2

1 Answer


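You could just use open; in Python 3, io.open is an alias for the built-in open. The key changes are passing encoding='utf-8' explicitly and using errors="surrogateescape" instead of errors="ignore":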
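A quick way to confirm the alias claim in Python 3 (in Python 2, io.open and the built-in open are different functions):

```python
import builtins
import io

# In Python 3, io.open and the built-in open are the same function object.
same_function = io.open is builtins.open
print(same_function)  # True
```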

import time

result = open("Edificado/edificadoResultadoSinPorBlancos.txt","w",encoding='utf-8',errors="surrogateescape")
start = time.time()
print(f"Empece en: {start}")

# Undecodable bytes become lone surrogates on read; "result" writes them back as the original bytes.
with open("Edificado/edificco.txt","r",encoding='utf-8',errors="surrogateescape") as f:
    for line in f:
        if '|' in line:
            line = line.replace("|","-")
        result.write(line)
result.close()

end = time.time()

print(f"Termine en: {end - start}")
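A minimal, self-contained sketch of the surrogateescape approach that can be run without the 6 GB file. The helper name replace_pipes, the temporary file names and the sample bytes are hypothetical. It also passes newline="" to both open calls (not part of the answer above) so that line endings are copied unchanged rather than translated, which matters on Windows.

```python
import os
import tempfile

def replace_pipes(src_path, dst_path):
    # Hypothetical helper. Read and write as UTF-8; bytes that are not valid
    # UTF-8 become lone surrogates on read and the same bytes again on write.
    # newline="" disables line-ending translation in both directions.
    with open(src_path, "r", encoding="utf-8", errors="surrogateescape", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", errors="surrogateescape", newline="") as dst:
        for line in src:
            dst.write(line.replace("|", "-"))

# Hypothetical sample: valid UTF-8 ("ñ", "é") plus one stray 0xFF byte,
# which is never valid in UTF-8.
sample = "Año|Café|".encode("utf-8") + b"\xff\n"

with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "input.txt")
    dst_path = os.path.join(tmp, "output.txt")
    with open(src_path, "wb") as f:
        f.write(sample)
    replace_pipes(src_path, dst_path)
    with open(dst_path, "rb") as f:
        output = f.read()

# Every byte is preserved except "|", which became "-".
print(output == sample.replace(b"|", b"-"))  # True
```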

Update: alternatively, based on Mark Ransom's comment, you can just work in binary instead:

import time

result = open("Edificado/edificadoResultadoSinPorBlancos.txt","wb")
start = time.time()
print(f"Empece en: {start}")

with open("Edificado/edificco.txt","rb") as f:
    for line in f:
        if b'|' in line:
            line = line.replace(b"|",b"-")
        result.write(line)
result.close()

end = time.time()

print(f"Termine en: {end - start}")
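Why the binary version is safe for a UTF-8 file: in UTF-8, every byte of a multi-byte character is at least 0x80, so the byte 0x7C can only ever be the ASCII "|" itself, and replacing it never splits a character. This holds for UTF-8 and other ASCII-compatible encodings, but not for UTF-16. A small sketch with a hypothetical sample string:

```python
# Hypothetical sample text mixing ASCII, accented Latin letters and CJK.
text = "Año | Café | 日本 | naïve\n"
data = text.encode("utf-8")

# Every byte of a multi-byte UTF-8 character has its high bit set (>= 0x80).
for ch in text:
    encoded = ch.encode("utf-8")
    if len(encoded) > 1:
        assert all(b >= 0x80 for b in encoded)

# So a byte-level replace gives the same result as a text-level replace.
replaced = data.replace(b"|", b"-")
print(replaced.decode("utf-8") == text.replace("|", "-"))  # True
```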
geekay
  • You won't be able to use `encode` to write the file if you didn't open it in binary mode - unless you're still using Python 2. – Mark Ransom Jan 31 '23 at 19:25
  • You are right; in fact, he could just work in binary. – geekay Jan 31 '23 at 19:51
  • The whole point of adding the `encoding` option to `open` is so that you wouldn't have to worry about encoding everything by hand all the time, and it would be consistent. – Mark Ransom Jan 31 '23 at 19:52
  • In my solution, errors="surrogateescape" usually solves any problems; my suggestion was for the case where he still runs into some random problem. But his problem is easily solved in binary mode: he doesn't have to bother with encoding at all, since he only needs to replace "|" with "-" line by line and nothing else. – geekay Jan 31 '23 at 20:02
  • Thanks! errors="surrogateescape" is the solution. – imnachox2 Feb 01 '23 at 12:46
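The difference between errors="ignore" (used in the original question) and errors="surrogateescape" can be seen on a single byte string. A minimal sketch, assuming a hypothetical input where one byte (0xE9) is not valid UTF-8:

```python
# Hypothetical input: 0xE9 starts a 3-byte UTF-8 sequence, but the next byte
# ("|") is not a continuation byte, so 0xE9 cannot be decoded.
raw = b"caf\xe9|ok\n"

ignored = raw.decode("utf-8", errors="ignore")
escaped = raw.decode("utf-8", errors="surrogateescape")

print(repr(ignored))  # 'caf|ok\n'  (the 0xE9 byte is silently dropped)
print(repr(escaped))  # 'caf\udce9|ok\n'  (kept as a lone surrogate)

# Only the surrogateescape result encodes back to the original bytes.
print(escaped.encode("utf-8", errors="surrogateescape") == raw)  # True
print(ignored.encode("utf-8") == raw)  # False
```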