I want to extract specific lines from a log file and filter them against a few strings. I switched to codecs.open because I was getting error messages like:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb4 in position 3167: invalid start byte.

The problem was not that the file uses a different encoding, such as utf-16.

Using codecs.open with errors='ignore' made the error disappear, but the script now takes much longer than before. Is there any way to optimise it to reduce the runtime?

My code looks a lot like this:

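import codecs

strings = ("str1", "str2", "str3")
net = "0.0.0.0"
# errors='ignore' silently drops any bytes that are not valid UTF-8
with codecs.open("file", "r", encoding='utf-8', errors='ignore') as listeFull:
    for line in listeFull:
        # keep lines that contain the address but none of the excluded strings
        if net in line and all(s not in line for s in strings):
            print(line)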
  • You could also do `with open("file", "rb") as file_handle`, that won't give you encoding issues and be a bit faster. And then do `for line in file_handle` - it's probably one of the fastest ways to open and traverse a file. – Torxed Jan 31 '19 at 11:59
  • Thank you. I did this and it is working quite well. But now the output looks like this `b'line from file\n'` Is there a way to prevent this and just output `line from file` – majesticLSD Jan 31 '19 at 12:07
  • Possible duplicate of [error UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte](https://stackoverflow.com/questions/42339876/error-unicodedecodeerror-utf-8-codec-cant-decode-byte-0xff-in-position-0-in) – Torxed Jan 31 '19 at 12:07
  • Now the output looks like this `b'line from file\n'` Is there a way to prevent this and just output `line from file`? @Torxed – majesticLSD Jan 31 '19 at 12:38
  • What's wrong with the output? It looks fine to me. The `b` just tells you that it's a bytes string; it is still a "string". The `\n` just tells you that there is a line ending at the end of every row; you can do `line.strip()` to remove it if you don't want it. – Torxed Jan 31 '19 at 12:40
  • Thanks. The output is to be given to "non-IT people" so they would be confused. What I did was to decode the line to utf-8 before printing it. – majesticLSD Jan 31 '19 at 19:16
  • Be careful. That's the reason you're here in the first place. Some items can't be decoded in UTF-8. So make sure you error-handle that. – Torxed Jan 31 '19 at 19:20
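
The comments above can be combined into one rough sketch. It reads the file in binary mode, filters on bytes, and decodes only the lines that pass the filter, so undecodable bytes cannot raise an error. The helper name `filter_log`, the sample data, and the choice of `errors="replace"` are assumptions for illustration and are not from the original post. Matching on encoded bytes is assumed to be equivalent to matching on text here because the search strings are plain ASCII.

```python
import os
import tempfile


def filter_log(path, net, excluded, encoding="utf-8"):
    """Yield decoded lines that contain `net` and none of `excluded`."""
    # Encode the search strings once so the comparison happens on raw bytes.
    net_b = net.encode(encoding)
    excluded_b = [s.encode(encoding) for s in excluded]
    with open(path, "rb") as fh:
        for raw in fh:
            if net_b in raw and all(s not in raw for s in excluded_b):
                # Decode only matching lines; invalid bytes become U+FFFD
                # instead of raising UnicodeDecodeError.
                yield raw.decode(encoding, errors="replace").rstrip("\r\n")


# Hypothetical sample log containing the invalid byte 0xb4 from the error message.
sample = b"0.0.0.0 ok \xb4 line\n0.0.0.0 str1 skipped\n10.0.0.1 other\n"
tmp = tempfile.NamedTemporaryFile(delete=False)
try:
    tmp.write(sample)
    tmp.close()
    result = list(filter_log(tmp.name, "0.0.0.0", ("str1", "str2", "str3")))
finally:
    os.remove(tmp.name)

for line in result:
    print(line)
```

Passing `errors="ignore"` instead would drop the invalid bytes rather than mark them, which matches the behaviour of the code in the question. Whether this is actually faster than `codecs.open` depends on the file and should be measured.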
