
I have written Python code that reads a log file and saves some of its information to a text file.

This is the log.txt file

2022-10-12 18:15:22.992 0026/? I/AsrDecActor25: channels=1, size=82434
2022-10-12 18:15:22.992 0026/? I/AsrDecActor25: waiting asr-core ready: 12 secs
2022-10-12 18:15:23.058 0199/? I/AsrDecActor27: asr state: START, true
2022-10-12 18:15:23.058 0199/? I/AsrDecActor27: asr state 2: START
2022-10-12 18:15:23.058 0199/? I/AsrDecActor27: end of decoding 57 true 0
NEC Input :secure folder app close it
NEC Replacement suggestion :Secure folder
NEC Input Before Replace : secure folder app close it
NEC Matching Word : secure folder app
Replaced Word  : Secure folder
NEC Output After Replace : Secure folder close it
Changes : 1
2022-10-12 18:15:23.060 0199/? I/LangPackActor: eASR [NEC] Run completed, Time: 2 ms
PostProcessSubstitutions::Output of question mark processing: secure folder uninstall Kare
[eITN] Input:Secure folder uninstall kare OutputSecure folder uninstall Kare
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR [Timestamp] getTimestamp starts
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR string2IntegerList 14 20 23 32 36 
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR string2IntegerList 14 20 23 32 36 
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR levenshteinMapping
2022-10-12 18:15:23.069 0199/? I/LangPackActor: eASR new ASRResult
2022-10-12 18:15:23.091 0021/? I/AsrDecActor26: decoding 

I wrote code to extract the lines beginning with "NEC Input Before Replace", "NEC Matching Word", "Replaced Word" and "NEC Output After Replace", so that my output.txt file looks like this:

NEC Input Before Replace : secure folder app close it
NEC Matching Word : secure folder app
Replaced Word  : Secure folder
NEC Output After Replace : Secure folder close it

The code I wrote for this is:

#!/usr/bin/env python
f = open('log.txt','r',encoding='utf-8')
f1 = open('output.txt', 'a')

doIHaveToCopyTheLine=False

for line in f.readlines():
    if 'NEC Input Before Replace' or 'NEC Matching Word' or 'Replaced Word' or 'NEC Output After Replace' in line:
        f1.write(line)

    
f1.close()
f.close()

But this code throws the following error:

---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
Input In [14], in <cell line: 7>()
      7 for line in f.readlines():
      8     if 'NEC Input Before Replace' in line or 'NEC Matching Word'  or 'NEC Output After Replace' in line:
----> 9         f1.write(line)
     14 f1.close()
     15 f.close()

File D:\Anaconda\lib\encodings\cp1252.py:19, in IncrementalEncoder.encode(self, input, final)
     18 def encode(self, input, final=False):
---> 19     return codecs.charmap_encode(input,self.errors,encoding_table)[0]

UnicodeEncodeError: 'charmap' codec can't encode character '\ud64d' in position 20: character maps to <undefined>

But when I remove 'Replaced Word' and 'NEC Output After Replace' from the condition `if 'NEC Input Before Replace' in line or 'NEC Matching Word' or 'Replaced Word' or 'NEC Output After Replace' in line:`, the code works fine.

Does anyone know what the issue is and how to fix it?

Turing101
  • When you don't specify an encoding for the file you write to, Python chooses your system default encoding. On Windows, this usually sucks. `cp1252` cannot represent most Unicode characters. – tripleee Oct 13 '22 at 08:43
  • If I add encoding this is just printing the entire log.txt file in my output file – Turing101 Oct 13 '22 at 08:52
  • I have changed the code like this `f1 = open('output.txt', "w", encoding='utf-8')`, which just prints the entire log.txt file in my output – Turing101 Oct 13 '22 at 08:52
  • That's a separate FAQ. I'll add a second duplicate. – tripleee Oct 13 '22 at 08:58
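
Both points from the comments can be handled in one place. What follows is only a minimal sketch, assuming the log is UTF-8 and that overwriting output.txt (mode "w") is acceptable; the names `WANTED` and `extract_lines` are hypothetical and do not come from the question. In Python, `'A' or 'B' in line` parses as `'A' or ('B' in line)`, and a non-empty string literal is always truthy, so a condition written that way matches every line. That is likely also why the whole log ends up in the output once the encoding error is out of the way. Each marker has to be tested against the line individually, and the output file needs an explicit encoding so that Windows does not fall back to cp1252.

```python
# Prefixes of the lines to keep (the four markers named in the question).
WANTED = (
    "NEC Input Before Replace",
    "NEC Matching Word",
    "Replaced Word",
    "NEC Output After Replace",
)


def extract_lines(src_path, dst_path, wanted=WANTED):
    """Copy lines of src_path that start with any prefix in `wanted`.

    Returns the number of lines written to dst_path.
    """
    copied = 0
    # Set the encoding on BOTH files; without it the writer uses the
    # platform default, which on Windows is usually cp1252.
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            # str.startswith accepts a tuple and checks each prefix in turn,
            # unlike `'a' or 'b' in line`, which is always truthy.
            if line.startswith(wanted):
                dst.write(line)
                copied += 1
    return copied


# Example call (assumes log.txt exists in the working directory):
# extract_lines("log.txt", "output.txt")
```

If the markers can appear in the middle of a line rather than at the start, `any(w in line for w in wanted)` would be the equivalent substring test. Mode "w" replaces the "a" used in the question so that reruns do not append duplicates; switch it back if appending is intended.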
