
In Python 2, a program's output is redirected to a file as bytes.

File content:

b'data1'
b'data2'

When I read the file as text, I get 'b\'data1\'\r\n'...

When I read it as binary, I get b'b\'data1\'\r\n'...

I'd like to have all lines in a list as 'data1', 'data2', ... with no leading b.

I tried decoding as 'utf-8', but each line is already a string that starts with 'b\'', so it cannot be decoded.
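One possible way to recover the plain strings from a file that already contains `b'...'` text is to parse each line as a Python literal and then decode the resulting bytes. This is only a sketch: the helper name `parse_bytes_repr_lines` is made up for illustration, and it assumes every non-empty line is a valid bytes (or string) literal encoded as UTF-8.

```python
import ast


def parse_bytes_repr_lines(text, encoding="utf-8"):
    """Turn lines such as b'data1' into plain strings such as 'data1'.

    Hypothetical helper: assumes each non-empty line is a valid
    Python bytes (or str) literal.
    """
    result = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # ast.literal_eval safely evaluates the literal: "b'data1'" -> b'data1'
        value = ast.literal_eval(line)
        if isinstance(value, bytes):
            value = value.decode(encoding)
        result.append(value)
    return result


sample = "b'data1'\r\nb'data2'\r\n"
print(parse_bytes_repr_lines(sample))  # ['data1', 'data2']
```

Unlike stripping the `b'` prefix by hand, this also handles escape sequences inside the literal, but it raises an error on any line that is not a valid literal.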

In Python 2 I read it like this:

if not sys.stdin.isatty():
    input_file = BytesIO(sys.stdin.read())
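In Python 3, `sys.stdin.read()` returns `str`, so it cannot be passed to `BytesIO` directly. A rough Python 3 counterpart of the snippet above might read from the underlying binary stream `sys.stdin.buffer` instead. The function name `read_stdin_bytes` and its `stream` parameter are assumptions added here so the sketch can be tried without a real pipe.

```python
import sys
from io import BytesIO


def read_stdin_bytes(stream=None):
    """Return piped input as a BytesIO, or None when input is a terminal.

    Hypothetical helper: `stream` defaults to sys.stdin and must expose
    a binary `.buffer`, as text streams created by Python normally do.
    """
    stream = sys.stdin if stream is None else stream
    if stream.isatty():
        return None
    # .buffer is the raw binary stream, so no decoding happens here
    return BytesIO(stream.buffer.read())
```

Called as `input_file = read_stdin_bytes()`, this keeps the data as bytes, and decoding can then be done explicitly with a known encoding.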

In Python 3 I tried decoding before printing. Redirecting the output with a pipe works sometimes, but sometimes I get this error:

File "Python38-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2011' in position 373877: character maps to <undefined>
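The traceback suggests that the redirected output stream is encoding text as `cp1252`, which has no mapping for `'\u2011'` (a non-breaking hyphen). A minimal sketch of one workaround, assuming Python 3.7 or later where `sys.stdout.reconfigure` is available, is to switch the stream to UTF-8 before printing. The helper `can_encode` is hypothetical and is only there to show which encodings accept the character.

```python
import sys


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding` (illustrative helper)."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    text = "non\u2011breaking hyphen"
    print(can_encode(text, "cp1252"))  # False
    print(can_encode(text, "utf-8"))   # True

    # Assumption: stdout is a regular text stream (Python 3.7+),
    # so its encoding can be changed before any text is written.
    if hasattr(sys.stdout, "reconfigure"):
        sys.stdout.reconfigure(encoding="utf-8")
    print(text)
```

Setting the environment variable `PYTHONIOENCODING=utf-8` before running the script is another way to get the same effect without changing the code.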

Thanks.

  • Hi jonrsharpe, this is not the same issue. b'data' is bytes, but 'b\'data\'' is a string; the first can be decoded, the second cannot. – Bayo Alen Dec 22 '19 at 20:23
  • You should decode it in the first script, when you create the output that is redirected to the file, not when you read it back from the file (by then it is too late). When reading, you should instead use `replace('b\'', "")` to replace the prefix with an empty string, and remove the `'` at the end of each line. – furas Dec 23 '19 at 00:24
  • The error shows that it tries to encode to `cp1252`, but `'\u2011'` is a Unicode character that may not exist in `cp1252`, so you may have to use `.encode('utf-8')` instead of `.encode()`, which probably uses the standard Windows encoding `cp1252` instead of `utf-8`. – furas Dec 23 '19 at 00:31
  • You're right. The thing is, if I save it to a file after decoding, it's OK. But if I redirect printed output to a file with > then I get this error: File "C:\Python\Python38-32\lib\encodings\cp1252.py", line 19, in encode return codecs.charmap_encode(input,self.errors,encoding_table)[0] UnicodeEncodeError: 'charmap' codec can't encode character '\u2011' in position 373877: character maps to – Bayo Alen Dec 25 '19 at 19:14
  • On Windows this is a problem because it uses `utf-8` to save to a file but `cp1250` to display in the terminal (and to create filenames). Linux doesn't have this problem because it uses `utf-8` in all situations. On Google or Stack Overflow you can find how to set `utf-8` in the console/terminal/cmd.exe; it was something with the registry and code page `65001`. – furas Dec 26 '19 at 08:48
  • [Change default code page of Windows console to UTF-8](https://superuser.com/questions/269818/change-default-code-page-of-windows-console-to-utf-8) – furas Dec 26 '19 at 08:48
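The suggestion in the comments to fix the output in the first script could be sketched as follows: instead of calling `print()` on a bytes object (which writes its `b'...'` representation), write the raw bytes to the binary stream behind stdout. The helper `emit_line` and its `out` parameter are hypothetical names used only for this illustration.

```python
import sys


def emit_line(data, out=None):
    """Write raw bytes plus a newline to a binary stream.

    Hypothetical helper: `out` defaults to sys.stdout.buffer, so the
    redirected file receives data1 rather than the text b'data1'.
    """
    out = sys.stdout.buffer if out is None else out
    out.write(data + b"\n")


# Example use in the producing script:
#     emit_line(b"data1")
#     emit_line(b"data2")
```

Because no text encoding step is involved, this route also sidesteps the console code page when the output is redirected to a file.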

0 Answers