
I am trying to load my JSON file via stdin using the Windows command line: python algo.py < number.json, and using json.loads(sys.stdin) in my script, but it fails.

However, I can load my JSON with

with open('number.json', encoding='utf-8-sig') as f:
    n = json.load(f)

Exception raised when using json.loads(sys.stdin):

the JSON object must be str, bytes or bytearray, not TextIOWrapper

Exception raised when using json.load(sys.stdin) or json.loads(sys.stdin.read()):

Expecting value: line 1 column 1 (char 0)

Has anyone encountered the same issue? I read multiple posts on this forum prior to asking for help.

Here is the JSON file:

[
  {
    "x": 1,
    "y": 4,
    "z": -1,
    "t": 2
  },
  {
    "x": 2,
    "y": -1,
    "z": 3,
    "t": 0
  }
]
eliaroseX
  • `json.load(sys.stdin)` (without `s`) – Klaus D. Jul 01 '19 at 09:45
  • both `load` and `loads` fail with the `sys.stdin` method. – eliaroseX Jul 01 '19 at 09:48
  • What is the error message exactly? `json.load(sys.stdin)` works for me with a proper json file. – The Pjot Jul 01 '19 at 09:53
  • The exception says that it should be a string or bytes, not a "TextIOWrapper" @KlausD. – eliaroseX Jul 01 '19 at 09:53
  • json.load(open('number.json')) will definitely work! – Keerthana Prabhakaran Jul 01 '19 at 09:57
  • @KeerthanaPrabhakaran I just tried `json.load(open(sys.stdin))` because I need to use sys.stdin and got `expected str, bytes or os.PathLike object, not _io.TextIOWrapper` with your suggestion... do you know why? – eliaroseX Jul 01 '19 at 09:58
  • @ThePjot see above, please. – eliaroseX Jul 01 '19 at 09:59
  • trying to load the content from `sys.stdin` won't work because it's a file-like object, so the first exception you mentioned is expected. The second error points to the data you are getting not being valid JSON, so could you just print what you get from calling `sys.stdin.read()`? A `print(sys.stdin.read())` will do. – Layo Jul 01 '19 at 10:49
  • `[ { "x": 1, "y": 4, "z": -1, "t": 2 }, { "x": 2, "y": -1, "z": 3, "t": 0 } ]` Here is what I obtained with `print(sys.stdin.read())` @Layo – eliaroseX Jul 01 '19 at 10:58
  • What python 3 version are you using exactly? My test was using 3.6.8 – The Pjot Jul 01 '19 at 11:18
  • @eliaroseX it's clear that the first 3 chars (`ï»¿`) are causing `json.loads()` to fail. You need to investigate why those chars are in the "string"; I'm sorry I can't help you further because I don't run on Windows. – Layo Jul 01 '19 at 12:10
  • The three chars are the [UTF-8 BOM](https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8), when interpreted as `latin1` (also known as [ISO/IEC 8859-1](https://en.wikipedia.org/wiki/ISO/IEC_8859-1)). – rtoijala Jul 01 '19 at 12:15
  • @rtoijala what you’re saying is interesting because there is an optional encoding parameter on the function json.load, but it doesn’t help in this case unfortunately – eliaroseX Jul 01 '19 at 12:26

1 Answer


Based on your comments, your problem seems to be that your file has a UTF-8 BOM prepended to it. That means that the three extra bytes 0xEF 0xBB 0xBF appear at the very start of the file, before the JSON data.

The Python json module documentation says that it does not accept a BOM. Therefore you must remove it before passing the JSON data to json.load or json.loads.
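
As a quick illustration (this assumes, as your comments suggest, that Windows decoded the redirected stdin with a legacy code page such as cp1252/latin-1, so the BOM bytes turn into three stray characters):

import json

# Sketch: decode BOM-prefixed bytes the way a Windows text-mode stdin would,
# then try to parse the result.
text = b'\xEF\xBB\xBF[1, 2]'.decode('latin-1')
json.loads(text)  # raises JSONDecodeError: Expecting value: line 1 column 1 (char 0)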

There are at least three ways to remove the BOM. The best is to simply edit your JSON file to remove it. If that is not possible, you can skip it in your Python code.

If you only need your code to work with files that contain a BOM, you can use:

import sys
assert b'\xEF\xBB\xBF' == sys.stdin.buffer.read(3)

This makes sure that the removed bytes were really the UTF-8 BOM.
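
A slightly fuller sketch of that approach (a hypothetical script, assuming Python 3.6+ so that json.load can consume the binary stream directly):

import json
import sys

# Consume the first three bytes of stdin and check that they are the UTF-8 BOM.
assert b'\xEF\xBB\xBF' == sys.stdin.buffer.read(3)

# Parse the rest of the binary stream; json.load detects the encoding itself.
data = json.load(sys.stdin.buffer)
print(data)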

If you need to work with files that may or may not contain a BOM, you can wrap your standard input stream with a TextIOWrapper with the correct encoding, as mentioned in this answer. Then the code looks like this:

import io
import sys

stdin_wrapper = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8-sig')
# use stdin_wrapper instead of sys.stdin
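
Put together, a self-contained sketch using that wrapper (run as python algo.py < number.json, matching your command line) could look like:

import io
import json
import sys

# 'utf-8-sig' skips a leading BOM if present and otherwise behaves like plain UTF-8.
stdin_wrapper = io.TextIOWrapper(sys.stdin.buffer, encoding='utf-8-sig')
n = json.load(stdin_wrapper)
print(n)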

Quoting the Python Unicode HOWTO for why utf-8-sig:

In some areas, it is also convention to use a “BOM” at the start of UTF-8 encoded files; the name is misleading since UTF-8 is not byte-order dependent. The mark simply announces that the file is encoded in UTF-8. For reading such files, use the ‘utf-8-sig’ codec to automatically skip the mark if present.
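
A small illustration of the difference the codec makes (not from the HOWTO, just a sketch):

# 'utf-8-sig' strips a leading BOM; plain 'utf-8' keeps it as U+FEFF.
raw = b'\xEF\xBB\xBF{"x": 1}'
print(repr(raw.decode('utf-8')))      # '\ufeff{"x": 1}'
print(repr(raw.decode('utf-8-sig')))  # '{"x": 1}'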

rtoijala