I would like to ask for help with my JSON parsing. I've got a file where every line looks like this one:

some hexadecimal numbers|something else|int|UA info|{'computer': {'os': {'version': 'blabla', 'name': 'blabla'}, 'app': {'version': 'blabla', 'name': 'blabla'}}}

I've got code which splits every line into parts:

for line in some_file:
    line2 = line.split('|')

and I want to take the last part of each line (which should be in JSON format, at least I think so) and parse it for later use: I want to write os=name version, app=name version to another file. I tried something like this:

json_string = json.loads(line2[4])

but Python gives me errors such as:

Expecting property name: line 1 column 2 (char 1)

or

No JSON object could be decoded

I know that it's something stupid, but I don't know what to do. I would appreciate any advice.

Geofre

3 Answers

0

JSON requires double quotes for strings, which means that you cannot load this data as is with the json module.
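A quick way to see the difference, as a minimal sketch: the value below is a shortened version of the sample from the question, and the names `raw`, `json_ok`, and `parsed` are only for illustration.

```python
import ast
import json

# Shortened version of the last field from the question's sample line.
raw = "{'computer': {'os': {'version': 'blabla', 'name': 'blabla'}}}"

try:
    json.loads(raw)
    json_ok = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    json_ok = False

# literal_eval only evaluates Python literals (dicts, strings, numbers, ...),
# so it accepts the single-quoted form without running arbitrary code.
parsed = ast.literal_eval(raw)

print(json_ok)
print(parsed["computer"]["os"]["name"])
```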

I would use csv to parse the pipe-delimited file and ast.literal_eval() to safely load the last column values into Python dictionaries:

import csv
from ast import literal_eval


with open("file.csv") as f:
    reader = csv.reader(f, delimiter="|")
    data = [literal_eval(line[-1]) for line in reader]

print(data)  # data contains a list of dictionaries now
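To get from the parsed dictionaries to the os=name version, app=name version lines the question asks for, something along these lines might work. This is a sketch under assumptions: `summarize` is a hypothetical helper, the names and versions in the sample line are made up, and `io.StringIO` objects stand in for the real input and output files.

```python
import csv
import io
from ast import literal_eval

# Stand-in for the real input file; the name/version values are invented
# for illustration so the output is easier to read.
sample = io.StringIO(
    "0x1337|something else|12456789|UA info|"
    "{'computer': {'os': {'version': '10', 'name': 'Windows'}, "
    "'app': {'version': '2.1', 'name': 'MyApp'}}}\n"
)


def summarize(record):
    """Hypothetical helper: format one parsed record as a single summary line."""
    os_info = record["computer"]["os"]
    app_info = record["computer"]["app"]
    return "os={} {}, app={} {}".format(
        os_info["name"], os_info["version"], app_info["name"], app_info["version"]
    )


out = io.StringIO()  # stand-in for the output file
for row in csv.reader(sample, delimiter="|"):
    out.write(summarize(literal_eval(row[-1])) + "\n")

result = out.getvalue()
print(result, end="")
```

With real files, `sample` and `out` would be replaced by `open(...)` calls in a `with` block.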
alecxe
0

It's not JSON; it looks like a Python literal.

You can use ast.literal_eval to convert it to a Python object:

import ast
ast.literal_eval(line2[4])  # line2 is the split line from the question

BTW, there's one missing close bracket in the question, so:

ast.literal_eval(line2[4] + '}')
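Rather than patching the string by hand, malformed lines can also be detected and skipped. The sketch below uses a hypothetical helper, `parse_last_field`, and assumes each line has exactly four fields before the dictionary, as in the question's sample.

```python
import ast


def parse_last_field(line):
    """Hypothetical helper: return the dict in the fifth field, or None if it is malformed."""
    # Split at most four times so any '|' inside the dictionary text is preserved.
    raw = line.rstrip("\n").split("|", 4)[-1]
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Raised for truncated or otherwise non-literal input.
        return None
    return value if isinstance(value, dict) else None


good = parse_last_field("a|b|1|ua|{'x': {'y': 1}}\n")
truncated = parse_last_field("a|b|1|ua|{'x': {'y': 1}\n")
print(good, truncated)
```

Returning None lets the caller decide whether to skip the line, log it, or stop.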
falsetru
0

JSON requires double quotes for any string literal.

A string is a sequence of zero or more Unicode characters, wrapped in double quotes, using backslash escapes. - http://www.json.org/

One easy and safe way to get it parsed is to use a YAML parser.
A YAML parser can read JSON and is less strict about syntax, so single-quoted strings are accepted.

>>> import yaml # from package pyyaml
>>> yaml.safe_load("{'test': 'ok'}")
{'test': 'ok'}
>>> data = yaml.safe_load("{'computer': {'os': {'version': 'blabla', 'name': 'blabla'}, 'app': {'version': 'blabla', 'name': 'blabla'}}}")
>>> data.get('computer').get('app').get('version')
'blabla'

For the pipe-delimited data, you can split the lines as you already do, or use the csv module. As a bonus, you can pass every chunk of data to the YAML parser and it will handle the type conversion:

import csv
import io

import yaml  # from package pyyaml

some_file = io.StringIO("0x1337|something else|12456789|UA info|{'computer': {'os': {'version': 'blabla', 'name': 'blabla'}, 'app': {'version': 'blabla', 'name': 'blabla'}}}")
# csv.reader returns an iterator, so loop over its rows instead of indexing it
for row in csv.reader(some_file, delimiter="|"):
    for element in row:
        print(yaml.safe_load(element))

Output:

4919
something else
12456789
UA info
{'computer': {'os': {'version': 'blabla', 'name': 'blabla'}, 'app': {'version': 'blabla', 'name': 'blabla'}}}

Cyrbil