I want to parse a log from pilight. It contains multiple JSON objects written one after another, mostly datetime entries, rather than a single JSON document. So when I use json.load(f), I get an error:

json.decoder.JSONDecodeError: Extra data: line 12 column 1 (char 171)

Below is the example file. How would I parse such a file?

What I want to do is drop the JSON entries that have "protocol": "datetime", but keep the first one that follows an entry that is not datetime.

That would give me a reduced log file in which each "message" entry is followed by a single "datetime" entry. For the example, that file would contain only the first two JSON entries.

{
    "message": {
        "id": 31,
        "unit": 15,
        "state": "down"
    },
    "origin": "receiver",
    "protocol": "arctech_screen_old",
    "uuid": "0000-b8-27-eb-e85eff",
    "repeats": 1
}
{
    "origin": "receiver",
    "protocol": "datetime",
    "message": {
        "longitude": 9.000000,
        "latitude": 44.633000,
        "year": 2020,
        "month": 6,
        "day": 5,
        "weekday": 6,
        "hour": 12,
        "minute": 41,
        "second": 30,
        "dst": 1
    },
    "uuid": "0000-b8-27-eb-e85eff"
}
{
    "origin": "receiver",
    "protocol": "datetime",
    "message": {
        "longitude": 9.000000,
        "latitude": 44.633000,
        "year": 2020,
        "month": 6,
        "day": 5,
        "weekday": 6,
        "hour": 12,
        "minute": 41,
        "second": 31,
        "dst": 1
    },
    "uuid": "0000-b8-27-eb-e85eff"
}
monok

2 Answers

See "How to extract multiple JSON objects from one file?". The easiest fix is to add [ at the start of the file and ] at the end, and a comma between consecutive JSON objects, so that the whole file becomes a single JSON array.
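If you would rather not edit the file, one alternative is to decode the objects one at a time with json.JSONDecoder.raw_decode from the standard library. The following is only a sketch: the helper name iter_json_objects and the shortened sample string are made up for illustration, and it assumes the whole log fits in memory as one string.

```python
import json

def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated JSON."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so skip it by hand
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the parsed value and the index where it stopped
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Shortened stand-in for the log content (hypothetical sample)
sample = '{"protocol": "arctech_screen_old"}\n{"protocol": "datetime"}\n'
all_jsons = list(iter_json_objects(sample))
```

With a real log you would read the file into a string first and pass that string instead of sample.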

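Once the file is loaded and you have a list of parsed JSON objects (dicts), you can filter them as follows. Note that this drops every datetime entry, including the first one after each message that the question wants to keep: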

filtered_jsons = [single_json for single_json in all_jsons if single_json.get('protocol') != "datetime"]
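To keep the first datetime entry after each non-datetime entry, as the question asks, the filter has to remember what the previous entry was. A minimal sketch follows; the function name keep_first_datetime and the shortened sample entries are hypothetical, and it assumes that datetime entries appearing before any message should be dropped.

```python
def keep_first_datetime(entries):
    """Keep all non-datetime entries plus the first datetime after each one."""
    kept = []
    # Assumption: datetime entries before the first message are dropped
    previous_was_message = False
    for entry in entries:
        is_datetime = entry.get("protocol") == "datetime"
        if not is_datetime or previous_was_message:
            kept.append(entry)
        previous_was_message = not is_datetime
    return kept

# Shortened stand-ins for the three entries in the example log
all_jsons = [
    {"protocol": "arctech_screen_old"},
    {"protocol": "datetime", "message": {"second": 30}},
    {"protocol": "datetime", "message": {"second": 31}},
]
reduced = keep_first_datetime(all_jsons)
```

For the example this leaves the message entry and the datetime entry with second 30.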
b9s

The content of your file is not valid JSON as a whole. You need to separate the objects with commas and wrap everything in a list, like this: [{...}, {...}, ...].

Here is example code:

# Assuming the file has already been read into a str variable:

file_content = '''
{
    "message": {
        "id": 31,
        "unit": 15,
        "state": "down"
    },
    "origin": "receiver",
    "protocol": "arctech_screen_old",
    "uuid": "0000-b8-27-eb-e85eff",
    "repeats": 1
}
{
    "origin": "receiver",
    "protocol": "datetime",
    "message": {
        "longitude": 9.000000,
        "latitude": 44.633000,
        "year": 2020,
        "month": 6,
        "day": 5,
        "weekday": 6,
        "hour": 12,
        "minute": 41,
        "second": 30,
        "dst": 1
    },
    "uuid": "0000-b8-27-eb-e85eff"
}
{
    "origin": "receiver",
    "protocol": "datetime",
    "message": {
        "longitude": 9.000000,
        "latitude": 44.633000,
        "year": 2020,
        "month": 6,
        "day": 5,
        "weekday": 6,
        "hour": 12,
        "minute": 41,
        "second": 31,
        "dst": 1
    },
    "uuid": "0000-b8-27-eb-e85eff"
}
'''

Converting it to valid JSON would look like this:

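import json
import re

# Insert a comma wherever one object closes and the next one opens.
# \s* tolerates any whitespace or line endings between } and {.
proper_json_string = '[\n' + re.sub(r'\}\s*\{', '},\n{', file_content) + '\n]'
data = json.loads(proper_json_string)  # list of dicts, one per log entry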
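Once the list has been filtered, it can be written back in the same back-to-back style as the original log. The following is a rough sketch only: dump_concatenated and the file name reduced.log are hypothetical, and the sample entries are shortened. Be aware that json.dumps does not preserve the original number formatting, so a value written as 9.000000 in the log comes back as 9.0.

```python
import json

def dump_concatenated(entries):
    """Serialize entries as back-to-back JSON objects, one after another."""
    return "".join(json.dumps(entry, indent=4) + "\n" for entry in entries)

# Shortened stand-ins for already filtered entries
data = [
    {"protocol": "arctech_screen_old", "repeats": 1},
    {"protocol": "datetime"},
]
text = dump_concatenated(data)

# To save the result, write the string to a file of your choice, e.g.:
# with open("reduced.log", "w") as f:   # "reduced.log" is a hypothetical name
#     f.write(text)
```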
oo00oo00oo00