
I am struggling to convert this data into a list to be used in Python.

The file contains large sets of data and it is in JSON format.

Here is a sample of the data:

{"_id":{"$oid":"60551"},"barcode":"511111019862","category":"Baking","categoryCode":"BAKING","cpg":{"$id":{"$oid":"601ac114be37ce2ead437550"},"$ref":"Cogs"},"name":"test brand @1612366101024","topBrand":false}
{"_id":{"$oid":"601c5460be37ce2ead43755f"},"barcode":"511111519928","brandCode":"STARBUCKS","category":"Beverages","categoryCode":"BEVERAGES","cpg":{"$id":{"$oid":"5332f5fbe4b03c9a25efd0ba"},"$ref":"Cogs"},"name":"Starbucks","topBrand":false}
{"_id":{"$oid":"601ac142be37ce2ead43755d"},"barcode":"511111819905","brandCode":"TEST BRANDCODE @1612366146176","category":"Baking","categoryCode":"BAKING","cpg":{"$id":{"$oid":"601ac142be37ce2ead437559"},"$ref":"Cogs"},"name":"test brand @1612366146176","topBrand":false}
{"_id":{"$oid":"601ac142be37ce2ead43755a"},"barcode":"511111519874","brandCode":"TEST BRANDCODE @1612366146051","category":"Baking","categoryCode":"BAKING","cpg":{"$id":{"$oid":"601ac142be37ce2ead437559"},"$ref":"Cogs"},"name":"test brand @1612366146051","topBrand":false}

Here is the code I ran:

import json

with open("brands.json") as f:
    data = json.load(f)

print(data)

And here is the error I get:

raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 229)
malaba
  • What exactly is the problem? Use the `json` library to parse the file. – Jussi Nurminen Oct 01 '21 at 12:52
  • What did you try? Please, spend some time reading ["How to create a Minimal, Complete, and Verifiable example"](https://stackoverflow.com/help/mcve) and ["How do I ask a good question?"](https://stackoverflow.com/help/how-to-ask). You will get better results by following the tips in those articles. – accdias Oct 01 '21 at 12:53
  • @JussiNurminen the file doesn't have brackets. I tried with the load method but keep getting an error. – malaba Oct 01 '21 at 12:53
  • Try this: https://stackoverflow.com/questions/12451431/loading-and-parsing-a-json-file-with-multiple-json-objects – Dani Mesejo Oct 01 '21 at 12:54
  • It doesn't look like a JSON file. It looks more like a [JSONL](https://jsonlines.org/) file instead. There is an answer [here](https://stackoverflow.com/questions/50475635/loading-jsonl-file-as-json-objects/56749504). – accdias Oct 01 '21 at 12:56

2 Answers


Something like the code below should work: read the file line by line, convert each line to a dict, and append it to a list.

import json
data = []
with open('brands.json') as f:
    for line in f:
        data.append(json.loads(line.strip()))
print(data)

Output:

[{'_id': {'$oid': '60551'}, 'barcode': '511111019862', 'category': 'Baking', 'categoryCode': 'BAKING', 'cpg': {'$id': {'$oid': '601ac114be37ce2ead437550'}, '$ref': 'Cogs'}, 'name': 'test brand @1612366101024', 'topBrand': False}, {'_id': {'$oid': '601c5460be37ce2ead43755f'}, 'barcode': '511111519928', 'brandCode': 'STARBUCKS', 'category': 'Beverages', 'categoryCode': 'BEVERAGES', 'cpg': {'$id': {'$oid': '5332f5fbe4b03c9a25efd0ba'}, '$ref': 'Cogs'}, 'name': 'Starbucks', 'topBrand': False}, {'_id': {'$oid': '601ac142be37ce2ead43755d'}, 'barcode': '511111819905', 'brandCode': 'TEST BRANDCODE @1612366146176', 'category': 'Baking', 'categoryCode': 'BAKING', 'cpg': {'$id': {'$oid': '601ac142be37ce2ead437559'}, '$ref': 'Cogs'}, 'name': 'test brand @1612366146176', 'topBrand': False}, {'_id': {'$oid': '601ac142be37ce2ead43755a'}, 'barcode': '511111519874', 'brandCode': 'TEST BRANDCODE @1612366146051', 'category': 'Baking', 'categoryCode': 'BAKING', 'cpg': {'$id': {'$oid': '601ac142be37ce2ead437559'}, '$ref': 'Cogs'}, 'name': 'test brand @1612366146051', 'topBrand': False}]

The same result with less code, using a list comprehension:

import json
with open('brands.json') as f:
    data = [json.loads(line.strip()) for line in f]
print(data)
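
If the file might contain blank lines or a malformed record, a slightly more defensive variant can help. This is only a sketch: the helper name `load_json_lines` and the inline `sample` data are made up for illustration, and it assumes one JSON object per line as in the question.

```python
import json

def load_json_lines(lines):
    """Parse an iterable of JSON Lines text into a list of Python objects.

    Hypothetical helper: blank lines are skipped, and a bad line is
    reported together with its line number.
    """
    records = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. an empty line at the end of the file
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"invalid JSON on line {line_number}: {exc}") from exc
    return records

# Illustrative stand-in for the lines of brands.json
sample = [
    '{"barcode": "511111019862", "topBrand": false}\n',
    '\n',
    '{"barcode": "511111519928", "topBrand": false}\n',
]
data = load_json_lines(sample)
print(len(data))  # 2
```

An open file object is also an iterable of lines, so `load_json_lines(f)` inside `with open('brands.json') as f:` should work the same way.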
balderman

If I understood correctly, you are trying to convert the data in the "brands.json" file into a list.

First of all, after opening a file you need to read it. For example, to read all of its lines:

with open("brands.json", 'r') as f:
    read_lines = f.readlines()

Then, to build the list, parse each line separately and append the result:

import json

data = []
with open("brands.json", 'r') as f:
    read_lines = f.readlines()
    for lines_of_data in read_lines:
        line_json = json.loads(lines_of_data.strip())
        data.append(line_json)

This gives you a list of dicts, one per line of the file, which will look like:

[{'_id': {'$oid': '60551'}, 'barcode': '511111019862', 'category': 'Baking', 'categoryCode': 'BAKING', 'cpg': {'$id': {'$oid': '601ac114be37ce2ead437550'}, '$ref': 'Cogs'}, 'name': 'test brand @1612366101024', 'topBrand': False}, {'_id': {'$oid': '601c5460be37ce2ead43755f'}, 'barcode': '511111519928', 'brandCode': 'STARBUCKS', 'category': 'Beverages', 'categoryCode': 'BEVERAGES', 'cpg': {'$id': {'$oid': '5332f5fbe4b03c9a25efd0ba'}, '$ref': 'Cogs'}, 'name': 'Starbucks', 'topBrand': False}, {'_id': {'$oid': '601ac142be37ce2ead43755d'}, 'barcode': '511111819905', 'brandCode': 'TEST BRANDCODE @1612366146176', 'category': 'Baking', 'categoryCode': 'BAKING', 'cpg': {'$id': {'$oid': '601ac142be37ce2ead437559'}, '$ref': 'Cogs'}, 'name': 'test brand @1612366146176', 'topBrand': False}, {'_id': {'$oid': '601ac142be37ce2ead43755a'}, 'barcode': '511111519874', 'brandCode': 'TEST BRANDCODE @1612366146051', 'category': 'Baking', 'categoryCode': 'BAKING', 'cpg': {'$id': {'$oid': '601ac142be37ce2ead437559'}, '$ref': 'Cogs'}, 'name': 'test brand @1612366146051', 'topBrand': False}]
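
Since the question mentions large data sets, it may be worth noting that `readlines()` holds every line in memory at once. A lazy alternative, sketched here under the assumption of one JSON object per line, is a generator that yields one record at a time. The name `iter_json_lines` and the `sample` data are hypothetical.

```python
import json

def iter_json_lines(lines):
    """Yield one parsed record per non-blank line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)

# Illustrative stand-in for the lines of brands.json
sample = [
    '{"name": "Starbucks", "category": "Beverages"}\n',
    '{"name": "test brand @1612366146176", "category": "Baking"}\n',
]

# Records are parsed one by one; only the matching names are kept.
baking = [r["name"] for r in iter_json_lines(sample) if r["category"] == "Baking"]
print(baking)  # ['test brand @1612366146176']
```

Passing the open file object itself (rather than the result of `readlines()`) keeps only the current line in memory while iterating.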

Alternatively, you can load the data into a dict keyed by each record's ID (much easier to work with):

import json

data = {}
with open("brands.json", 'r') as f:
    read_lines = f.readlines()
    for lines_of_data in read_lines:
        line_json = json.loads(lines_of_data.strip())
        line_id = line_json['_id']['$oid']
        data[line_id] = line_json

This way you get a dict in which the "$oid" of each line of data is used as its key. It will look like:

{'60551': {'_id': {'$oid': '60551'}, 'barcode': '511111019862', 'category': 'Baking', 'categoryCode': 'BAKING', 'cpg': {'$id': {'$oid': '601ac114be37ce2ead437550'}, '$ref': 'Cogs'}, 'name': 'test brand @1612366101024', 'topBrand': False}, '601c5460be37ce2ead43755f': {'_id': {'$oid': '601c5460be37ce2ead43755f'}, 'barcode': '511111519928', 'brandCode': 'STARBUCKS', 'category': 'Beverages', 'categoryCode': 'BEVERAGES', 'cpg': {'$id': {'$oid': '5332f5fbe4b03c9a25efd0ba'}, '$ref': 'Cogs'}, 'name': 'Starbucks', 'topBrand': False}, '601ac142be37ce2ead43755d': {'_id': {'$oid': '601ac142be37ce2ead43755d'}, 'barcode': '511111819905', 'brandCode': 'TEST BRANDCODE @1612366146176', 'category': 'Baking', 'categoryCode': 'BAKING', 'cpg': {'$id': {'$oid': '601ac142be37ce2ead437559'}, '$ref': 'Cogs'}, 'name': 'test brand @1612366146176', 'topBrand': False}, '601ac142be37ce2ead43755a': {'_id': {'$oid': '601ac142be37ce2ead43755a'}, 'barcode': '511111519874', 'brandCode': 'TEST BRANDCODE @1612366146051', 'category': 'Baking', 'categoryCode': 'BAKING', 'cpg': {'$id': {'$oid': '601ac142be37ce2ead437559'}, '$ref': 'Cogs'}, 'name': 'test brand @1612366146051', 'topBrand': False}}

I find this keyed form much easier to work with.
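
As a rough illustration of why the keyed form is convenient, the sketch below builds the dict and then looks up a single record directly by its ID. The function name `index_by_oid` and the shortened `sample` records are assumptions for the example. The duplicate check is also an addition: a plain assignment would silently overwrite an earlier record that shares the same "$oid".

```python
import json

def index_by_oid(lines):
    """Build a dict mapping each record's "$oid" to the record (hypothetical helper)."""
    index = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        oid = record["_id"]["$oid"]
        if oid in index:
            # Assumption: duplicates are treated as an error rather than overwritten
            raise ValueError(f"duplicate $oid: {oid}")
        index[oid] = record
    return index

# Shortened records modeled on the sample data in the question
sample = [
    '{"_id": {"$oid": "60551"}, "name": "test brand @1612366101024"}',
    '{"_id": {"$oid": "601c5460be37ce2ead43755f"}, "name": "Starbucks"}',
]
brands = index_by_oid(sample)
print(brands["601c5460be37ce2ead43755f"]["name"])  # Starbucks
```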

Dom