
I am trying to read data in Python from a txt file that contains a JSON dump, but I am getting a JSONDecodeError.

This is how I am adding the JSON response to the file:

# Runs inside a loop over pages; the loop stops when a page comes back empty.
repo = requests.get(url, headers=headers, params=params).json()
if repo:
    with open('data.txt', 'a') as f:
        json.dump(repo, f, sort_keys=True, indent=4)
    continue
else:
    break

This is my JSON structure:

[
    {
        "login": "asu",
        "login_name": "heylo"
    },
    {
        "login": "sr9",
        "login_name": "heylo"
    }
],
[
    {
        "login": "tokuda109",
        "login_name": "mojombo"
    },
    {
        "login": "svallory",
        "login_name": "mojombo"
    }
]

This is how I am trying to read it back:

with open('data.txt') as fd:
    json_data = json.load(fd)
    pprint(json_data)
Chandella07
Tayyab Vohra
  • If you're getting JSONDecodeError, then you're not loading correct JSON. – AKX Apr 29 '21 at 08:15
  • @AKX this issue occurs when I am appending the data to the file, not when I am writing to the file – Tayyab Vohra Apr 29 '21 at 08:17
  • By appending to the file, you've created invalid JSON. `json.load` does not allow reading concatenated JSON structures like that. You might want to consider https://jsonlines.org/ – i.e. exactly one JSON structure per line, and then loading with a loop over the lines. – AKX Apr 29 '21 at 08:19
  • could you please share requests arguments? I wanna repeat the procedure – Pouya Esmaeili Apr 29 '21 at 08:19
  • then how would I append JSON objects to the file? – Tayyab Vohra Apr 29 '21 at 08:21
  • @TayyabGulsherVohra this will help [How to append data to a json file?](https://stackoverflow.com/questions/12994442/how-to-append-data-to-a-json-file) – deadshot Apr 29 '21 at 08:21
  • @deadshot yes, I have appended json into the txt file – Tayyab Vohra Apr 29 '21 at 08:28
  • @PouyaEsmaeili , I am extracting data from github url this is the example url https://api.github.com/users/sferik/followers?per_page=100&page=1 – Tayyab Vohra Apr 29 '21 at 08:29

2 Answers


As mentioned in the comment, just concatenating JSON objects into a file one after another does not make a file that's valid JSON (and that could be parsed as a single JSON object).
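
To make the failure concrete, here is a minimal sketch that reproduces it in memory. The two pages are cut-down sample records from the question, and the names `page1`, `page2` and `concatenated` are only illustrative.

```python
import json

# Two pages of results; each one is a valid JSON array on its own.
page1 = [{"login": "asu", "login_name": "heylo"}]
page2 = [{"login": "tokuda109", "login_name": "mojombo"}]

# Appending one dump after another gives "[...][...]", which is two documents, not one.
concatenated = json.dumps(page1, indent=4) + json.dumps(page2, indent=4)

try:
    json.loads(concatenated)
    error_message = None
except json.JSONDecodeError as exc:
    # The parser reads the first array, then finds more text after it.
    error_message = exc.msg

print(error_message)
```

Each dump parses fine by itself; it is only the combination that `json.loads` rejects.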

The minimal better format is JSON Lines, https://jsonlines.org/ , that is a file of lines that are each a JSON document.

You can create such a file by appending to a file while ensuring indent is off when dumping JSON:

with open('data.txt', 'a') as f:
    print(json.dumps(repo, sort_keys=True), file=f)

Using print() ensures a trailing newline.

Then, you can load the data with e.g.

with open('data.txt') as fd:
    json_data = [json.loads(line) for line in fd if line.strip()]
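
Put together, a round trip might look like the sketch below. It assumes each response is a list (as with the followers endpoint mentioned in the comments) and flattens the pages into one list on load. The helper names `append_page` and `load_all` and the temporary file path are made up for the example.

```python
import json
import os
import tempfile


def append_page(path, page):
    """Append one API response as a single line of JSON."""
    with open(path, "a") as f:
        f.write(json.dumps(page, sort_keys=True) + "\n")


def load_all(path):
    """Read every line back and flatten the pages into one list."""
    records = []
    with open(path) as f:
        for line in f:
            if line.strip():  # skip blank lines
                records.extend(json.loads(line))
    return records


# Hypothetical file location and sample pages, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "data.jsonl")
append_page(path, [{"login": "asu", "login_name": "heylo"},
                   {"login": "sr9", "login_name": "heylo"}])
append_page(path, [{"login": "tokuda109", "login_name": "mojombo"}])

followers = load_all(path)
print(len(followers))
```

Because every line is a complete document, appending never invalidates what is already in the file.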

If the current file of concatenated JSON documents is important to you, you could try to repair it with a hack: wrap the file's contents in [ and ] and insert a comma between the concatenated documents. This is not guaranteed to work. Note that the snippet below assumes top-level objects; if each dumped document is an array, as in the question, the boundary to look for is `][` rather than `}{`.

with open('data.txt') as fd:
    fixed_json_data = json.loads("[" + fd.read().replace("}{", "},{") + "]")
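
A somewhat less fragile alternative to string replacement, offered as a sketch rather than a guaranteed fix, is to decode the documents one at a time with `json.JSONDecoder.raw_decode`, which returns the parsed value together with the index where it ended. The function name `iter_json_documents` is hypothetical, and skipping stray commas between documents is an assumption made to cope with the structure pasted in the question.

```python
import json


def iter_json_documents(text):
    """Yield each top-level JSON document found in a concatenated string."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so step over it here,
        # along with any stray commas separating the documents.
        while pos < length and (text[pos].isspace() or text[pos] == ","):
            pos += 1
        if pos >= length:
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Sample text shaped like the file in the question (shortened).
sample = '[\n    {"login": "asu"}\n],\n[\n    {"login": "sr9"}\n][{"login": "svallory"}]'
documents = list(iter_json_documents(sample))
print(len(documents))
```

Unlike the replace-based hack, this does not depend on which characters happen to sit at the boundary between two documents, but it will still raise `json.JSONDecodeError` if any individual document is damaged.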
AKX
  • not working json decode error expecting line 2 column 1 – Tayyab Vohra Apr 29 '21 at 08:25
  • Expecting ',' delimiter: line 2103 column 1 (char 111481] – Tayyab Vohra Apr 29 '21 at 08:26
  • i have made little changes in the json format – Tayyab Vohra Apr 29 '21 at 08:28
  • Well, if you've made little changes to the format, then I'm sure you can figure out how to parse it. It's very hard to help without seeing your actual data. – AKX Apr 29 '21 at 08:28
  • I have shared the actual data , also i have fixed_json_data = json.loads("[" + fd.read().replace("}{", "},{") + "],") or fixed_json_data = json.loads("[" + fd.read().replace("}{", "},{") + "]" + ",") with both cases its not working – Tayyab Vohra Apr 29 '21 at 08:32
  • The JSON structure you've edited into your post is still not valid JSON. You would need to wrap it in one more set of brackets for it to parse as a single array. – AKX Apr 29 '21 at 08:34

As described in "How to append data to a json file?", opening the file in 'a' (append) mode is not a good choice. I think it is better to read the list already stored in data.txt, extend it with the newly fetched data, and write the whole list back, like this:

import json
import requests


def read_content():  # reads and returns the list currently stored in data.txt
    try:
        with open('data.txt') as fd:
            json_data = json.load(fd)
    except (FileNotFoundError, json.JSONDecodeError):
        json_data = []  # first write: the file is missing or empty
    return json_data


url = 'https://api.github.com/users/sferik/followers?per_page=100&page=1'
repo = requests.get(url).json()  # repo is a list

if repo:
    available_content = read_content()  # read the list already in data.txt
    available_content.extend(repo)  # add the new items to the end of that list
    with open('data.txt', 'w') as f:  # rewrite the whole file, so the mode is 'w'
        json.dump(available_content, f, sort_keys=True, indent=4)

Pouya Esmaeili