
I have a huge text file with blockchain data that I'd like to parse so that I can get the info from the fields I need. I have tried to convert it to JSON, but it says it is invalid. After giving it some thought, I've realised that is not the best way since I only want 2 or 3 fields. Can someone help me find the best way of extracting data from the file? There's an example below. I would only want txid, size, and hash.

{
    "txid": "254d5cc8d2b1889a2cb45f7e3dca8ed53a3fcfa32e8b9eac5f68c4f09e7af7bd",
    "hash": "a8e125eb6d7ab883177d8ab228a3d09c1733d1ca49b7b2dff4b057eeb80ff9be",
    "version": 2,
    "size": 171,
    "vsize": 144,
    "weight": 576,
    "locktime": 0,
    "vin": [
        {
        "coinbase": "02ee170101",
        "sequence": 4294967295
        }
    ],
    "vout": [
            {
            "value": 12.00000000,
            "n": 0,
            "scriptPubKey": {
                "asm": "OP_HASH160 cd5b833dd43bc60b8c28c4065af670f283a203ff OP_EQUAL",
                "hex": "a914cd5b833dd43bc60b8c28c4065af670f283a203ff87",
                "reqSigs": 1,
                "type": "scripthash",
                "addresses": [
                "2NBy4928yJakYBFQuXxXBwXjsLCRWgzyiGm"
                ]
            }
        },
        {
        "value": 5.00000000,
        "n": 1,
        "scriptPubKey": {
                "asm": "OP_HASH160 cd5b833dd43bc60b8c28c4065af670f283a203ff OP_EQUAL",
                "hex": "a914cd5b833dd43bc60b8c28c4065af670f283a203ff87",
                "reqSigs": 1,
                "type": "scripthash",
                "addresses": [
                "2NBy4928yJakYBFQuXxXBwXjsLCRWgzyiGm"
                ]
            }
        }
    ],
    "hex":
    "020000000001010000000000000000000000000000000000000000000000000000000000000000
    ffffffff0502ee170101ffffffff02000000000000000017a914cd5b833dd43bc60b8c28c4065af670f283a
    203ff870000000000000000266a24aa21a9ede2f61c3f71d1defd3fa999dfa36953755c69068979996
    2b48bebd836974e8cf9012000000000000000000000000000000000000000000000000000000000
    0000000000000000",
    "blockhash": "0f84abb78891a4b9e8bc9637ec5fb8b4962c7fe46092fae99e9d69373bf7812a",
    "confirmations": 1,
    "time": 1590830080,
    "blocktime": 1590830080
}

Thank you

js352
  • The JSON in the question is invalid because of the way in which the text for the "hex" field is split across multiple lines. You can see this by using a JSON [validator](https://jsonlint.com/). – andrewJames Jun 05 '20 at 17:55

1 Answer


@andrewjames is correct. If you have no control over the JSON file, you can address the error by just removing the newline characters:

import json

parsed = json.loads(jsonText.replace("\n", ""))

Then you can access the fields you want like a normal dictionary:

print(parsed['txid'])
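
For the other two fields the question asks about (size and hash, both present in the example JSON), the same dictionary lookups apply:

print(parsed['size'])
print(parsed['hash'])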
jdaz
  • Hi jdaz, thank you for your answer; it almost worked. It worked for the example given above, but when using it on the huge file I have, it gives me this error: `raise JSONDecodeError("Extra data", s, end) json.decoder.JSONDecodeError: Extra data: line 1 column 1988 (char 1987)`. My code is: `transactions1 = open("Data/txs.json").read().replace("\n", "") parsed = json.loads(transactions1)`. Any idea how to get around that? Thank you – js352 Jun 05 '20 at 19:27
  • Sounds like you have multiple JSON objects in your file. See here: https://stackoverflow.com/questions/21058935/python-json-loads-shows-valueerror-extra-data/51830719 – jdaz Jun 05 '20 at 20:08
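
If the file really is a stream of concatenated JSON objects, as the linked answer describes, one way to handle it is to walk the text with `json.JSONDecoder.raw_decode` and pull out only the wanted fields. A minimal sketch, assuming the objects simply follow one another with optional whitespace between them (the path comes from the comment above):

import json

def iter_transactions(text):
    # Decode one JSON object at a time from the concatenated text
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # Skip any whitespace left between consecutive objects
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

# Remove the newlines first so strings split across lines parse cleanly
text = open("Data/txs.json").read().replace("\n", "")
for tx in iter_transactions(text):
    print(tx["txid"], tx["size"], tx["hash"])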