
I'm sure this question has been asked a million times already. I have read through some others and am struggling to find an answer.

I am querying the RIPE API en masse, using the following curl loop on Debian 9:


file="servers-to-ripe.txt"
while IFS= read -r line
do
  # Hostnames -> corresponding IPs
  dig=$(./ip_extrapolate2 "$line" | grep -v "$resolving_server")
  # Each response is appended as-is, so servers.json ends up as a run of
  # back-to-back JSON objects rather than a single JSON array.
  curl --silent "https://stat.ripe.net/data/address-space-usage/data.json?resource=${dig}&data=asn_name" >> servers.json
done <"$file"

This gives me some JSON output describing the ownership of those servers. I initially tried the jq CLI parser, to no avail, which led me to write the parsing in Python instead. Here are the first two objects from the output:

{
    "status": "ok", 
    "server_id": "app002", 
    "status_code": 200, 
    "version": "0.4", 
    "cached": false, 
    "see_also": [], 
    "time": "2020-01-18T02:44:39.610258", 
    "messages": [
        [
            "info", 
            "IP address (185.230.125.107) has been changed to the closest encompassing prefix/range (185.230.125.0/24) found in RIPE DB"
        ]
    ], 
    "data_call_status": "supported - connecting to ursa", 
    "process_time": 216, 
    "build_version": "2020.1.13.174", 
    "query_id": "20200118024439-c225c628-6317-430d-8244-64f805701675", 
    "data": {
        "assignments": [], 
        "query_time": "2020-01-16T00:00:00", 
        "ip_stats": [
            {
                "status": "LIR Free", 
                "ips": 256
            }
        ], 
        "resource": "185.230.125.0/24", 
        "allocations": [
            {
                "allocation": "185.230.124.0/22", 
                "status": "ALLOCATED PA", 
                "asn_name": "RO-M247EUROPE-OCT-20171108", 
                "assignments": 0
            }
        ]
    }
}{
    "status": "ok", 
    "server_id": "app018", 
    "status_code": 200, 
    "version": "0.4", 
    "cached": false, 
    "see_also": [], 
    "time": "2020-01-18T02:44:40.104775", 
    "messages": [
        [
            "info", 
            "IP address (45.9.249.67) has been changed to the closest encompassing prefix/range (45.9.249.0/24) found in RIPE DB"
        ]
    ], 
    "data_call_status": "supported - connecting to ursa", 
    "process_time": 180, 
    "build_version": "2020.1.13.174", 
    "query_id": "20200118024439-33ce2ee1-33a2-42c2-8d9e-acbc92996fe5", 
    "data": {
        "assignments": [
            {
                "status": "ASSIGNED PA", 
                "parent_allocation": "45.9.248.0/22", 
                "address_range": "45.9.249.0/24", 
                "asn_name": "M247-Dubai"
            }
        ], 
        "query_time": "2020-01-16T00:00:00", 
        "ip_stats": [
            {
                "status": "ASSIGNED PA", 
                "ips": 256
            }
        ], 
        "resource": "45.9.249.0/24", 
        "allocations": [
            {
                "allocation": "45.9.248.0/22", 
                "status": "ALLOCATED PA", 
                "asn_name": "RO-M247-APR1901-20190423", 
                "assignments": 1
            }
        ]
    }
}{

I am trying to pull ONLY the asn_name and the IP-range.

I have tinkered with Python (2.7)'s inbuilt json parser. Here's what I've tried:

#!/usr/bin/python
import json

input_file = open ('servers.json')
json_array = json.load(input_file)
servers = []

for item in json_array:
  server_asn_name = {"asn":None, "resource":None}
  server_asn_name['asn'] = item['asn_name']
  server_asn_name['resource'] = item["resource"]
  servers.append(server_asn_name)

print(server_asn_name)

There are a few other attempts, but that's probably the closest I've gotten so far. Any advice would be much appreciated :)

Lewis Farnworth
  • Note that once you have loaded a JSON, it is a regular ``dict``, ``list`` and/or primitive type. Consequently, you look up data as in manually created lists or dicts. Looking at how to extract data from a JSON is a red herring. – MisterMiyagi Jan 18 '20 at 15:54
  • Can you clarify which version of Python you're using? I'm also not sure what exactly the issue is here. – AMC Jan 18 '20 at 17:08
  • It's Python 2.7.13, on Debian 9 – Lewis Farnworth Jan 19 '20 at 15:26
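
To illustrate the point made in the first comment, here is a minimal sketch: once parsed, a response is just nested dicts and lists, so the two wanted fields are reached by ordinary indexing. The `sample` string is a hand-trimmed copy of the first object in the question, not fresh API output.

```python
import json

# Trimmed copy of the first object from the question (illustrative only).
sample = '''
{
    "status": "ok",
    "data": {
        "assignments": [],
        "resource": "185.230.125.0/24",
        "allocations": [
            {
                "allocation": "185.230.124.0/22",
                "status": "ALLOCATED PA",
                "asn_name": "RO-M247EUROPE-OCT-20171108",
                "assignments": 0
            }
        ]
    }
}
'''

item = json.loads(sample)

# "resource" sits under "data", not at the top level of the object.
resource = item["data"]["resource"]

# "asn_name" sits inside the first element of the "allocations" list.
asn_name = item["data"]["allocations"][0]["asn_name"]

print(resource, asn_name)
```

This is why `item['asn_name']` in the attempt above cannot work: the key is two levels down, inside a list.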

1 Answer

For the code below to work, your JSON file should look like this: a single array whose objects are separated by commas. I am assuming the file name is servers.json.

[
  {
    "status": "ok",
    "server_id": "app002",
    "status_code": 200,
    "version": "0.4",
    "cached": false,
    "see_also": [],
    "time": "2020-01-18T02:44:39.610258",
    "messages": [
      [
        "info",
        "IP address (185.230.125.107) has been changed to the closest encompassing prefix/range (185.230.125.0/24) found in RIPE DB"
      ]
    ],
    "data_call_status": "supported - connecting to ursa",
    "process_time": 216,
    "build_version": "2020.1.13.174",
    "query_id": "20200118024439-c225c628-6317-430d-8244-64f805701675",
    "data": {
      "assignments": [],
      "query_time": "2020-01-16T00:00:00",
      "ip_stats": [
        {
          "status": "LIR Free",
          "ips": 256
        }
      ],
      "resource": "185.230.125.0/24",
      "allocations": [
        {
          "allocation": "185.230.124.0/22",
          "status": "ALLOCATED PA",
          "asn_name": "RO-M247EUROPE-OCT-20171108",
          "assignments": 0
        }
      ]
    }
  },
  {
    "status": "ok",
    "server_id": "app018",
    "status_code": 200,
    "version": "0.4",
    "cached": false,
    "see_also": [],
    "time": "2020-01-18T02:44:40.104775",
    "messages": [
      [
        "info",
        "IP address (45.9.249.67) has been changed to the closest encompassing prefix/range (45.9.249.0/24) found in RIPE DB"
      ]
    ],
    "data_call_status": "supported - connecting to ursa",
    "process_time": 180,
    "build_version": "2020.1.13.174",
    "query_id": "20200118024439-33ce2ee1-33a2-42c2-8d9e-acbc92996fe5",
    "data": {
      "assignments": [
        {
          "status": "ASSIGNED PA",
          "parent_allocation": "45.9.248.0/22",
          "address_range": "45.9.249.0/24",
          "asn_name": "M247-Dubai"
        }
      ],
      "query_time": "2020-01-16T00:00:00",
      "ip_stats": [
        {
          "status": "ASSIGNED PA",
          "ips": 256
        }
      ],
      "resource": "45.9.249.0/24",
      "allocations": [
        {
          "allocation": "45.9.248.0/22",
          "status": "ALLOCATED PA",
          "asn_name": "RO-M247-APR1901-20190423",
          "assignments": 1
        }
      ]
    }
  }
]

Create a new function called servers_from_json that takes file_name as a parameter and returns a list of servers containing only the IP range and asn_name fields you want, as shown below:

import json


def servers_from_json(file_name):
    with open(file_name, 'r') as f:
        data = json.loads(f.read())
        # 'asn' holds the asn_name of the first allocation; 'resource' holds the IP range.
        servers = [{'asn': item['data']['allocations'][0]['asn_name'], 'resource': item['data']['resource']} for item in data]
        return servers


servers = servers_from_json('servers.json')
print(servers) # => [{'asn': 'RO-M247EUROPE-OCT-20171108', 'resource': '185.230.125.0/24'}, {'asn': 'RO-M247-APR1901-20190423', 'resource': '45.9.249.0/24'}]

This should give you the correct result.
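
If servers.json is instead a run of objects written back to back (`}{`), as the curl loop in the question produces, the json module reports "Extra data" after the first object. One possible workaround, sketched here under that assumption, is `json.JSONDecoder.raw_decode`, which parses a single value and returns the index where it stopped. The helper names `iter_concatenated_json` and `servers_from_concatenated` are hypothetical, and `sample` is a hand-made stand-in for the real file.

```python
import json


def iter_concatenated_json(text):
    """Yield each JSON value from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # Returns the parsed value and the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def servers_from_concatenated(text):
    """Keep only the asn_name and the IP range of each response."""
    return [
        {
            'asn': item['data']['allocations'][0]['asn_name'],
            'resource': item['data']['resource'],
        }
        for item in iter_concatenated_json(text)
    ]


# Hand-made stand-in for the file contents: two objects with no separator.
sample = (
    '{"data": {"resource": "185.230.125.0/24", '
    '"allocations": [{"asn_name": "RO-M247EUROPE-OCT-20171108"}]}}'
    '{"data": {"resource": "45.9.249.0/24", '
    '"allocations": [{"asn_name": "RO-M247-APR1901-20190423"}]}}\n'
)

parsed = servers_from_concatenated(sample)
print(parsed)
```

For the real file, the text would come from `open('servers.json').read()`. This avoids editing the data by hand, at the cost of holding the whole file in memory.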

Limboer
  • Forgive me for sounding stupid- This is the first time I've had to deal with JSON data ever. Normally just build everything in house with non-standardized output. So, when you say that it can be simplified, will that have to be done manually? Am I going to have to hand sift through all 5570 entries, or is there something really obvious which I'm missing out on. – Lewis Farnworth Jan 19 '20 at 15:28
  • I tried to just straight-up swipe that line you provided and I'm running into more errors, most notably: raise ValueError(errmsg("Extra data", s, end, len(s))) – Lewis Farnworth Jan 19 '20 at 15:37
  • Sorry, English is not my first language. By the word "simplified" I meant "your JSON file structure could be considered as...". Answer updated as well. – Limboer Jan 19 '20 at 16:56
  • Thank you for your help... However, bad news. So, when I run the exact script you provided on the entire dataset, I receive the following error messages: – Lewis Farnworth Jan 19 '20 at 20:50
  • Traceback (most recent call last): File "./parse-test.py", line 12, in servers = servers_from_json('servers.json') File "./parse-test.py", line 7, in servers_from_json data = json.loads(f.read()) File "/usr/lib/python2.7/json/__init__.py", line 339, in loads return _default_decoder.decode(s) File "/usr/lib/python2.7/json/decoder.py", line 367, in decode raise ValueError(errmsg("Extra data", s, end, len(s))) ValueError: Extra data: line 38 column 2 - line 258701 column 1 (char 1098 - 8002430) – Lewis Farnworth Jan 19 '20 at 20:51
  • But, when I run it on the concatenated data set I provided you with, I get this: Traceback (most recent call last): File "./parse-test.py", line 12, in servers = servers_from_json('2servers.json') File "./parse-test.py", line 8, in servers_from_json servers = [{'asn': item['data']['resource'], 'resource': item['data']['allocations'][0]['asn_name']} for item in data] – Lewis Farnworth Jan 19 '20 at 20:52
  • Any ideas please? :) – Lewis Farnworth Jan 19 '20 at 20:52
  • I tested it on my computer and it works fine, so maybe it's because you are using Python 2. Could you please install Python 3 and then run this script again? Python 2 is no longer maintained as of this year. – Limboer Jan 20 '20 at 06:27
  • Thank you, but again... no luck. I've updated to Python 3.7 but the script is still throwing me some exceptions. – Lewis Farnworth Jan 20 '20 at 13:12
  • root@big-db:/var/projects/nord/python# ./parse-test.py Traceback (most recent call last): File "./parse-test.py", line 12, in servers = servers_from_json('2servers.json') File "./parse-test.py", line 7, in servers_from_json data = json.loads(f.read()) File "/usr/local/lib/python3.7/json/__init__.py", line 348, in loads return _default_decoder.decode(s) File "/usr/local/lib/python3.7/json/decoder.py", line 340, in decode raise JSONDecodeError("Extra data", s, end) json.decoder.JSONDecodeError: Extra data: line 38 column 2 (char 1098) – Lewis Farnworth Jan 20 '20 at 13:12
  • OK, check your `servers.json` file and see whether it has the same format as the one I provided. https://stackoverflow.com/questions/51919698/cant-parse-json-file-json-decoder-jsondecodeerror-extra-data – Limboer Jan 20 '20 at 13:19
  • Okay, that's definitely progress. So, I swiped the exact format which you provided and it works. Thank you so much for your patience here. The problem is most definitely the formatting. I'm going to have to tinker a little here. The data set which I have doesn't follow the same formatting. The first thing which I have isolated is that each object was not delimited by a comma. I ran a find-replace on the entire data set to insert a comma between each pair. This doesn't seem to have resolved it, but I definitely have more direction here. – Lewis Farnworth Jan 20 '20 at 14:26
  • Good job, check the JSON format as well. I personally think the problem is not about programming in Python but about the structure of the JSON. The file you originally provided is not a correctly formatted JSON file. Once you fix this issue, I think you can get what you want. – Limboer Jan 20 '20 at 14:39
  • Excellent, thank you so much for your help here. I'll mark this as resolved, as I definitely have the tools I need to finish this by myself. Thank you! – Lewis Farnworth Jan 20 '20 at 15:31
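
When running the extraction across several thousand responses, one possible refinement is to tolerate entries whose `allocations` list is empty. That such entries occur is an assumption: the first sample object has an empty `assignments` list, so empty lists are at least plausible. The helper name `extract_server` and both inputs are hypothetical, hand-made for illustration.

```python
def extract_server(item):
    """Return {'asn': ..., 'resource': ...} for one parsed response.

    'asn' is None when the response carries no allocation entry.
    """
    data = item.get('data', {})
    allocations = data.get('allocations') or []
    asn_name = allocations[0].get('asn_name') if allocations else None
    return {'asn': asn_name, 'resource': data.get('resource')}


# Hand-made inputs (not real API output): one complete, one without allocations.
complete = {'data': {'resource': '45.9.249.0/24',
                     'allocations': [{'asn_name': 'RO-M247-APR1901-20190423'}]}}
empty = {'data': {'resource': '192.0.2.0/24', 'allocations': []}}

print(extract_server(complete))
print(extract_server(empty))
```

Returning None instead of raising keeps one odd response from aborting the whole run; the incomplete entries can be filtered or logged afterwards.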