I have an ndjson dataset like the one below:

   {'ADDRESS_CITY': 'Whittier', 'ADDRESS_LINE_1': '905 Greenleaf Avenue', 'ADDRESS_STATE': 'CA', 'ADDRESS_ZIP': '90402',},
   {'ADDRESS_CITY': 'Cedar Falls', 'ADDRESS_LINE_1': '93323 Maplewood Dr', 'ADDRESS_STATE': 'CA', 'ADDRESS_ZIP': '95014'}

I need to pass the values above into an API request, with the body in the format below.

data=[
        {
            "addressee":"Greenleaf Avenue",
            "street":"905 Greenleaf Avenue",
            "city":"Whittier",
            "state":"CA",
            "zipcode":"90402",
            
        },
        {
            "addressee":"93323",
            "street":"Maplewood Dr",
            "city":"Cedar Falls",
            "state":"CA",
            "zipcode":"95014",
        
        }
]

As you can see, the keys are different, so I need to rename them to match the API's field names (i.e. address_line_1 goes to addressee) and pass the values in under the new names. There are going to be 10k addresses in this request.

I did not note it in my first example, but there is an ID associated with each address, which I have to remove to make the request and then add back in. I ended up solving it with the code below. Is there anything more Pythonic? This doesn't feel very elegant to me.

import json

import ndjson
import requests

addresses = ndjson.loads(addresses)
# rename the keys by string-replacing them in the serialized JSON
data = json.loads(
    json.dumps(addresses)
    .replace('"ADDRESS_CITY"', '"city"')
    .replace('"ADDRESS_LINE_1"', '"street"')
    .replace('"ADDRESS_STATE"', '"state"')
    .replace('"ADDRESS_ZIP"', '"zipcode"')
)
ids = []
for i in data:
    i['candidates'] = 1
    # remember the ID, then strip it from the request body
    ids.append(i["ID"])
    del i["ID"]

response = requests.request("POST", url, json=data)

resp_data = response.json()

# add the IDs back, relying on the response being in request order
a = 0
for i in resp_data:
    i['ID'] = ids[a]
    a = a + 1
0004
  • Hmm, I’m not actually familiar with ndjson. I’m wondering what exactly it is, and in what scenarios I would use it. – rv.kvetch Sep 30 '21 at 00:53
  • Also, is the candidates field required? I noticed you’re setting it to 1 for all the data. – rv.kvetch Sep 30 '21 at 01:05
  • Yes, because I only want one response per address, since I have to associate a unique ID with it (I don't know a better way). If I got many, how would I know which initial address each one was associated with? – 0004 Sep 30 '21 at 01:08
  • Hmm, I think I understand. So in the dataset, there could be duplicate addresses, so you use the id field to uniquely identify them. – rv.kvetch Sep 30 '21 at 01:53
  • However, I guess I’m just a bit confused about why you’re replacing id with candidates for all the data. Is that because the API only accepts a candidates field? Also, I didn’t understand why you set `candidates=1` for each item; that seems like a sort of boolean flag to me, only it’s the same for every item, unless I’m reading that wrong. – rv.kvetch Sep 30 '21 at 01:56
  • @rv.kvetch candidates=1 is not a boolean; it can be up to 10. It means that if the input is unclear (i.e. you put an address with no state into the API, and there could be multiple matching addresses in the USA), it only returns 1 as opposed to all of them. – 0004 Sep 30 '21 at 16:30
  • ah, I see. thanks for clarifying - that makes sense to me now. – rv.kvetch Sep 30 '21 at 18:10

2 Answers

If you want to make things a bit easier for yourself, I would suggest using data classes to model your input data. The main benefit is that you can use dot (`.`) access for attributes instead of working with dictionaries that have dynamic keys. You also get type hinting, so your IDE should be able to assist you better.

In this case, I would suggest pairing it with a JSON serialization library such as dataclass-wizard, which supports this use case well. As of the latest version (v0.15.0), it should also support excluding fields from the serialization / dump process.

Here is a straightforward example that I put together, which uses the desired key mapping from above:

import json
from dataclasses import dataclass, field
# note: for python 3.9+, you can import this from `typing` instead
from typing_extensions import Annotated

from dataclass_wizard import JSONWizard, json_key


@dataclass
class AddressInfo(JSONWizard):
    """
    AddressInfo dataclass

    """
    city: Annotated[str, json_key('ADDRESS_CITY')]
    street: Annotated[str, json_key('ADDRESS_LINE_1')]
    state: Annotated[str, json_key('ADDRESS_STATE')]

    # pass `dump=False`, so we exclude the field in serialization.
    id: Annotated[int, json_key('ID', dump=False)]

    # you could also annotate the below like `Union[str, int]`
    # if you want to retain it as a string.
    zipcode: Annotated[int, json_key('ADDRESS_ZIP')]

    # exclude this field from the constructor (and from the
    # de-serialization process)
    candidates: int = field(default=1, init=False)

And sample usage of the above:

input_obj = [{'ADDRESS_CITY': 'Whittier', 'ADDRESS_LINE_1': '905 Greenleaf Avenue',
              'ADDRESS_STATE': 'CA', 'ADDRESS_ZIP': '90402',
              'ID': 111},
             {'ADDRESS_CITY': 'Cedar Falls', 'ADDRESS_LINE_1': '93323 Maplewood Dr',
              'ADDRESS_STATE': 'CA', 'ADDRESS_ZIP': '95014',
              'ID': 222}]

addresses = AddressInfo.from_list(input_obj)

print('-- Addresses')
for a in addresses:
    print(repr(a))

out_list = [a.to_dict() for a in addresses]

print('-- To JSON')
print(json.dumps(out_list, indent=2))

# alternatively, with the latest version (0.15.1)
# print(AddressInfo.list_to_json(addresses, indent=2))

Note: you can still access the id for each address as normal, even though this field is omitted from the JSON result.
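Since the id stays on each object, it can be paired back with the API response afterwards. Below is a minimal sketch, assuming the API returns one result per input address in the same order (the code in the question relies on the same assumption). `Address` is a plain stdlib stand-in for `AddressInfo`, `attach_ids` is a hypothetical helper name, and `fake_response` is made-up data rather than a real API reply:

```python
from dataclasses import dataclass


@dataclass
class Address:
    # plain stand-in for AddressInfo, trimmed to the fields needed here
    id: int
    street: str


def attach_ids(addresses, results):
    """Pair each result with the id of the address at the same position."""
    if len(addresses) != len(results):
        raise ValueError("expected one result per address")
    return [{**result, "ID": addr.id} for addr, result in zip(addresses, results)]


addresses = [Address(111, "905 Greenleaf Avenue"), Address(222, "93323 Maplewood Dr")]
# hypothetical response body, standing in for response.json()
fake_response = [{"street": "905 Greenleaf Ave"}, {"street": "93323 Maplewood Dr"}]

tagged = attach_ids(addresses, fake_response)
print(tagged)
```

The length check is there because `zip` silently stops at the shorter input, which would otherwise hide a response that dropped or added entries.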

rv.kvetch

Use a dictionary to translate them:

translations = {
    "ADDRESS_CITY": "city",
    # etc
}
input_data = ...  # your data here
data = [{translations[k]: v for k, v in row.items()} for row in input_data]
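Filled out for the four keys in the question, this might look like the sketch below. The `KEY_MAP` contents follow the replacements in the question's own code; `to_payload` is a hypothetical helper name, and the `ID` is popped off first so the lookup never hits an unmapped key:

```python
KEY_MAP = {
    "ADDRESS_LINE_1": "street",
    "ADDRESS_CITY": "city",
    "ADDRESS_STATE": "state",
    "ADDRESS_ZIP": "zipcode",
}


def to_payload(rows, candidates=1):
    """Return (ids, payload): ids in input order, payload with renamed keys."""
    ids, payload = [], []
    for row in rows:
        row = dict(row)  # copy so the caller's data is left untouched
        ids.append(row.pop("ID"))
        item = {KEY_MAP[k]: v for k, v in row.items()}
        item["candidates"] = candidates
        payload.append(item)
    return ids, payload


rows = [
    {"ID": 111, "ADDRESS_CITY": "Whittier", "ADDRESS_LINE_1": "905 Greenleaf Avenue",
     "ADDRESS_STATE": "CA", "ADDRESS_ZIP": "90402"},
]
ids, payload = to_payload(rows)
print(ids, payload)
```

Any key that is neither `ID` nor in `KEY_MAP` raises a `KeyError` here, which is arguably what you want for 10k rows: an unexpected field fails loudly instead of being sent to the API under the wrong name.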
2e0byo
  • Change the ndjson file to a dictionary? Then iterate through the keys to change them? – 0004 Sep 29 '21 at 22:27
  • made some changes – 0004 Sep 29 '21 at 23:49
  • Thank you. I actually got to an answer that works. Any input on whether my route is more or less Pythonic, or whether there's a better way I could do it? – 0004 Sep 30 '21 at 00:39
  • @0004 the dataclass approach is more robust; you should probably go with that. If you can `json.dumps` your ndjson object, it's *already* dict-like. Whilst str replacing will *work*, it's a hack... – 2e0byo Sep 30 '21 at 09:01