2

The following json (otherwise known as 'Cus_data') is what I am working on deciphering with the pandas.io.json.json_normalize package.

I can get within the json at a base level by

cus_data = json_normalize(cus_data, 'data')

or

cus_data = json_normalize(cus_data['data'], max_level=1)

....Not sure which is best practice

I can't seem to figure out how to enter "Sources", "subscriptions","plan" or any sub categories. I'm sure this is basic but I cannot seem to grasp the concept of drilling down using this package under pandas. Any input or tips would be great on how to look at json from the outside like this.

    {
  "data": [
    {
      "account_balance": 0,
      "address": null,
      "balance": 0,
      "created": 1589322628,
      "currency": "usd",
      "default_source": "card_1Gi6cGHdG7kiS0bLynjuRExn",
      "delinquent": false,
      "description": null,
      "discount": null,
      "email": "test1111@gmail.com",
      "id": "cus_HGduaDYZxYsNep",
      "invoice_prefix": "2469C675",
      "invoice_settings": {
        "custom_fields": null,
        "default_payment_method": null,
        "footer": null
      },
      "livemode": false,
      "metadata": {},
      "name": null,
      "next_invoice_sequence": 2,
      "object": "customer",
      "phone": null,
      "preferred_locales": [],
      "shipping": null,
      "sources": {
        "data": [
          {
            "address_city": null,
            "address_country": null,
            "address_line1": null,
            "address_line1_check": null,
            "address_line2": null,
            "address_state": null,
            "address_zip": null,
            "address_zip_check": null,
            "brand": "Visa",
            "country": "US",
            "customer": "cus_HGduaDYZxYsNep",
            "cvc_check": "unchecked",
            "dynamic_last4": null,
            "exp_month": 5,
            "exp_year": 2021,
            "fingerprint": "fEOTy80zKG0YbTwA",
            "funding": "credit",
            "id": "card_1Gi6cGHdG7kiS0bLynjuRExn",
            "last4": "5126",
            "metadata": {},
            "name": "test1111@gmail.com",
            "object": "card",
            "tokenization_method": null
          }
        ],
        "has_more": false,
        "object": "list",
        "total_count": 1,
        "url": "/v1/customers/cus_HGduaDYZxYsNep/sources"
      },
      "subscriptions": {
        "data": [
          {
            "application_fee_percent": null,
            "billing": "charge_automatically",
            "billing_cycle_anchor": 1589322628,
            "billing_thresholds": null,
            "cancel_at": null,
            "cancel_at_period_end": false,
            "canceled_at": null,
            "collection_method": "charge_automatically",
            "created": 1589322628,
            "current_period_end": 1592001028,
            "current_period_start": 1589322628,
            "customer": "cus_HGduaDYZxYsNep",
            "days_until_due": null,
            "default_payment_method": null,
            "default_source": null,
            "default_tax_rates": [],
            "discount": null,
            "ended_at": null,
            "id": "sub_HGducAJ2Ox9H35",
            "invoice_customer_balance_settings": {
              "consume_applied_balance_on_void": true
            },
            "items": {
              "data": [
                {
                  "billing_thresholds": null,
                  "created": 1589322628,
                  "id": "si_HGduoRX8OyZSOj",
                  "metadata": {},
                  "object": "subscription_item",
                  "plan": {
                    "active": true,
                    "aggregate_usage": null,
                    "amount": 1999,
                    "amount_decimal": "1999",
                    "billing_scheme": "per_unit",
                    "created": 1589253059,
                    "currency": "usd",
                    "id": "plan_HGLCsqbTMnWJJU",
                    "interval": "month",
                    "interval_count": 1,
                    "livemode": false,
                    "metadata": {},
                    "nickname": "Monthly",
                    "object": "plan",
                    "product": "prod_HGLAvChw1qUMHd",
                    "tiers": null,
                    "tiers_mode": null,
                    "transform_usage": null,
                    "trial_period_days": 7,
                    "usage_type": "licensed"
                  },
                  "price": {
                    "active": true,
                    "billing_scheme": "per_unit",
                    "created": 1589253059,
                    "currency": "usd",
                    "id": "plan_HGLCsqbTMnWJJU",
                    "livemode": false,
                    "lookup_key": null,
                    "metadata": {},
                    "nickname": "Monthly",
                    "object": "price",
                    "product": "prod_HGLAvChw1qUMHd",
                    "recurring": {
                      "aggregate_usage": null,
                      "interval": "month",
                      "interval_count": 1,
                      "trial_period_days": 7,
                      "usage_type": "licensed"
                    },
                    "tiers": null,
                    "tiers_mode": null,
                    "transform_quantity": null,
                    "type": "recurring",
                    "unit_amount": 1999,
                    "unit_amount_decimal": "1999"
                  },
                  "quantity": 1,
                  "subscription": "sub_HGducAJ2Ox9H35",
                  "tax_rates": []
                }
              ],
              "has_more": false,
              "object": "list",
              "total_count": 1,
              "url": "/v1/subscription_items?subscription=sub_HGducAJ2Ox9H35"
            },
            "latest_invoice": "in_1Gi6cKHdG7kiS0bLVxWHYMMW",
            "livemode": false,
            "metadata": {},
            "next_pending_invoice_item_invoice": null,
            "object": "subscription",
            "pause_collection": null,
            "pending_invoice_item_interval": null,
            "pending_setup_intent": null,
            "pending_update": null,
            "plan": {
              "active": true,
              "aggregate_usage": null,
              "amount": 1999,
              "amount_decimal": "1999",
              "billing_scheme": "per_unit",
              "created": 1589253059,
              "currency": "usd",
              "id": "plan_HGLCsqbTMnWJJU",
              "interval": "month",
              "interval_count": 1,
              "livemode": false,
              "metadata": {},
              "nickname": "Monthly",
              "object": "plan",
              "product": "prod_HGLAvChw1qUMHd",
              "tiers": null,
              "tiers_mode": null,
              "transform_usage": null,
              "trial_period_days": 7,
              "usage_type": "licensed"
            },
            "quantity": 1,
            "schedule": null,
            "start": 1589322628,
            "start_date": 1589322628,
            "status": "active",
            "tax_percent": null,
            "trial_end": null,
            "trial_start": null
          }
        ],
        "has_more": false,
        "object": "list",
        "total_count": 1,
        "url": "/v1/customers/cus_HGduaDYZxYsNep/subscriptions"
      },
      "tax_exempt": "none",
      "tax_ids": {},
      "tax_info": null,
      "tax_info_verification": null
    }
  ],
  "has_more": true,
  "object": "list",
  "url": "/v1/customers"
}

current output

  id    object  account_balance address  balance     created  \
0  cus_HGduaDYZxYsNep  customer                0    None        0  1589322628

  currency                 default_source  delinquent description discount  \
0      usd  card_1Gi6cGHdG7kiS0bLynjuRExn       False        None     None

                email invoice_prefix  livemode  name  next_invoice_sequence  \
0  test1111@gmail.com       2469C675     False  None                      2

  phone preferred_locales shipping tax_exempt tax_info tax_info_verification  \
0  None                []     None       none     None                  None

  invoice_settings.custom_fields invoice_settings.default_payment_method  \
0                           None                                    None

  invoice_settings.footer sources.object  \
0                    None           list

                                        sources.data  sources.has_more  \
0  [{'id': 'card_1Gi6cGHdG7kiS0bLynjuRExn', 'obje...             False

   sources.total_count                               sources.url  \
0                    1  /v1/customers/cus_HGduaDYZxYsNep/sources

  subscriptions.object                                 subscriptions.data  \
0                 list  [{'id': 'sub_HGducAJ2Ox9H35', 'object': 'subsc...

   subscriptions.has_more  subscriptions.total_count  \
0                   False                          1

                                subscriptions.url tax_ids.object tax_ids.data  \
0  /v1/customers/cus_HGduaDYZxYsNep/subscriptions           list           []

   tax_ids.has_more  tax_ids.total_count  \
0             False                    0

                                tax_ids.url
0  /v1/customers/cus_HGduaDYZxYsNep/tax_ids
  • could u reformat ur data, ... getting unexpected EOF while parsing – sammywemmy May 13 '20 at 00:00
  • just needs a trailing `]}`. Edit is too small for so edit rules... – Michael Delgado May 13 '20 at 00:17
  • @sammywemmy okay, I reformatted it properly – rossinbossin May 13 '20 at 00:18
  • @MichaelDelgado yep, that's what I added. I feel so incompetent trying to figure this out.. maybe this package as a tool is making it harder than it needs to be? converting this json data to a pandas dataframe does help with easier manipulation and it doesn't use much resources to do it so if I can figure this out it would be awesome. – rossinbossin May 13 '20 at 00:20
  • Does your sample represent one entry among others which you're trying to convert into a table? If this is the entire dataset than it's definitely the case that pandas is not the right tool for the job. I would check out [this answer](https://stackoverflow.com/a/19491742/3888719) for tips on exploring json. If it is one record among many, can you modify your question to clearly show where the other records would fit into your data and what data you want to extract? – Michael Delgado May 13 '20 at 00:51
  • @MichaelDelgado I just pulled one data point for this example. This is customer data from stripe and I have to normalize this json bc the data point thats important is the customers email as thats the primary key between my login info and customer info. The importance is being able to verify payments related to whos logged in. – rossinbossin May 13 '20 at 01:19
  • cool. having a look at it. quick one. could u post an expected output? how do u want ur dataframe to look like? – sammywemmy May 13 '20 at 01:19
  • @sammywemmy I updated the question showing how it returns with "cus_data = json_normalize(cus_data['data'], max_level=1)" – rossinbossin May 13 '20 at 01:23
  • What we need though is what you actually want the data to look like. Can you include two entries so we understand the record structure, and list exactly which fields you want to include in your output? – Michael Delgado May 13 '20 at 01:32
  • @MichaelDelgado so when I normalize the json calling "data" it doesn't open further arrays, it just opens the entire data point. I want to jump into the"subscriptions" array and pull that data. – rossinbossin May 13 '20 at 01:37
  • @MichaelDelgado I just can't figure out how to call to that level within the json. – rossinbossin May 13 '20 at 01:39
  • @sammywemmy ...damn it was just ""json_normalize(cus_data['data'],'subscriptions')"" that took me into the array. Heck yeah. I definitely tried that too... Guess I learned the same leason I've learned 100X...slow.down. – rossinbossin May 13 '20 at 02:15
  • Cool - if that's what you were looking for go ahead and post what you did, along with the output, and accept your answer. nice digging! – Michael Delgado May 13 '20 at 15:43

2 Answers2

1

give jmespath a whirl; it can help with some intricate nested data.

Key takeaways : if it is a dict, u can access it with the . notation; if it is an array/list, u access it with the [] notation. it could be [*] or []. u can read the docs for more on that.

the first path is a dict(data), which leads to an array, which contains a dict... our end point is the subscriptions array :

dict-> array ->dict - >array

actual code :

import jmespath
expression = jmespath.compile('data[].subscriptions[]')
expression.search(content)

the subscriptions array is nested. so u need to know exactly what u want, and then adapt to it. if u know exactly how ur data should look, or still have issues, just post ur expected output, and where u got stuck, and someone should help out

sammywemmy
  • 27,093
  • 4
  • 17
  • 31
0

The following got me into the nested array.

from pandas.io.json import json_normalize

stripe.api_key = "tes_api###############"

cus_data = stripe.Customer.list(limit=1)

cus_data = json_normalize(cus_data['data'],'subscriptions')

print(cus_data)