0

I have the output of a certain function looking like this (It is google analytics data, giving me feedback on the users of my website)

{'sampleRate': 1,
                               # Pageview activity
 'sessions': [{'activityTime': '2020-01-08T15:48:38.012671Z',
                               'activityType': 'PAGEVIEW',
                               'campaign': '(not set)',
                               'channelGrouping': 'Direct',
                               'customDimension': [{'index': 1}],
                               'hostname': 'company.domain.com',
                               'keyword': '(not set)',
                               'landingPagePath': '/login',
                               'medium': '(none)',
                               'pageview': {'pagePath': '/login',
                                            'pageTitle': 'titleofthepage'},
                               'source': '(direct)'},

                              # Event activity
                              {'activityTime': '2020-01-08T15:48:37.915105Z',
                               'activityType': 'EVENT',
                               'campaign': '(not set)',
                               'channelGrouping': 'Direct',
                               'customDimension': [{'index': 1}],
                               'event': {'eventAction': 'Successfully Logged '
                                                        'In',
                                         'eventCategory': 'Auth',
                                         'eventCount': '1',
                                         'eventLabel': '(not set)'},
                               'hostname': 'company.domain.com',
                               'keyword': '(not set)',
                               'landingPagePath': '/login',
                               'medium': '(none)',
                               'source': '(direct)'}]
               'dataSource': 'web',
               'deviceCategory': 'desktop',
               'platform': 'Windows',
               'sessionDate': '2020-01-08',

NOTE ABOUT ACTIVITIES: The number of activities is not always two, sometimes a user have performed 20 activities and other times they only performed 1, I have given the output as simple as possible (An activity is classified as either 'pageview' or 'event' but never both)

NOTE ABOUT ORIGINAL FUNCTION: The original function (the one giving the GA output) is being run in a for loop, for each user, all of the users' data is being stored in one big list. To be clear, the data above is only for one user.

EXPECTED OUTPUT: What I want is a database with two tables ( shown below ). I am not sure how to do this for a lot of users, this where I need help. I want to be able to sit in a meeting and give feedback on our users.

Table 1:

SessionId   User      dataSource  deviceCategory   platform      sessionDuration 
12345         123        web             desktop                windows     00:15:12
...

Table 2:

ActivityTime         pageTitle   pagePath   EventCategory   eventCount   eventLabel   eventAction
2019-12-15 20:30:12  domain      webpage    NaN             NaN          NaN          NaN
2019-12-15 20:45:47  NaN         NaN        Aut             1            (not_set)    LoggedIn

PS: I know this seems complicated, but simply: I get a list of nested dictionaries that I want to put into a database using Peewee, to be able to do queries with.

If I am misunderstanding something please let me know

I'm thinking PRAGMA can do the trick, I should then know how to store all of this data as a .db file (I think)? Thank you :)

Llewellyn Hattingh
  • 306
  • 1
  • 5
  • 16

1 Answers1

0

I'd suggest to store this data as JSON file, load this with built-it json module, then do whatever you want:

Your JSON:

{
  "sampleRate": 1,
  "sessions": [
    {
      "activityTime": "2020-01-08T15:48:38.012671Z",
      "activityType": "PAGEVIEW",
      "campaign": "(not set)",
      "channelGrouping": "Direct",
      "customDimension": [
        {
          "index": 1
        }
      ],
      "hostname": "company.domain.com",
      "keyword": "(not set)",
      "landingPagePath": "/login",
      "medium": "(none)",
      "pageview": {
        "pagePath": "/login",
        "pageTitle": "titleofthepage"
      },
      "source": "(direct)"
    },
    {
      "activityTime": "2020-01-08T15:48:37.915105Z",
      "activityType": "EVENT",
      "campaign": "(not set)",
      "channelGrouping": "Direct",
      "customDimension": [
        {
          "index": 1
        }
      ],
      "event": {
        "eventAction": "Successfully Logged In",
        "eventCategory": "Auth",
        "eventCount": "1",
        "eventLabel": "(not set)"
      },
      "hostname": "company.domain.com",
      "keyword": "(not set)",
      "landingPagePath": "/login",
      "medium": "(none)",
      "source": "(direct)"
    }
  ],
  "dataSource": "web",
  "deviceCategory": "desktop",
  "platform": "Windows",
  "sessionDate": "2020-01-08"
}

Python code:

import json
with open('data.json', 'r', encoding='utf-8') as fw:
    obj = json.load(fw)  # your nested dictionary
    # peewee mapping
    # or
    # db insertion
    # or
    # some other stuff
    pass
Raymond Reddington
  • 1,709
  • 1
  • 13
  • 21
  • Thank you. Im testing it out – Llewellyn Hattingh Jan 24 '20 at 10:08
  • I am still lost, I don't know how to tell the program to go look at this particular part of the nested dictionary within the list, make it an object, and for every iteration within the list, add the data to the database table – Llewellyn Hattingh Jan 24 '20 at 10:25
  • You should iterate through your dict, and sometimes use recursion to achieve nested values. Look at these. https://stackoverflow.com/questions/14962485/finding-a-key-recursively-in-a-dictionary https://stackoverflow.com/questions/3294889/iterating-over-dictionaries-using-for-loops – Raymond Reddington Jan 24 '20 at 10:44