0

I am using the pyflightdata library to search for flight stats. It returns json inside a list of dicts.

Here is an example of the first dictionary in the list after my query:

> flightlog = {'identification': {'number': {'default': 'KE504', 'alternative': 'None'}, 'callsign': 'KAL504', 'codeshare': 'None'}
, 'status': {'live': False, 'text': 'Landed 22:29', 'estimated': 'None', 'ambiguous': False, 'generic': {'status': {'text': 'landed', 'type': 'arrival', 'color': 'green', 'diverted': 'None'}
, 'eventTime': {'utc_millis': 1604611778000, 'utc_date': '20201105', 'utc_time': '2229', 'utc': 1604611778, 'local_millis': 1604615378000, 'local_date': '20201105', 'local_time': '2329', 'local': 1604615378}}}
, 'aircraft': {'model': {'code': 'B77L', 'text': 'Boeing 777-FEZ'}, 'registration': 'HL8075', 'country': {'name': 'South Korea', 'alpha2': 'KR', 'alpha3': 'KOR'}}
, 'airline': {'name': 'Korean Air', 'code': {'iata': 'KE', 'icao': 'KAL'}}
, 'airport': {'origin': {'name': 'London Heathrow Airport', 'code': {'iata': 'LHR', 'icao': 'EGLL'}, 'position': {'latitude': 51.471626, 'longitude': -0.467081, 'country': {'name': 'United Kingdom', 'code': 'GB'}, 'region': {'city': 'London'}}
, 'timezone': {'name': 'Europe/London', 'offset': 0, 'abbr': 'GMT', 'abbrName': 'Greenwich Mean Time', 'isDst': False}}, 'destination': {'name': 'Paris Charles de Gaulle Airport', 'code': {'iata': 'CDG', 'icao': 'LFPG'}, 'position': {'latitude': 49.012516, 'longitude': 2.555752, 'country': {'name': 'France', 'code': 'FR'}, 'region': {'city': 'Paris'}}, 'timezone': {'name': 'Europe/Paris', 'offset': 3600, 'abbr': 'CET', 'abbrName': 'Central European Time', 'isDst': False}}, 'real': 'None'}
, 'time': {'scheduled': {'departure_millis': 1604607300000, 'departure_date': '20201105', 'departure_time': '2115', 'departure': 1604607300, 'arrival_millis': 1604612700000, 'arrival_date': '20201105', 'arrival_time': '2245', 'arrival': 1604612700}, 'real': {'departure_millis': 1604609079000, 'departure_date': '20201105', 'departure_time': '2144', 'departure': 1604609079, 'arrival_millis': 1604611778000, 'arrival_date': '20201105', 'arrival_time': '2229', 'arrival': 1604611778}, 'estimated': {'departure': 'None', 'arrival': 'None'}, 'other': {'eta_millis': 1604611778000, 'eta_date': '20201105', 'eta_time': '2229', 'eta': 1604611778}}}

This dictionary is a huge, multi-nested, json mess and I am struggling to find a way to make it readable. I guess something like this:

 identification     number      default                 KE504
                                alternative             None
                    callsign                            KAL504
                    codeshare                           None

 status             live                                False
                    text                                Landed 22:29
                    Estimated                           None
                    ambiguous                           False
...

I am trying to turn it into a pandas DataFrame, with mixed results.

In this post it was explained that MultiIndex values have to be tuples, not dictionaries, so I used their example to convert my dictionary:

> flightlog_tuple = {(outerKey, innerKey): values for outerKey, innerDict in flightlog.items() for innerKey, values in innerDict.items()}

Which worked, up to a certain point.

df2 = pd.Series(flightlog_tuple)

gives the following output:

identification  number                {'default': 'KE504', 'alternative': 'None'}
                callsign                                                   KAL504
                codeshare                                                    None
status          live                                                        False
                text                                                 Landed 22:29
                estimated                                                    None
                ambiguous                                                   False
                generic         {'status': {'text': 'landed', 'type': 'arrival...
aircraft        model                  {'code': 'B77L', 'text': 'Boeing 777-FEZ'}
                registration                                               HL8075
                country         {'name': 'South Korea', 'alpha2': 'KR', 'alpha...
airline         name                                                   Korean Air
                code                                {'iata': 'KE', 'icao': 'KAL'}
airport         origin          {'name': 'London Heathrow Airport', 'code': {'...
                destination     {'name': 'Paris Charles de Gaulle Airport', 'c...
                real                                                         None
time            scheduled       {'departure_millis': 1604607300000, 'departure...
                real            {'departure_millis': 1604609079000, 'departure...
                estimated                {'departure': 'None', 'arrival': 'None'}
                other           {'eta_millis': 1604611778000, 'eta_date': '202...
dtype: object

Kind of what I was going for but some of the indexes are still in the column with values because there are so many levels. So I followed this explanation and tried to add more levels:

level_up = {(level1Key, level2Key, level3Key): values for level1Key, level2Dict in flightlog.items() for level2Key, level3Dict in level2Dict.items() for level3Key, values in level3Dict.items()}
df2 = pd.Series(level_up)

This code gives me AttributeError: 'str' object has no attribute 'items'. I don't understand why the first 2 indexes worked, but the others give an error.

I've tried other methods like MultiIndex.from_tuple or DataFrame.from_dict, but I can't get it to work.

This Dictionary is too complex as a beginner. I don't know what the right approach is. Maybe I am using DataFrames in the wrong way. Maybe there is an easier way to access the data that I am overlooking.

Any help would be much appreciated!

  • what you have is many `tables` in a single json you need to logically split them out and relate them back to each other using relevant keys to match them up, something like this would be a good starting point `dfs = {k : pd.json_normalize(v) for k,v in flightlog.items()}` – Umar.H Nov 06 '20 at 15:34
  • I just realized the pyflightdata API returns data in single quotes, so contrary to their documentation, it isn't JSON. json_normalize works partially, but data is still unreadable. – jpinxteren Nov 06 '20 at 17:57

0 Answers0