I am trying to traverse JSON data brought into a Dataframe.

Here is the code used to bring the data in:

df = json_normalize(data['PatentBulkData'])

The column of interest is a Series in which each entry is a list of dictionaries, as shown below.

For example, here is the list of dictionaries returned when I enter df['prosecutionHistoryDataBag.prosecutionHistoryData'][i]:

[{'eventCode': 'PG-ISSUE',
  'eventDate': '2020-04-23',
  'eventDescriptionText': 'PG-Pub Issue Notification'},
 {'eventCode': 'RQPR',
  'eventDate': '2020-01-02',
  'eventDescriptionText': 'Request for Foreign Priority (Priority Papers May Be Included)'},
 {'eventCode': 'M844',
  'eventDate': '2020-01-03',
  'eventDescriptionText': 'Information Disclosure Statement (IDS) Filed'},
 {'eventCode': 'M844',
  'eventDate': '2020-01-02',
  'eventDescriptionText': 'Information Disclosure Statement (IDS) Filed'},
 {'eventCode': 'COMP',
  'eventDate': '2020-02-04',
  'eventDescriptionText': 'Application Is Now Complete'}]

Then, df['prosecutionHistoryDataBag.prosecutionHistoryData'][i][j] would return the dictionary:

{'eventCode': 'PG-ISSUE',
 'eventDate': '2020-04-23',
 'eventDescriptionText': 'PG-Pub Issue Notification'}

I would like to iterate through each entry in the df['prosecutionHistoryDataBag.prosecutionHistoryData'] to identify rows containing a specific string in 'eventDescriptionText'.

In the above example df['prosecutionHistoryDataBag.prosecutionHistoryData'] is a Series, df['prosecutionHistoryDataBag.prosecutionHistoryData'][i] is a list, and df['prosecutionHistoryDataBag.prosecutionHistoryData'][i][j] is a dictionary.

I would like to iterate through the Series and, for each list, iterate through its dictionaries to see whether 'eventDescriptionText' contains a specific string.

Thanks!

Jens
tbb
  • If I understand your question correctly then `df['prosecutionHistoryDataBag.prosecutionHistoryData']` references a list, each element of which is a list of dictionaries, as shown in your example output for element `[31]` of the dataframe’s entry? – Jens Jul 14 '20 at 03:20

2 Answers


Try the code below.

for lst in df['prosecutionHistoryDataBag.prosecutionHistoryData']:
    for event in lst:
        # `in` tests for a substring; a missing key falls back to "".
        if your_string in event.get("eventDescriptionText", ""):
            # do something
            pass
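
The `TypeError: 'float' object is not iterable` mentioned in the comments would be consistent with some entries of the column being NaN instead of a list; that cause is an assumption, not something confirmed in the thread. A minimal sketch under that assumption, using a plain list as a hypothetical stand-in for the column and a made-up helper name `rows_with_event_text`, skips non-list entries and collects the positions of entries that have at least one matching event:

```python
import math


def rows_with_event_text(column, needle):
    """Return positions of entries that have an event mentioning `needle`."""
    matches = []
    for pos, events in enumerate(column):
        # Assumed: missing entries appear as float NaN, not as a list.
        if not isinstance(events, list):
            continue
        if any(needle in event.get("eventDescriptionText", "") for event in events):
            matches.append(pos)
    return matches


# Hypothetical stand-in for df['prosecutionHistoryDataBag.prosecutionHistoryData'].
column = [
    [
        {"eventCode": "PG-ISSUE", "eventDate": "2020-04-23",
         "eventDescriptionText": "PG-Pub Issue Notification"},
        {"eventCode": "COMP", "eventDate": "2020-02-04",
         "eventDescriptionText": "Application Is Now Complete"},
    ],
    math.nan,  # a missing entry
    [
        {"eventCode": "M844", "eventDate": "2020-01-03",
         "eventDescriptionText": "Information Disclosure Statement (IDS) Filed"},
    ],
]

print(rows_with_event_text(column, "Disclosure"))
```

With the real data, iterating over the Series yields the same kind of elements, and `.items()` could replace `enumerate()` to get index labels instead of positions.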
Astik Gabani
  • If I understand the question correctly, then your `i` is a list of dictionaries, because `df['prosecutionHistoryDataBag.prosecutionHistoryData']` seems to be a list of lists (of dictionaries). – Jens Jul 14 '20 at 03:15
  • You might be right. Let's confirm this from the question author himself. Let's wait for the answer of your question in the comment. – Astik Gabani Jul 14 '20 at 03:31
  • For clarification, df['prosecutionHistoryDataBag.prosecutionHistoryData'] is a list containing a dictionary. In the above example, each dictionary includes 3 key-values. – tbb Jul 14 '20 at 03:45
  • I get this - AttributeError: 'list' object has no attribute 'get' – tbb Jul 14 '20 at 03:50
  • Then df['prosecutionHistoryDataBag.prosecutionHistoryData'] is a list of list, as @jens said. Please verify this. – Astik Gabani Jul 14 '20 at 03:52
  • @tbb which tells me that `df['prosecutionHistoryDataBag.prosecutionHistoryData']` is a _list_ of lists. [`.get()`](https://docs.python.org/3/library/stdtypes.html#dict) is a dictionary function. – Jens Jul 14 '20 at 03:52
  • Truly - my apologies. To clarify - In the above example `df['prosecutionHistoryDataBag.prosecutionHistoryData']` is a Series, `df['prosecutionHistoryDataBag.prosecutionHistoryData'][i]` is a list, and `df['prosecutionHistoryDataBag.prosecutionHistoryData'][i][j]` is a dictionary. – tbb Jul 14 '20 at 03:59
  • @tbb, then see [my answer](https://stackoverflow.com/questions/62887429/how-to-traverse-lists-containing-a-dictionary#62887756). – Jens Jul 14 '20 at 04:03
  • @AstikGabani, I fixed your code, but please check the return value of `find()` against -1. As is, your code will fail if the text is found on position 0. – Jens Jul 14 '20 at 04:15
  • Hi @AstikGabani - your code does work. For some reason - at the end of the code, I get an error `TypeError: 'float' object is not iterable`. But the code does capture the requested string and does something. I will need to look into why the TypeError is occurring. – tbb Jul 14 '20 at 04:18
  • @tbb, see my comment to AstikGabani: the code has a bug and needs to be fixed. – Jens Jul 14 '20 at 04:19
  • @AstikGabani: please refer to the documentation of [`str.find()`](https://docs.python.org/3/library/stdtypes.html#str.find), your code is still broken even after your change. Either use `if … >= 0` or (as per docs) the `in` operator. – Jens Jul 14 '20 at 22:16
  • @Jens You are right. I have updated the code. tbb Please look at the updated code. – Astik Gabani Jul 15 '20 at 02:13

If I understand your question correctly then

df['prosecutionHistoryDataBag.prosecutionHistoryData']

is a Series whose elements are lists of dictionaries (see also my comment above). If that is the case, then the boring way is:

lst = df['prosecutionHistoryDataBag.prosecutionHistoryData']
for dicts in lst:
    for d in dicts:
        if d['eventDescriptionText'] == 'SOME TEXT YOU SEARCH FOR':
            code = d['eventCode']
            date = d['eventDate']
            # Do something with code and date.

Now, you could flatten that list of lists and use a generator:

lst = df['prosecutionHistoryDataBag.prosecutionHistoryData']
for d in (d for dicts in lst for d in dicts):
    if d['eventDescriptionText'] == 'SOME TEXT YOU SEARCH FOR':
        code = d['eventCode']
        date = d['eventDate']
        # Do something with code and date.
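
The nested generator used for flattening can also be written with `itertools.chain.from_iterable()` from the standard library. The sketch below uses a small hand-made list of lists as a stand-in for the column, assumes every entry really is a list (no missing values), and tests for a substring with `in` rather than for equality, since the question asks about descriptions that contain a string:

```python
from itertools import chain

# Hypothetical stand-in for df['prosecutionHistoryDataBag.prosecutionHistoryData'].
lst = [
    [
        {"eventCode": "PG-ISSUE", "eventDate": "2020-04-23",
         "eventDescriptionText": "PG-Pub Issue Notification"},
        {"eventCode": "M844", "eventDate": "2020-01-03",
         "eventDescriptionText": "Information Disclosure Statement (IDS) Filed"},
    ],
    [
        {"eventCode": "M844", "eventDate": "2020-01-02",
         "eventDescriptionText": "Information Disclosure Statement (IDS) Filed"},
        {"eventCode": "COMP", "eventDate": "2020-02-04",
         "eventDescriptionText": "Application Is Now Complete"},
    ],
]

needle = "Information Disclosure"

# chain.from_iterable() walks the inner lists one after another.
matches = [
    (d["eventCode"], d["eventDate"])
    for d in chain.from_iterable(lst)
    if needle in d["eventDescriptionText"]
]

print(matches)
```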

Next, squeeze the test into the lists-flattening-generator as well to make the code a bit less readable:

lst = df['prosecutionHistoryDataBag.prosecutionHistoryData']
for code, date in ((d['eventCode'], d['eventDate']) for dicts in lst for d in dicts if d['eventDescriptionText'] == 'SOME TEXT YOU SEARCH FOR'):
    # Do something with code and date.
    pass

The filter() function doesn’t help much with readability here

for code, date in ((d['eventCode'], d['eventDate']) for d in filter(lambda d: d['eventDescriptionText'] == 'SOME TEXT YOU SEARCH FOR', (d for dicts in lst for d in dicts))):
    # Do something with code and date.
    pass

but itertools (e.g. chain.from_iterable()) or more-itertools (e.g. the flatten() function) may be of use.
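
If the goal is to get back to DataFrame rows rather than individual events, pandas itself can do the flattening. The following is only a sketch on a small hand-made DataFrame: the sample data, the variable names, and the NaN entry standing in for a missing history are all assumptions. `Series.explode()` produces one row per event while keeping the original row label in the index, so matching events can be traced back to their source rows:

```python
import pandas as pd

col = "prosecutionHistoryDataBag.prosecutionHistoryData"

# Hypothetical sample data: two rows with event lists, one row with no history.
df = pd.DataFrame({col: [
    [
        {"eventCode": "PG-ISSUE", "eventDate": "2020-04-23",
         "eventDescriptionText": "PG-Pub Issue Notification"},
        {"eventCode": "COMP", "eventDate": "2020-02-04",
         "eventDescriptionText": "Application Is Now Complete"},
    ],
    float("nan"),
    [
        {"eventCode": "M844", "eventDate": "2020-01-03",
         "eventDescriptionText": "Information Disclosure Statement (IDS) Filed"},
    ],
]})

# One element per event; rows without a list become NaN and are dropped.
events = df[col].explode().dropna()

# Turn the dictionaries into columns, keeping the original row labels.
flat = pd.DataFrame(events.tolist(), index=events.index)

needle = "Disclosure"
# regex=False makes this a plain substring test.
hits = flat[flat["eventDescriptionText"].str.contains(needle, regex=False)]

# Labels of the original rows that have at least one matching event.
matching_rows = hits.index.unique().tolist()
print(matching_rows)
```

`df.loc[matching_rows]` would then select the original rows, and `hits` still holds the matching events' `eventCode` and `eventDate` values.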

Jens
  • Thanks, @Jens - I will give this a try as well. Thanks again for your assistance! – tbb Jul 14 '20 at 04:20