Remove json lines based on specific string

Question

I have a JSON file that looks something like this:

[{"headline":"Ntugamo court issues criminal summons against Rukutana",
"url_src":"\/news\/headlines\/67240-ntugamo-court-issues-criminal-summons-against-rukutana"},
{"headline":"Corruption: Equal Opportunities Commission boss granted bail",
"url_src":"\/news\/headlines\/67239-corruption-equal-opportunities-commission-boss-granted-bail"},
{"headline":"Bobi Wine to launch corruption manifesto in Mbarara rejects EC security team",
"url_src":"https:\/\/www.monitor.co.ug"}]

I am trying to find and remove every object (everything between { and }, including the curly brackets themselves) that contains the word Corruption, regardless of case.

For example, in this case the .py script would remove

{"headline":"Corruption: Equal Opportunities Commission boss granted bail",
"url_src":"/news/headlines/67239-corruption-equal-opportunities-commission-boss-granted-bail"}

and also remove

{"headline":"Bobi Wine to launch corruption manifesto in Mbarara rejects EC security team","url_src":"https:\/\/www.monitor.co.ug"}

Is this possible with Python 2.7?

Red · Accepted Answer · 2020-12-06T17:13:55.910

You can use a list comprehension to iterate through each dict in the list.

In each iteration, convert the dictionary to a string and test it with if "corruption" not in str(d).lower(). Lowercasing makes the match case-insensitive, so both "Corruption" and "corruption" are caught. A dictionary is kept only when the word is absent:

import json

with open("j.json", "rb") as f:
    lst = json.load(f)

lst = [d for d in lst if "corruption" not in str(d).lower()]

print(lst)

Output (as printed by Python 3; Python 2.7 shows the same strings with a u prefix):

[{'headline': 'Ntugamo court issues criminal summons against Rukutana',
  'url_src': '/news/headlines/67240-ntugamo-court-issues-criminal-summons-against-rukutana'}]
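Because str(d) covers the whole dictionary, the check above also looks at key names and punctuation. A minimal sketch of a variant that inspects only the values, assuming every value is a string as in the question's sample; the function name drop_matching is made up for illustration:

```python
import json

# Sample data copied from the question (raw string so the JSON "\/" escapes survive).
RAW = r'''[{"headline":"Ntugamo court issues criminal summons against Rukutana",
"url_src":"\/news\/headlines\/67240-ntugamo-court-issues-criminal-summons-against-rukutana"},
{"headline":"Corruption: Equal Opportunities Commission boss granted bail",
"url_src":"\/news\/headlines\/67239-corruption-equal-opportunities-commission-boss-granted-bail"},
{"headline":"Bobi Wine to launch corruption manifesto in Mbarara rejects EC security team",
"url_src":"https:\/\/www.monitor.co.ug"}]'''


def drop_matching(records, word):
    """Return the records in which no value contains `word` (case-insensitive).

    Hypothetical helper; assumes every value in each record is a string.
    """
    word = word.lower()
    return [
        rec for rec in records
        if not any(word in value.lower() for value in rec.values())
    ]


records = json.loads(RAW)
kept = drop_matching(records, "corruption")
print(len(kept))  # 1
```

Only the Ntugamo record survives, the same result as the str(d) approach for this sample. The sketch uses no Python 3 only syntax, so it should also run on Python 2.7.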

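If you want to write the list back into the JSON file, use json.dump. The built-in open() in Python 2.7 has no encoding parameter, and json.dump emits ASCII-only output by default, so a plain text-mode open works on both Python 2.7 and 3:

with open("j.json", "w") as f:
    json.dump(lst, f)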
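To check that the written file still parses and holds only the kept entries, a round trip along these lines may help. This is a sketch under assumptions: the records are shortened placeholders rather than the real data, and the temporary path filtered.json is hypothetical (swap in "j.json" to overwrite the original file):

```python
import json
import os
import tempfile

# Shortened placeholder records; the url_src values are not the real URLs.
records = [
    {"headline": "Ntugamo court issues criminal summons against Rukutana",
     "url_src": "/news/a"},
    {"headline": "Corruption: Equal Opportunities Commission boss granted bail",
     "url_src": "/news/b"},
]

# Same filter as in the answer: keep dicts whose string form lacks the word.
kept = [d for d in records if "corruption" not in str(d).lower()]

# Write to a temporary location instead of overwriting the source file.
path = os.path.join(tempfile.mkdtemp(), "filtered.json")
with open(path, "w") as f:
    json.dump(kept, f, indent=2)  # indent is optional, for readability

# Read the file back and compare it with what was written.
with open(path) as f:
    reloaded = json.load(f)

print(reloaded == kept)
```

Writing to a separate file first keeps the original intact until the filtered result has been checked.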
