
I have a dataframe column in which each cell holds a list of dictionaries. Each list represents a set of countries, and each dictionary holds basic data about one country. Example a:

df.countries[3]
"[{'iso_3166_1': 'DE', 'name': 'Germany'}, {'iso_3166_1': 'US', 'name': 'United States of America'}, {'iso_3166_1': 'IN', 'name': 'India'}]"

Other cells hold a list with only one dictionary. Example b:

df.countries[0]
"[{'iso_3166_1': 'US', 'name': 'United States of America'}]"

Or an empty list. Example c:

df.countries[505]
'[]'

What I want to do is:

  • Delete the rows where the country name is United States of America, BUT ONLY WHEN it is the only country in the list, not when other countries are present as in example a.

I tried to brainstorm and came up with something like this:

countryToRemove = "United States of America"
for index, row in df.iterrows():
    if countryToRemove in row['countries']:
        # row to be removed

But it deletes every row that contains the US, even when other countries are listed.

Edit: My dataframe looks like this:

countries
0  [{'iso_3166_1': 'DE', 'name': 'Germany'}, {'is...
1  [{'iso_3166_1': 'US', 'name': 'United States o...
2                                                 []
Fatimah E.
  • Please review your tutorial materials on sequences (lists, tuples, etc.) and learn the basic methods of each data type. Here, you also need to check that the list length is 1. – Prune Apr 29 '21 at 22:02
  • Can you please edit your question so we can reproduce your problem? [This might be a useful resource for that purpose](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples/20159305#20159305). – Jasmijn Apr 29 '21 at 22:07

1 Answer


If you have a dataframe like this:

                                           countries
0  [{'iso_3166_1': 'DE', 'name': 'Germany'}, {'is...
1  [{'iso_3166_1': 'US', 'name': 'United States o...
2                                                 []

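Then you can use boolean indexing to filter the dataframe. The mask is True only for rows whose set of country names contains exactly one name, and that name is "United States of America":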

mask = df.countries.apply(
    lambda x: len(s := set(d["name"] for d in x)) == 1
    and s.pop() == "United States of America"
)
print(df[~mask])

Prints:

                                           countries
0  [{'iso_3166_1': 'DE', 'name': 'Germany'}, {'is...
2                                                 []
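For a self-contained check, the frame above can be rebuilt by hand. This is only a minimal sketch: it assumes each cell already holds a real Python list of dicts, the sample rows are copied from the question, and the helper name `is_only_us` is purely illustrative. It tests the list length directly, as suggested in the comments under the question, instead of building a set.

```python
import pandas as pd

TARGET = "United States of America"

# Sample rows copied from the question; each cell is assumed to be a list of dicts.
df = pd.DataFrame(
    {
        "countries": [
            [
                {"iso_3166_1": "DE", "name": "Germany"},
                {"iso_3166_1": "US", "name": "United States of America"},
                {"iso_3166_1": "IN", "name": "India"},
            ],
            [{"iso_3166_1": "US", "name": "United States of America"}],
            [],
        ]
    }
)


def is_only_us(countries):
    # True only when the list has exactly one entry and that entry is the target.
    return len(countries) == 1 and countries[0]["name"] == TARGET


mask = df["countries"].apply(is_only_us)
filtered = df[~mask]
print(filtered.index.tolist())  # [0, 2]
```

Unlike the set-based mask, this variant would keep a row that lists the US twice, since its length is 2; which behaviour is wanted depends on the data.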

EDIT: A version without the := operator (which requires Python 3.8 or newer):

def fn(x):
    s = set(d["name"] for d in x)
    return len(s) == 1 and s.pop() == "United States of America"


mask = df.countries.apply(fn)
print(df[~mask])
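One caveat: the quoted output in the question (for example '[]') suggests the cells may actually be stored as strings rather than lists, which would also explain why the `in` test there matched every row mentioning the US. If that is the case, a sketch along these lines parses each cell with `ast.literal_eval` before filtering. The string contents and the helper name `only_us` are assumptions for illustration, not something confirmed by the question.

```python
import ast

import pandas as pd

TARGET = "United States of America"

# Assumption: cells are string representations of lists, as the quoted output suggests.
df = pd.DataFrame(
    {
        "countries": [
            "[{'iso_3166_1': 'DE', 'name': 'Germany'}, "
            "{'iso_3166_1': 'US', 'name': 'United States of America'}]",
            "[{'iso_3166_1': 'US', 'name': 'United States of America'}]",
            "[]",
        ]
    }
)

# Turn each string into a real list of dicts before filtering.
parsed = df["countries"].apply(ast.literal_eval)


def only_us(countries):
    # The set of names equals {TARGET} only when the US is the sole country listed.
    names = {d["name"] for d in countries}
    return names == {TARGET}


mask = parsed.apply(only_us)
result = df[~mask]
print(result.index.tolist())  # [0, 2]
```

If the column already contains lists, the `ast.literal_eval` step is unnecessary and would raise an error, so it is worth checking `type(df.countries[0])` first.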
Andrej Kesely
  • Yes, the dataframe is as you typed. I tried your solution, but it gives the following error: File "", line 2 lambda x: len(s: set(d["name"] for d in x)) == 1 ^ SyntaxError: invalid syntax – Fatimah E. Apr 29 '21 at 22:27
  • @FatimahE. You're using an old version of Python. See my edit. – Andrej Kesely Apr 29 '21 at 22:30