1

The dataset that I am working with has 5 US territories included under the State column, and I want to remove any row/record that has these 5 territories as the state name. I'm able to remove all the records based on one value:

indexNames = df2[df2['state'] == 'District of Columbia'].index
df2.drop(indexNames , inplace=True)

but when I do the same thing with multiple:

indexNames = df2[(df2['state'] == 'Guam') & (df2['state'] == 'Virgin Islands')].index
df2.drop(indexNames , inplace=True)

no changes take place. Is there any way I can list all 5 in the first statement and have it work?

Edit: I decided to rename all the non-state territories to nonstate, and then dropped the rows with the value nonstate in the state column using the following code:

df2['state'] = df2['state'].replace(['District of Columbia', 'Guam', 'Mariana Islands', 'Puerto Rico', 'Virgin Islands'], 'nonstate')

indexNames = df2[df2['state'] == 'nonstate'].index
df2.drop(indexNames , inplace=True)
bochman81
  • Make a list of `states` and apply it as shown in [this](https://stackoverflow.com/questions/41934584/how-to-drop-rows-by-list-in-pandas) post. – Danail Petrov Jan 24 '21 at 16:24
  • Does this answer your question? [How to drop rows by list in pandas](https://stackoverflow.com/questions/41934584/how-to-drop-rows-by-list-in-pandas) – Danail Petrov Jan 24 '21 at 16:47

2 Answers

1

You could "drop" those rows by filtering them out:

df = df[(df["state"]!="Guam") & (df["state"]!="Virgin Islands")]

Personally, I would probably use the isin method along with the ~ (NOT) operator:

exclude = ("Guam", "Virgin Islands", "District of Columbia")
df = df[~(df["state"].isin(exclude))]

This lets you pass a sequence, so you don't have to hard-code each excluded item into the condition.
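
As a rough sketch of the above, here is the `isin` filter run on a small made-up frame (the `value` column and all the rows are invented for illustration). It also shows why the `&` condition in the question matches nothing: no single cell can equal both 'Guam' and 'Virgin Islands' at once, so the combined mask is all False.

```python
import pandas as pd

# Hypothetical sample data; only the "state" column mirrors the question.
df = pd.DataFrame({
    "state": ["Ohio", "Guam", "Texas", "Virgin Islands", "Puerto Rico"],
    "value": [1, 2, 3, 4, 5],
})

# The question's condition: a value cannot equal two different names,
# so this mask is False for every row and nothing gets dropped.
both = (df["state"] == "Guam") & (df["state"] == "Virgin Islands")

# Keep only the rows whose state is NOT in the exclusion sequence.
exclude = ("Guam", "Virgin Islands", "Puerto Rico")
states_only = df[~df["state"].isin(exclude)]
```

Here `states_only` holds just the Ohio and Texas rows, and `df` itself is left unchanged.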

ohtotasche
  • You should change `("Guam", "Virgin Islands", "District of Columbia")` to `["Guam", "Virgin Islands", "District of Columbia"]`, as your aim is to create a `list` – sophocles Jan 24 '21 at 16:55
  • @sophocles I don't know that the aim _is_ to create a `list` specifically. Unless we think the list of US territories is something that will need to be changed during runtime, an immutable sequence like a tuple is just fine. I often purposely use a tuple to catch myself erroneously trying to change something that should be a "constant". – ohtotasche Jan 24 '21 at 17:28
0

Have you tried df2.loc[indexNames] to see if it pulls out the rows you want?

For example, make a filter (this can use your state names):

filt = (df['lname'] == 'Graham') & (df['fname'] == 'Bob')

Then apply the filter to the data frame to extract the rows that fit the criteria:

df.loc[filt]
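
Adapted to the state column, the same idea might look like the sketch below. The sample rows and the `value` column are made up, and the conditions are joined with `|` (OR) rather than `&`, on the assumption that the goal is to match a row holding either name. Inspecting the mask with `.loc` first shows which rows would be removed; negating it with `~` keeps the rest.

```python
import pandas as pd

# Hypothetical sample data standing in for df2 from the question.
df2 = pd.DataFrame({
    "state": ["Ohio", "Guam", "Texas", "Virgin Islands"],
    "value": [1, 2, 3, 4],
})

# OR the conditions together: a row matches if it holds either name.
filt = (df2["state"] == "Guam") | (df2["state"] == "Virgin Islands")

# Inspect the rows the filter selects before removing anything.
matched = df2.loc[filt]

# Negate the mask to keep everything else.
kept = df2.loc[~filt]
```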

Hope this might help

sophocles