
I have a DataFrame from which I pick two subsets, df_a and df_b. For example, with the iris dataset:

df_a = iris[iris.Name == "Iris-setosa"]
df_b = iris[iris.Name == "Iris-virginica"]

What's the best way to get all rows of iris that are in neither df_a nor df_b? I'd prefer not to refer to the original conditions that defined df_a and df_b. I only assume that df_a and df_b are subsets of iris, so I'd like to pull out rows from iris based on the indices of df_a and df_b. Basically, assume that:

df_a = get_a_subset(iris)
df_b = get_b_subset(iris)
# retrieve the subset of iris whose rows
# are in neither df_a nor df_b
# ...
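A minimal runnable sketch of this setup, assuming a small stand-in for the iris data (only the `Name` column is kept) and hypothetical `get_a_subset`/`get_b_subset` helpers that happen to filter by species. Because both subsets keep the original index labels, dropping those labels from `iris` leaves exactly the remaining rows:

```python
import pandas as pd

# Hypothetical stand-in for the iris DataFrame; only the Name column matters here.
iris = pd.DataFrame(
    {"Name": ["Iris-setosa", "Iris-versicolor", "Iris-virginica",
              "Iris-setosa", "Iris-versicolor", "Iris-virginica"]}
)

def get_a_subset(df):
    # Hypothetical helper: the caller only relies on it returning rows of df.
    return df[df.Name == "Iris-setosa"]

def get_b_subset(df):
    # Hypothetical helper with the same contract as get_a_subset.
    return df[df.Name == "Iris-virginica"]

df_a = get_a_subset(iris)
df_b = get_b_subset(iris)

# union() merges the index labels of both subsets; drop() removes those rows
# from iris, leaving the rows that are in neither df_a nor df_b.
df_rest = iris.drop(df_a.index.union(df_b.index))
print(df_rest)
```

Note that `drop` raises a `KeyError` for a label that is not in `iris`, which is harmless here only because df_a and df_b are assumed to be subsets of iris.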

EDIT: here is a solution that seems inefficient and inelegant, and I'm sure pandas has a better way:

# get subset of iris that is not in a nor in b
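# list() is needed in Python 3, where map() returns a lazy iterator that pandas cannot use as a boolean mask
df_rest = iris[list(map(lambda x: (x not in df_a.index) and (x not in df_b.index), iris.index))]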

And a second one:

df_rest = iris.ix[iris.index - df_a.index - df_b.index]
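`.ix` was removed in pandas 1.0, and in current pandas `-` between two `Index` objects is element-wise subtraction rather than a set difference, so this second solution no longer works as written. A sketch of the same label-based idea with `Index.difference` and `.loc`, on a small hypothetical frame with an unsorted index:

```python
import pandas as pd

# Hypothetical frame with non-default, unsorted index labels.
iris = pd.DataFrame(
    {"Name": ["Iris-virginica", "Iris-setosa", "Iris-versicolor",
              "Iris-versicolor", "Iris-setosa"]},
    index=[30, 10, 25, 20, 40],
)
df_a = iris[iris.Name == "Iris-setosa"]
df_b = iris[iris.Name == "Iris-virginica"]

# Set difference of labels. sort=False keeps the order the labels have in
# iris.index; the default (sort=None) tries to sort the result instead.
rest_labels = iris.index.difference(df_a.index, sort=False).difference(df_b.index, sort=False)
df_rest = iris.loc[rest_labels]
print(df_rest)
```

`Index.difference` has set semantics, so if `iris` has duplicated index labels, each surviving label appears once in `rest_labels` (though `.loc` will still return every row carrying that label).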

How can this be done most efficiently and elegantly in pandas? Thanks.

  • Check out this question, which may give you some good answers: https://stackoverflow.com/questions/29134635/slice-pandas-dataframe-by-labels-that-are-not-in-a-list – Pablo Dec 05 '18 at 15:27
  • What about `pandas.Index.difference`? https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.difference.html – Marine Galantin Jun 28 '21 at 14:54



This seems a bit faster than your second solution, since indexing with .ix adds some overhead:

iris[~iris.index.isin(df_a.index.union(df_b.index))]
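A self-contained sketch of this boolean-mask approach, assuming a small stand-in frame in place of the real iris data. `isin` builds a mask over `iris.index` and `~` inverts it. The subsets' labels are combined with `Index.union`, because in current pandas `+` between two `Index` objects is element-wise addition rather than a set union:

```python
import pandas as pd

# Hypothetical stand-in for iris with a couple of its columns.
iris = pd.DataFrame(
    {
        "SepalLength": [5.1, 7.0, 6.3, 4.9, 6.4],
        "Name": ["Iris-setosa", "Iris-versicolor", "Iris-virginica",
                 "Iris-setosa", "Iris-versicolor"],
    }
)
df_a = iris[iris.Name == "Iris-setosa"]
df_b = iris[iris.Name == "Iris-virginica"]

# Labels that belong to either subset.
excluded = df_a.index.union(df_b.index)

# True for rows whose label is NOT in `excluded`.
mask = ~iris.index.isin(excluded)
df_rest = iris[mask]
print(df_rest)
```

This keeps the rows of `iris` in their original order, and `isin` does not raise if a label in `excluded` happens to be absent from `iris`.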
Zelazny7