I want to drop one Dataframe from another (the first df is a subset of the second)

Question

My goal is to analyze changes in tuition costs for private schools in urban settings vs private schools in rural settings.

I have a dataframe with tuition costs of all private schools in the US through time (tuit_cost). The dataframe tuit_cost contains columns of historical tuition costs as well as two columns titled ['State','City/Town Name'].

I also have a separate dataframe of private schools that are classified as being in 'Urban' areas (urban_schools). This dataframe has only two columns -- ['State','City/Town Name'].

I merged the dataframes in order to create a dataframe with only the urban schools' historical tuition data.

urban_school_tuit = pd.merge(urban_schools, tuit_cost, how='left', on=['State', 'City/Town Name']).dropna()
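
A side note on the merge above: with how='left' followed by .dropna(), urban schools that have a missing tuition value in any year are dropped too, not just the rows that found no match. If that is not intended, an inner merge is one way to keep them. A small sketch with made-up data (the year columns and every value are assumptions for illustration, not taken from the real tuit_cost):

```python
import pandas as pd

# Hypothetical toy data; the real tuit_cost has many more rows and columns.
tuit_cost = pd.DataFrame({
    "State": ["NY", "NY", "IL", "MT"],
    "City/Town Name": ["New York", "Albany", "Chicago", "Bozeman"],
    "2015": [40000.0, 30000.0, None, 18000.0],  # Chicago is missing one year
    "2016": [42000.0, 31000.0, 36000.0, 18500.0],
})
urban_schools = pd.DataFrame({
    "State": ["NY", "NY", "IL", "CA"],
    "City/Town Name": ["New York", "Albany", "Chicago", "Fresno"],
})

keys = ["State", "City/Town Name"]

# Inner join: keep only tuition rows whose (State, City/Town Name) pair is urban.
# drop_duplicates() guards against a pair listed twice in urban_schools,
# which would otherwise duplicate the matching tuition rows.
urban_school_tuit = tuit_cost.merge(
    urban_schools.drop_duplicates(), on=keys, how="inner"
)
```

Here the Chicago row survives even though its 2015 value is missing, whereas the left merge plus .dropna() would discard it.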

Now I want to create a dataframe with only the rural schools' historical tuition data by dropping all of the rows in urban_school_tuit from tuit_cost.

What is the most efficient way to do so?

Thanks!

Have you tried dataframe.subtract ? Documentation here: http://pandas.pydata.org/pandas-docs/version/0.18.1/generated/pandas.DataFrame.subtract.html — mba12, Dec 07 '16 at 15:38
Also interested in your study...I have a child in boarding school. Will you publish someplace eventually? — mba12, Dec 07 '16 at 15:38
Possible duplicate of http://stackoverflow.com/questions/28901683/pandas-get-rows-which-are-not-in-other-dataframe. — 3novak, Dec 07 '16 at 15:49
@mba12 -- dataframe.subtract returns a copy of ``tuit_cost`` that has '0.0' values across all the rows in ``urban_schools`` and NaN values across all the rows that I want to use in the ``rural_schools`` dataframe I am trying to create. Any suggestions on how to select these rows and populate them with the original data they contained in ``tuit_cost``? (As of now I do not have plans to publish. I will let you know if that changes. Thanks for your interest!) — p_sutherland, Dec 07 '16 at 17:09
@3novak The issue I was having was that the indices were not the same, so I could not use ``isin`` (as recommended by most of the answers in the question you referenced). — p_sutherland, Dec 07 '16 at 20:28
Has this been published yet? Very interested in the findings. — Некто, Nov 03 '17 at 15:48

score 2 · Accepted Answer · answered Dec 07 '16 at 20:16

I was able to patch this together to create the desired dataframe (in Python 3):

rural_schools = tuit_cost.drop(list(zip(urban_schools['State'], urban_schools['City/Town Name'])))
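
Note that DataFrame.drop removes rows by index label, so the one-liner above presumably relies on tuit_cost already being indexed by State and City/Town Name. A minimal sketch that makes that step explicit, using made-up toy data (the "2016" column and all values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical toy data standing in for the real frames.
tuit_cost = pd.DataFrame({
    "State": ["NY", "NY", "IL", "MT"],
    "City/Town Name": ["New York", "Albany", "Chicago", "Bozeman"],
    "2016": [42000.0, 31000.0, 36000.0, 18500.0],
})
urban_schools = pd.DataFrame({
    "State": ["NY", "NY", "IL", "CA"],
    "City/Town Name": ["New York", "Albany", "Chicago", "Fresno"],
})

keys = ["State", "City/Town Name"]

# drop() matches index labels, so index tuit_cost by the two key columns first.
indexed = tuit_cost.set_index(keys)
urban_pairs = list(zip(urban_schools["State"], urban_schools["City/Town Name"]))

# errors="ignore" skips urban pairs that have no row in tuit_cost (CA/Fresno here)
# instead of raising a KeyError.
rural_schools = indexed.drop(urban_pairs, errors="ignore").reset_index()
```

The final reset_index() turns the two key columns back into ordinary columns.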

Open to any further guidance or suggestions.
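
One commonly used alternative that does not depend on the index is an anti-join: left-merge tuit_cost against the urban list with indicator=True and keep the rows marked "left_only". The sketch below uses made-up toy data (the "2016" column and all values are assumptions), and it also yields the urban subset from the same merge:

```python
import pandas as pd

# Hypothetical toy data; note two schools share the New York pair.
tuit_cost = pd.DataFrame({
    "State": ["NY", "NY", "NY", "IL", "MT"],
    "City/Town Name": ["New York", "New York", "Albany", "Chicago", "Bozeman"],
    "2016": [42000.0, 39000.0, 31000.0, 36000.0, 18500.0],
})
urban_schools = pd.DataFrame({
    "State": ["NY", "NY", "IL", "CA"],
    "City/Town Name": ["New York", "Albany", "Chicago", "Fresno"],
})

keys = ["State", "City/Town Name"]

# Left join against the de-duplicated urban list. The added _merge column is
# "both" for rows that matched an urban pair and "left_only" for the rest.
flagged = tuit_cost.merge(
    urban_schools[keys].drop_duplicates(), on=keys, how="left", indicator=True
)

# Anti-join: rows of tuit_cost with no urban match.
rural_schools = flagged.loc[flagged["_merge"] == "left_only"].drop(columns="_merge")

# The matched rows give the urban subset without a second merge.
urban_school_tuit = flagged.loc[flagged["_merge"] == "both"].drop(columns="_merge")
```

Because the urban list is de-duplicated before merging, every row of tuit_cost lands in exactly one of the two results.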

p_sutherland