
My goal is to analyze changes in tuition costs for private schools in urban settings vs private schools in rural settings.

I have a dataframe with tuition costs of all private schools in the US through time (tuit_cost). The dataframe tuit_cost contains columns of historical tuition costs as well as two columns titled ['State','City/Town Name'].

I also have a separate dataframe of private schools that are classified as being in 'Urban' areas (urban_schools). This dataframe has only two columns -- ['State','City/Town Name'].

I merged the dataframes in order to create a dataframe with only the urban schools' historical tuition data.

urban_school_tuit = pd.merge(urban_schools, tuit_cost, how='left', left_on= ['State','City/Town Name'], right_on=['State','City/Town Name']).dropna()
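For illustration, here is a small sketch of that merge on made-up data. The school rows and the year columns `2015` and `2016` are hypothetical; only the two key columns come from the description above. Because both frames use the same key names, `on=` can stand in for the `left_on`/`right_on` pair.

```python
import pandas as pd

# Hypothetical tuition table: two key columns plus made-up yearly costs.
tuit_cost = pd.DataFrame({
    'State': ['NY', 'NY', 'VT', 'MT'],
    'City/Town Name': ['New York', 'Ithaca', 'Putney', 'Bozeman'],
    '2015': [40000.0, 30000.0, 25000.0, 20000.0],
    '2016': [42000.0, 31000.0, 26000.0, 21000.0],
})

# Hypothetical urban list; Los Angeles has no row in tuit_cost.
urban_schools = pd.DataFrame({
    'State': ['NY', 'CA'],
    'City/Town Name': ['New York', 'Los Angeles'],
})

keys = ['State', 'City/Town Name']

# The left merge keeps every urban row; unmatched ones get NaN tuition
# values, and dropna() then removes them.
urban_school_tuit = pd.merge(urban_schools, tuit_cost, how='left', on=keys).dropna()
print(urban_school_tuit)
```

One thing to keep in mind: `.dropna()` also removes matched schools that have a missing value in any year, not only the urban rows with no tuition record at all.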

Now I want to create a dataframe with only the rural schools' historical tuition data by dropping all of the rows in urban_school_tuit from tuit_cost.

What is the most efficient way to do so?

Thanks!

p_sutherland
  • Have you tried dataframe.subtract? Documentation here: http://pandas.pydata.org/pandas-docs/version/0.18.1/generated/pandas.DataFrame.subtract.html – mba12 Dec 07 '16 at 15:38
  • Also interested in your study...I have a child in boarding school. Will you publish someplace eventually? – mba12 Dec 07 '16 at 15:38
  • Possible duplicate of http://stackoverflow.com/questions/28901683/pandas-get-rows-which-are-not-in-other-dataframe. – 3novak Dec 07 '16 at 15:49
  • @mba12 -- dataframe.subtract returns a view of ``tuit_cost`` with '0.0' values across all the rows in ``urban_schools`` and NaN values across all the rows that I want in the ``rural_schools`` dataframe I am trying to create. Any suggestions on how to select these rows and populate them with the original data they contained in ``tuit_cost``? (As of now I do not have plans to publish. I will let you know if that changes. Thanks for your interest!) – p_sutherland Dec 07 '16 at 17:09
  • @3novak the issue I was having was that the indices were not the same, so I could not use ``isin`` (as recommended by most of the answers in the question you referenced) – p_sutherland Dec 07 '16 at 20:28
  • Any update on this research proposal? :^) – boson Jan 24 '17 at 20:33
  • Has this been published yet? Very interested in the findings. – Некто Nov 03 '17 at 15:48

1 Answer


Was able to patch this together to create the desired dataframe (in Python 3)...

rural_schools = tuit_cost.drop(list(zip(urban_schools['State'],urban_schools['City/Town Name'])))
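A note on the line above: `DataFrame.drop` matches against index labels, so the one-liner presumably relies on `tuit_cost` being indexed by the (State, City/Town Name) pair. If the two keys are ordinary columns, as described in the question, an anti-join is one alternative. The sketch below is a rough illustration, not necessarily what was run here: the helper name `anti_join`, the toy rows, and the `2015` column are all made up.

```python
import pandas as pd

keys = ['State', 'City/Town Name']

def anti_join(left, right, on):
    """Return the rows of `left` whose key columns have no match in `right`.

    Hypothetical helper, not part of pandas. The result gets a fresh
    integer index because merge() does not preserve the left index.
    """
    # De-duplicate the keys so the merge cannot multiply rows of `left`.
    right_keys = right[on].drop_duplicates()
    merged = left.merge(right_keys, on=on, how='left', indicator=True)
    # indicator=True adds a '_merge' column: 'left_only' means no match.
    return merged.loc[merged['_merge'] == 'left_only'].drop(columns='_merge')

# Made-up data for illustration only.
tuit_cost = pd.DataFrame({
    'State': ['NY', 'NY', 'VT', 'MT'],
    'City/Town Name': ['New York', 'Ithaca', 'Putney', 'Bozeman'],
    '2015': [40000.0, 30000.0, 25000.0, 20000.0],
})
urban_schools = pd.DataFrame({
    'State': ['NY', 'CA'],
    'City/Town Name': ['New York', 'Los Angeles'],
})

rural_schools = anti_join(tuit_cost, urban_schools, keys)

# Cross-check with a boolean mask built from the key pairs; this variant
# keeps the original index of tuit_cost.
urban_keys = list(zip(urban_schools['State'], urban_schools['City/Town Name']))
mask = tuit_cost.set_index(keys).index.isin(urban_keys)
rural_by_mask = tuit_cost[~mask]
print(rural_schools)
```

Both variants compare key values rather than row positions, so they do not depend on the two frames sharing an index.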

Open to any further guidance or suggestions.

p_sutherland