1

I have the following formula:

allhere = allhere.sort_values('RegionName')
unitowns = unitowns.sort_values('RegionName') 
universitytowns = allhere[allhere['RegionName']==unitowns['RegionName']]
nonuniversitytowns = allhere[allhere['RegionName']!=unitowns['RegionName']]

However, I keep receiving the error: Can only compare identically-labeled Series objects

I am trying to select rows from the dataframe 'allhere', where the values for 'RegionName' are equal to those in the dataframe 'unitowns'.

These should go into a new dataframe called 'universitytowns'.

And likewise, the remaining values that are not identical into a dataframe called 'Nonuniversitytowns'.

I have tried sorting the order of the values in this series but this does not seem to work?

Could anyone give me a helping hand?

The data for allhere looks like this:

 State RegionName         2000q1         2000q2         2000q3  \
9171   Mississippi  Abbeville   82866.666667   84900.000000   85566.666667   
2062      Maryland   Aberdeen  115666.666667  117833.333333  119300.000000   
4445   Mississippi   Aberdeen   56733.333333   53266.666667   50233.333333   
6906         Texas  Abernathy            NaN            NaN            NaN   
11173        Texas   Abilene             NaN            NaN            NaN   

              2000q4         2001q1         2001q2         2001q3  \
9171    83266.666667   79833.333333   78933.333333   81933.333333   
2062   120666.666667  123766.666667  126100.000000  127266.666667   
4445    48533.333333   49633.333333   50933.333333   51400.000000   
6906             NaN            NaN            NaN            NaN   
11173            NaN            NaN            NaN            NaN   

              2001q4      ...              2014q3         2014q4  \
9171    85233.333333      ...       121966.666667  118700.000000   
2062   127133.333333      ...       200000.000000  199766.666667   
4445    50700.000000      ...        72133.333333   70000.000000   
6906             NaN      ...        65233.333333   64466.666667   
11173            NaN      ...                 NaN            NaN   

              2015q1         2015q2         2015q3         2015q4  \
9171   118466.666667  121200.000000  115566.666667  104633.333333   
2062   195266.666667  195266.666667  194433.333333  194800.000000   
4445    74633.333333   75200.000000   75900.000000   77000.000000   
6906    63933.333333   65300.000000   66166.666667   66800.000000   
11173            NaN            NaN            NaN            NaN   

              2016q1         2016q2    2016q3    difference  
9171   103966.666667  105233.333333  106500.0   2800.000000  
2062   196500.000000  198466.666667  198300.0  11066.666667  
4445    76200.000000   80733.333333   81150.0 -16100.000000  
6906    67433.333333   67066.666667   65750.0           NaN  
11173            NaN            NaN       NaN           NaN

The dataframe unitowns looks like this:

State RegionName
443     Texas   Abilene 
349  Oklahoma       Ada 
327      Ohio       Ada 
194  Michigan    Adrian 
45   Colorado   Alamosa 

Thus, both universitytowns and nonuniversitytowns should have the same columns as the dataframe 'allhere'.

Caledonian26
  • 727
  • 1
  • 10
  • 27

1 Answers1

1

You can try this:

allhere['unitown'] = allhere['RegionName'].isin(unitowns['RegionName'])

unitowns = allhere[allhere['unitown']==True]
nonunitowns = allhere[allhere['unitown']==False]
print(unitowns)
print(nonunitowns)


       State RegionName  2000q1  2000q2  2000q3  unitown
11173  Texas    Abilene     NaN     NaN     NaN     True

            State RegionName         2000q1         2000q2         2000q3  unitown
9171  Mississippi  Abbeville   82866.666667   84900.000000   85566.666667    False
2062     Maryland   Aberdeen  115666.666667  117833.333333  119300.000000    False
4445  Mississippi   Aberdeen   56733.333333   53266.666667   50233.333333    False
6906        Texas  Abernathy            NaN            NaN            NaN    False
NYC Coder
  • 7,424
  • 2
  • 11
  • 24
  • for some reason, when I do unitowns = allhere[allhere['unitown']==True], all the values come out as NaN – Caledonian26 May 21 '20 at 14:28
  • but I can't understand that! – Caledonian26 May 21 '20 at 14:28
  • All values or only some? The data has NaN's too – NYC Coder May 21 '20 at 14:30
  • Abilene has all NaN data, so it will be NaN – NYC Coder May 21 '20 at 14:30
  • All values, am I right in saying this line: allhere = pd.merge(houses,unitowns,how='outer',on=['State', 'RegionName']) will include all the rows from the columns ['State', 'RegionName'] in both dataframes – Caledonian26 May 21 '20 at 14:31
  • It means merge will happen based on those 2 indexes. If you use `how='outer'` it will do a full outer join. You might want to use `how='inner'` – NYC Coder May 21 '20 at 14:32
  • so you mean creating a dataframe that only has rows where the contents of 'State' and 'RegionName' are the same? allhere = pd.merge(houses,unitowns,how='inner',on=['State', 'RegionName']), and thus this will give me the data from houses with only the states and regions from 'unitowns'? – Caledonian26 May 21 '20 at 15:10
  • Correct. That's an intersection of the 2 sets. Only the values that exist in both will be shown. – NYC Coder May 21 '20 at 15:14
  • Ok great! And all I want to do after that is create a dataframe that has 'States' and 'RegionNames' that are not in the dataframe 'unitowns'. The rest of the dataframe should be the same as the 'houses' dataframe. I.e. a modified version of the dataframe 'houses'. How can I do that? – Caledonian26 May 21 '20 at 15:16
  • You can check this: https://stackoverflow.com/questions/32676027/how-to-do-df1-not-df2-dataframe-merge-in-pandas – NYC Coder May 21 '20 at 15:18
  • ok yes, that bit is working, but the first part: universitytowns = pd.merge(houses,unitowns,how='inner',on=['State', 'RegionName']) is giving me an empty datarame, which I can't understand? – Caledonian26 May 21 '20 at 15:25
  • You might want to create a different question for that and provide the sample data – NYC Coder May 21 '20 at 15:28
  • yes was just thinking so! Thanks for your help! I'm almost there ;) – Caledonian26 May 21 '20 at 15:29