0
unitowns = get_list_of_university_towns()
bottom = get_recession_bottom()
start = get_recession_start()
hdata = convert_housing_data_to_quarters()
bstart = hdata.columns[hdata.columns.get_loc(start) - 1]
hdata= hdata[[bstart,bottom]]
hdata['Ratio']=hdata[bstart]/ hdata[bottom]
hdata=hdata.reset_index()

combined = pd.merge (hdata, unitowns, how='inner', on=['State','RegionName'])
 

 

The following cleaning needs to be done:

  1. For "State", removing characters from "[" to the end.
  2. For "RegionName", when applicable, removing every character from " (" to the end.
  3. Depending on how you read the data, you may need to remove newline character '\n'. '''

code for getting unitown values

   df = pd.read_csv('university_towns.txt', delimiter = '\t', header= 
   None).rename(columns={0:'Data'}) 
   boolian_df = df['Data'].str.contains('[edit]', regex= False) 
   state_university= []
   for index, value in boolian_df.items():
      if value:
         state = df.loc[index].values[0]
      else:
         region = df.loc[index].values[0]
         state_university.append([state,region])

   final_dataframe = pd.DataFrame(state_university, columns=['State', 
   'RegionName'])
   final_dataframe['State'] = final_dataframe['State'].str.replace('\ 
   [edit.*','') 
   final_dataframe['RegionName'] = 
   final_dataframe['RegionName'].str.replace('\ 
   (.*', '')
   final_dataframe['RegionName'] = 
   final_dataframe['RegionName'].str.replace('University.*,', '')
   return final_dataframe  

output unitowns.head()

      State      RegionName  Type
      Alabama        Auburn   Uni
      Alabama      Florence   Uni
      Alabama  Jacksonville   Uni
      Alabama    Livingston   Uni
      Alabama    Montevallo   Uni

output hdata.head()

          State      RegionName         2008q1         2009q2     Ratio
          New York      New York     508500.000000  465833.333333  1.091592
          California   Los Angeles   535300.000000  413900.000000  1.293308
          Illinois       Chicago     243733.333333  219700.000000  1.109392
          Pennsylvania  Philadelphia 119566.666667  116166.666667  1.029268
          Arizona       Phoenix  218633.333333  168233.333333  1.299584

Both dataframes have same column names.

It gives Empty DataFrame Columns: [State, RegionName, 2008q1, 2009q2, Ratio] Index: []

Mr_question
  • 71
  • 2
  • 10

2 Answers2

0

If both dataframes have same columns, then I guess you want to join it.

hdata.append(unitowns, inplace=True)
Ajay A
  • 1,030
  • 1
  • 7
  • 19
  • i only want to include the common values from the two data frame, two columns are same in both dataframe and rest are not. – Mr_question Jul 04 '20 at 11:44
  • `hdata = hdata[list(unitowns.columns)]` will create dataframes of same columns, now you can append – Ajay A Jul 04 '20 at 11:51
  • Append add the new value at the bottom. I am not looking for that. I want to merge the dataframe not append – Mr_question Jul 04 '20 at 11:54
0

Sample data

unitowns=pd.DataFrame({'State':['New York','California'],'RegionName':['New York','Los Angeles'],'Type':['Uni','Uni']})
print(unitowns)

       State   RegionName  Type
0    New York     New York  Uni
1  California  Los Angeles  Uni

hdata=pd.DataFrame({'State':['New York','California'],'RegionName':['New York','Los Angeles'],'2008q1':['500.000','400.000']})
print(hdata)

     State   RegionName   2008q1
0    New York     New York  500.000
1  California  Los Angeles  400.000

merge

 pd.merge(hdata,unitowns, how='left', on=['State','RegionName'])
     State      RegionName   2008q1 Type
0    New York     New York  500.000  Uni
1  California  Los Angeles  400.000  Uni
wwnde
  • 26,119
  • 6
  • 18
  • 32
  • I had read from a text file for unitown, and then after cleaning the data. its like in the form of unitowns.head(). Now the problem i figure out is that i cant get the values of RegionName from unitowns. – Mr_question Jul 04 '20 at 13:09
  • that why merge is not working for me. how i can do that? – Mr_question Jul 04 '20 at 13:10
  • are the region names in either datarames different? If so, pd.merge(hdata,unitowns, how='left', on=['State'], suffixes=('_x', '_y')) – wwnde Jul 04 '20 at 13:45
  • i am not to access the values of RegionName from unitowns. – Mr_question Jul 04 '20 at 13:58
  • i am accessing it by unitowns.loc[unitowns['RegionName'] == 'Los Angeles ']. output is empty. But when i print the whole unitowns, Los Angeles is present there. Cant figure out where i am lacking – Mr_question Jul 04 '20 at 13:59