unitowns = get_list_of_university_towns()
bottom = get_recession_bottom()
start = get_recession_start()
hdata = convert_housing_data_to_quarters()
bstart = hdata.columns[hdata.columns.get_loc(start) - 1]
hdata= hdata[[bstart,bottom]]
hdata['Ratio']=hdata[bstart]/ hdata[bottom]
hdata=hdata.reset_index()
combined = pd.merge (hdata, unitowns, how='inner', on=['State','RegionName'])
The following cleaning needs to be done:
- For "State", removing characters from "[" to the end.
- For "RegionName", when applicable, removing every character from " (" to the end.
- Depending on how you read the data, you may need to remove newline character '\n'. '''
code for getting unitown values
df = pd.read_csv('university_towns.txt', delimiter = '\t', header=
None).rename(columns={0:'Data'})
boolian_df = df['Data'].str.contains('[edit]', regex= False)
state_university= []
for index, value in boolian_df.items():
if value:
state = df.loc[index].values[0]
else:
region = df.loc[index].values[0]
state_university.append([state,region])
final_dataframe = pd.DataFrame(state_university, columns=['State',
'RegionName'])
final_dataframe['State'] = final_dataframe['State'].str.replace('\
[edit.*','')
final_dataframe['RegionName'] =
final_dataframe['RegionName'].str.replace('\
(.*', '')
final_dataframe['RegionName'] =
final_dataframe['RegionName'].str.replace('University.*,', '')
return final_dataframe
output unitowns.head()
State RegionName Type
Alabama Auburn Uni
Alabama Florence Uni
Alabama Jacksonville Uni
Alabama Livingston Uni
Alabama Montevallo Uni
output hdata.head()
State RegionName 2008q1 2009q2 Ratio
New York New York 508500.000000 465833.333333 1.091592
California Los Angeles 535300.000000 413900.000000 1.293308
Illinois Chicago 243733.333333 219700.000000 1.109392
Pennsylvania Philadelphia 119566.666667 116166.666667 1.029268
Arizona Phoenix 218633.333333 168233.333333 1.299584
Both dataframes have same column names.
It gives Empty DataFrame Columns: [State, RegionName, 2008q1, 2009q2, Ratio] Index: []