I have two existing DataFrames which I have named death and air:
death:
County,Death Rate
Autauga,859
Baldwin,976

air:
County,AQI
Baldwin,51
Clay,45
These datasets come from different sources and have different lengths, and not every county appears in both DataFrames.
For rows where the County values match, I need to create a third DataFrame containing only the AQI and Death Rate columns.
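One idiomatic way to do this in pandas, instead of looping, is an inner join with DataFrame.merge on the County column. A minimal sketch, assuming the county names are spelled identically in both files and using the sample rows above in place of the CSV files:

```python
import pandas as pd

# Sample rows from the question; in practice these come from read_csv.
death = pd.DataFrame({"County": ["Autauga", "Baldwin"], "Death Rate": [859, 976]})
air = pd.DataFrame({"County": ["Baldwin", "Clay"], "AQI": [51, 45]})

# An inner merge keeps only counties present in both frames,
# regardless of row order or how many rows each frame has.
merged = death.merge(air, on="County", how="inner")

# Keep just the two requested columns.
data = merged[["AQI", "Death Rate"]]
print(data)
```

Only Baldwin appears in both frames, so data ends up with a single row. Using how="outer" together with indicator=True instead would show which counties are missing from either side.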
Here is what I started with (death is the larger DataFrame):
import pandas as pd
death = pd.read_csv('SimpleDeath1.csv')
air = pd.read_csv('simpleAir.csv')
data = pd.DataFrame(columns= ['AQI', 'Death Rate'], index=None)
for i in range (0, death.size):
    if death['County'] == air['County']:
        data.append({'AQI' : air['AQI'], 'Death Rate' : death['Death Rate']})
This raises the following error:
ValueError: Can only compare identically-labeled Series objects
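The error comes from how == works between two Series: pandas compares them element by element after matching index labels, and refuses when the labels differ, which is the case whenever the two frames have different lengths. A minimal sketch using county-only Series; the third county in death_county is a hypothetical row added only to make the lengths differ. Series.isin answers the question that was actually intended, namely whether each county in one frame appears anywhere in the other:

```python
import pandas as pd

# Hypothetical county columns of different lengths ("Bibb" is illustrative only).
death_county = pd.Series(["Autauga", "Baldwin", "Bibb"])
air_county = pd.Series(["Baldwin", "Clay"])

# == aligns on index labels first; differing labels raise ValueError.
error_message = None
try:
    death_county == air_county
except ValueError as exc:
    error_message = str(exc)

# isin() gives one bool per death row: is this County anywhere in air?
mask = death_county.isin(air_county)
print(mask.tolist())  # [False, True, False]
```

Even when the lengths happen to match, using a boolean Series in an if raises a different ValueError (the truth value of a Series is ambiguous), so a per-row if on whole columns fails either way. A mask like this can be used directly for filtering, e.g. death_county[mask].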
This error has been asked about and discussed extensively on SO: searching for it returns 382 questions. While I haven't read them all yet, I have read enough to doubt that my initial approach to this problem is a good one.
Some highlights from what I've read thus far.
Can anyone with fresh eyes help me understand a better way to approach this problem?
Some things I have tried:
Changing the comparison:
if death['County'].equals(air['County']):
This doesn't throw an error, but my new DataFrame is empty.
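Two separate things leave the new DataFrame empty here. equals() compares the two columns as whole objects and returns a single True/False, which is False whenever the county lists differ at all. And DataFrame.append never modified data in place: it returned a new DataFrame, and it was removed entirely in pandas 2.0. If a loop is still preferred, here is a sketch that collects matching rows in a list and builds the DataFrame once, using the sample rows from the question:

```python
import pandas as pd

death = pd.DataFrame({"County": ["Autauga", "Baldwin"], "Death Rate": [859, 976]})
air = pd.DataFrame({"County": ["Baldwin", "Clay"], "AQI": [51, 45]})

# equals() returns one bool for the whole column, not a per-row answer.
same = death["County"].equals(air["County"])  # False: the county lists differ

# Collect matching rows in a plain list, then build the DataFrame once at the end.
rows = []
for _, d in death.iterrows():
    # Series-vs-scalar comparison is fine: it yields one bool per row of air.
    matching_aqi = air.loc[air["County"] == d["County"], "AQI"]
    for aqi in matching_aqi:
        rows.append({"AQI": aqi, "Death Rate": d["Death Rate"]})

data = pd.DataFrame(rows, columns=["AQI", "Death Rate"])
print(data)
```

For anything beyond small files a merge on County is usually simpler and faster; this version is mainly useful for seeing why the original loop could not work as written.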
Converting the DataFrame values to strings:
if death['County'].str() == air['County'].str():
    data.append({'AQI' : air['AQI'], 'Death Rate' : death['Death Rate']})
Throws:
TypeError: 'StringMethods' object is not callable
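.str on a Series is an accessor (a StringMethods object), not a method, so calling it with () fails. Converting values to strings is done with astype(str), and string operations are methods on the accessor, such as .str.strip(). A sketch, assuming the underlying goal is to normalize county names before matching; the messy value below is hypothetical:

```python
import pandas as pd

counties = pd.Series(["Baldwin", " clay"])  # hypothetical: stray space, lower case

# Calling the accessor itself reproduces the error from the question.
msg = None
try:
    counties.str()
except TypeError as exc:
    msg = str(exc)  # "'StringMethods' object is not callable"

# astype(str) converts the values; .str.<method>() then works element-wise.
clean = counties.astype(str).str.strip().str.title()
print(clean.tolist())  # ['Baldwin', 'Clay']
```

Normalizing both County columns this way before merging can help when the two sources format names differently, but string conversion alone does not fix the comparison error, which is about index alignment rather than data type.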
Any help using DataFrames or another strategy would be greatly appreciated!