So I have a DataFrame of 533,668 active business registrations from the County Assessor's office, loaded from an Excel spreadsheet. The addresses are currently all in one column, and I want to split them into AddressNumber, StreetName, StreetType, UnitNumber, City, State, etc. I have a library (pyusaddress) that can parse the column, and I used:
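```python
import usaddress


def clean_address(raw_address):
    """Tag one address string; return its labeled parts or an error marker."""
    try:
        # usaddress.tag returns (OrderedDict of labeled components, address type)
        address, address_type = usaddress.tag(raw_address)
    except usaddress.RepeatedLabelError as e:
        # The parser assigned the same label twice (e.g. two street names)
        print(e.parsed_string)
        print(e.original_string)
        address = 'Duplicate Address'
    except TypeError:
        # Non-string cells, such as NaN from empty spreadsheet cells
        address = 'Invalid Address'
    return address


# Series.apply calls clean_address once per cell and returns a Series
address_list = active_businesses['STREET ADDRESS'].apply(clean_address)
```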
The problem is that this gives me a Series of OrderedDicts (mixed with the error strings), which I then need to turn into a DataFrame. I tried a for loop, but it was incredibly slow. Does anyone have a better idea?
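A minimal sketch of one common approach, assuming `address_list` is the Series produced above: normalize every element to a plain dict, then build the whole DataFrame in a single constructor call rather than growing it row by row in a loop. The sample values below are made up for illustration; the keys are usaddress component labels (`AddressNumber`, `StreetName`, `StreetNamePostType`, `OccupancyIdentifier`, `PlaceName`, `StateName`, `ZipCode`), while the helper `to_record` and the `ParseStatus` column are hypothetical names introduced here.

```python
from collections import OrderedDict

import pandas as pd

# Hypothetical sample standing in for the output of
# active_businesses['STREET ADDRESS'].apply(clean_address):
# parsed addresses are OrderedDicts, failures are the error strings.
address_list = pd.Series([
    OrderedDict([("AddressNumber", "123"), ("StreetName", "Main"),
                 ("StreetNamePostType", "St"), ("PlaceName", "Springfield"),
                 ("StateName", "IL"), ("ZipCode", "62701")]),
    "Duplicate Address",
    OrderedDict([("AddressNumber", "45"), ("StreetName", "Oak"),
                 ("StreetNamePostType", "Ave"), ("OccupancyIdentifier", "2B"),
                 ("PlaceName", "Peoria"), ("StateName", "IL")]),
    "Invalid Address",
])


def to_record(parsed):
    """Hypothetical helper: return the parsed labels, or a status flag for failures."""
    if isinstance(parsed, dict):  # OrderedDict is a dict subclass
        return dict(parsed, ParseStatus="OK")
    return {"ParseStatus": parsed}


# One DataFrame call over a list of dicts; a label missing from an
# address becomes NaN in that row. Reusing the Series index keeps rows aligned.
parsed_df = pd.DataFrame(
    [to_record(p) for p in address_list],
    index=address_list.index,
)

# With the real data, the new columns could then be attached by index:
# result = active_businesses.join(parsed_df)
print(parsed_df)
```

The per-address `usaddress.tag` call will likely still dominate the runtime on 533,668 rows; this only avoids the extra cost of assembling the result one row at a time.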