
I am trying to create a normalized pandas dataframe that holds addresses alongside their parsed components, using the 'usaddress' package in Python. I would like to store the parsed output in a dataframe.

The output of usaddress.parse looks like this:

    usaddress.parse('Robie House, 5757 South Woodlawn Avenue, Chicago, IL 60637')

    [('Robie', 'BuildingName'),
     ('House,', 'BuildingName'),
     ('5757', 'AddressNumber'),
     ('South', 'StreetNamePreDirectional'),
     ('Woodlawn', 'StreetName'),
     ('Avenue,', 'StreetNamePostType'),
     ('Chicago,', 'PlaceName'),
     ('IL', 'StateName'),
     ('60637', 'ZipCode')]

My addresses are in the 'address' column of the data dataframe. Using the example above, I am trying to turn the labels (BuildingName, AddressNumber, etc.) into column names and the corresponding tokens into the values, but have had no luck.

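    import pandas as pd
    import usaddress

    # data is an existing dataframe with an 'address' column
    add = []
    for ind in data.index:
        # usaddress.parse returns a list of (token, label) tuples
        add1 = usaddress.parse(data['address'][ind])
        add.append(add1)
    res = pd.DataFrame(add)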

With the above code, the res dataframe does not come out the way I intended. The intended output is:

(image: the intended output from the dataframe)

  • Please explain what the output needs to look like – Matina G Jun 26 '19 at 14:32
  • The output needs to be a dataframe with column names like BuildingName, AddressNumber, StreetNamePreDirectional, StreetName, PlaceName, ZipCode. If the next address produces different elements, like POBoxnumber etc., I want those added as columns as well. The columns should hold the values above, like Robie, House, 5757 etc. – keyser soze Jun 27 '19 at 09:23
  • These two answers should help: https://stackoverflow.com/questions/52264065/pandas-turn-list-of-lists-of-tuples-into-dataframe-awkward-column-headers and https://stackoverflow.com/questions/47659423/taking-dictionary-list-and-mapping-to-dataframe-with-x-number-of-matching-colum – Mishal Ahmed Nov 02 '21 at 11:28
  • Does this answer your question? [Pandas, turn list of lists of tuples into DataFrame awkward column headers.](https://stackoverflow.com/questions/52264065/pandas-turn-list-of-lists-of-tuples-into-dataframe-awkward-column-headers) – Mishal Ahmed Nov 02 '21 at 11:31

1 Answer


If you have a list of addresses, you can process them all into a dataframe with the address parts as column names. Sample code:

    import pandas as pd
    import usaddress

    addresslist = ["Robie House, 5757 South Woodlawn Avenue, Chicago, IL 60637", "123 main st apt 2j miami fl"]
    addressdictlist = []
    for address in addresslist:
        addressdict = {}
        parsed = usaddress.parse(address)  # list of (token, label) tuples
        for value, key in parsed:
            value = value.strip(",")
            if addressdict.get(key, "") == "":
                addressdict[key] = value
            else:
                # label already seen for this address: append the token
                addressdict[key] = addressdict[key] + " " + value
        addressdictlist.append(addressdict)

    # one row per address; columns are the union of all labels seen
    addressdf = pd.DataFrame(addressdictlist)
    addressdf

And the output looks like this:

(image: screenshot of addressdf)

I took the liberty of stripping the commas from the address parts, but you could do that in pre-processing as well.
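To see the grouping step on its own, here is a minimal sketch that runs without usaddress installed, using the parsed output quoted in the question as hard-coded input. The helper name collapse_tokens is made up for illustration and is not part of usaddress or pandas; it joins tokens that share a label, much like the inner loop above.

```python
from collections import defaultdict

# Parsed output copied from the question; in practice this would come
# from usaddress.parse(address).
parsed = [
    ("Robie", "BuildingName"),
    ("House,", "BuildingName"),
    ("5757", "AddressNumber"),
    ("South", "StreetNamePreDirectional"),
    ("Woodlawn", "StreetName"),
    ("Avenue,", "StreetNamePostType"),
    ("Chicago,", "PlaceName"),
    ("IL", "StateName"),
    ("60637", "ZipCode"),
]


def collapse_tokens(pairs):
    """Group (token, label) pairs into one {label: "joined tokens"} dict.

    Hypothetical helper for illustration only.
    """
    grouped = defaultdict(list)
    for token, label in pairs:
        # strip commas that the parser leaves attached to tokens
        grouped[label].append(token.strip(","))
    return {label: " ".join(tokens) for label, tokens in grouped.items()}


row = collapse_tokens(parsed)
print(row["BuildingName"])  # Robie House
```

Each such dict is one row. When a list of these dicts is passed to pd.DataFrame, the columns are the union of all keys, and an address that lacks a given label gets NaN in that column, which covers the "additively included" requirement from the comments.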

Vincent Rupp