I have a list of addresses that I would like to put into a dataframe where each row is a new address and the columns are the units of the address (title, street, city).
However, the way the list is structured, some addresses are longer than others. For example:
address = ['123 Some Street, City','45 Another Place, PO Box 123, City']
I have a pandas dataframe with the following columns:
Index Court Address Zipcode Phone
0 Court 1 123 Court Dr, Springfield 12345 11111
1 Court 2 45 Court Pl, PO Box 45, Pawnee 54321 11111
2 Court 3 1725 Slough Ave, Scranton 18503 11111
3 Court 4 101 Court Ter, Unit 321, Eagleton 54322 11111
I would like to split the Address column into up to three columns depending on how many comma separators there are in the address, with NaN filling in where values will be missing.
For example, I hope the data will look like this:
Index Court Address Address2 City Zip Phone
0 Court 1 123 Court Dr NaN Springfield ... ...
1 Court 2 45 Court Pl PO Box 45 Pawnee ... ...
2 Court 3 1725 Slough Ave NaN Scranton ... ...
3 Court 4 101 Court Ter Unit 321 Eagleton ... ...
I have plowed through and tried a ton of different solutions on StackOverflow to no avail. The closest I have gotten is with this code:
df2 = pd.concat([df, df['Address'].str.split(', ', expand=True)], axis=1)
But that returns a dataframe that adds the following three columns to the end structured as such:
... 0 1 2
... 123 Court Dr Springfield None
... 45 Court Pl PO Box 45 Pawnee
This is close, but as you can see, for the shorter entries, the city lines up with the second address line for the longer entries.
Ideally, column 2 should populate every single row with a city, and column 1 should alternate between "None" and the second address line if applicable.
I hope this makes sense -- this is a tough one to put into words. Thanks!