As part of an API response I'm receiving an address as a string, which I need to split so that I can store it in our own database. The address can look like the following:
'teststreet 1, 1234 AZ City, Country'
'teststreet 9C, 1235 AZ City, Country'
'J. teststreet 1, 1243 AZ City, Country'
I'm having a hard time splitting this string into its separate parts.
What bothers me most is that the street name itself can consist of two parts, and that the house number might include a letter, which should also be separated out if it exists.
I've tried a few approaches to solve this:
adressdetails = row['Adresgegevens'].split(",")
adress2 = [x.replace(",", "") for x in adressdetails]
split_house_number = re.split(r'(\d)', adress2[2])
house_number = split_house_number[1]
house_number_extension = split_house_number[2]
- Splitting on each comma (which doesn't separate the postal code from the city, and you can't split on whitespace afterwards because the number of spaces varies: "J. Streetname" vs "streetname")
- Splitting on whitespace (as partly mentioned above)
- Regex splitting on the first digit to separate the house number from the extension (I don't know much about regex, so I wasn't confident I had the right approach when attempting this)
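For what it's worth, the whitespace problem in the first two approaches might be avoidable by limiting how many splits are made. Below is a minimal sketch, assuming the address always contains exactly two commas and the zip code always looks like "1234 AZ". The function name `split_address_parts` is made up for illustration, and the house number still has its extension attached at this stage.

```python
def split_address_parts(address):
    """Split an address on commas, then on a bounded number of spaces.

    Assumption: the input looks like "street number, zip city, country".
    """
    # Exactly two commas are assumed, giving three comma-separated parts.
    street_part, zip_city, country = [p.strip() for p in address.split(",")]

    # rsplit with maxsplit=1 cuts only at the LAST run of whitespace,
    # so a street name such as "J. teststreet" stays in one piece.
    streetname, number_with_ext = street_part.rsplit(None, 1)

    # split with maxsplit=2 cuts only at the first two runs of whitespace,
    # so a city such as "New York" stays in one piece.
    zip_digits, zip_letters, city = zip_city.split(None, 2)

    return streetname, number_with_ext, f"{zip_digits} {zip_letters}", city, country


print(split_address_parts("J. teststreet 1, 1243 AZ City, Country"))
```

The bounded `maxsplit` argument is what keeps the variable number of spaces in the street name or city from mattering.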
I need the response address split into the following variables:
streetname
house_number
house_number_extension
zip_code
city
country
Example:
"teststreet 1C, 1234 AZ New York, Australia"
into ->
teststreet
1
C
1234 AZ
New York
Australia
Example 2:
"Jh. teststreet 1B, 9870 GH Amsterdam, Canada"
into ->
Jh. teststreet
1
B
9870 GH
Amsterdam
Canada
Example 3:
"teststreet 45, 9867 HJ Rotterdam, Germany"
into ->
teststreet
45
null
9867 HJ
Rotterdam
Germany
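To make the expected behaviour concrete, here is a rough sketch of a single regular expression with named groups that produces the output in the three examples. It is only an illustration under some assumptions: the zip code is four digits followed by two capital letters, the house number extension consists of letters only, and the names `ADDRESS_RE` and `parse_address` are hypothetical. Real addresses may well need a looser pattern.

```python
import re

# Assumed layout: "<street> <number><ext>, <4 digits> <2 letters> <city>, <country>"
ADDRESS_RE = re.compile(
    r"""
    ^\s*
    (?P<streetname>.+?)\s+                      # street name, may contain spaces
    (?P<house_number>\d+)                       # numeric part of the house number
    \s*(?P<house_number_extension>[A-Za-z]*)    # optional letter extension
    \s*,\s*
    (?P<zip_code>\d{4}\s?[A-Z]{2})\s+           # zip code such as 1234 AZ
    (?P<city>[^,]+?)                            # city, may contain spaces
    \s*,\s*
    (?P<country>.+?)
    \s*$
    """,
    re.VERBOSE,
)


def parse_address(address):
    """Return a dict with the six requested fields, or raise ValueError."""
    match = ADDRESS_RE.match(address)
    if match is None:
        raise ValueError(f"Unrecognised address format: {address!r}")
    parts = match.groupdict()
    # An empty extension is reported as None (the "null" in Example 3).
    parts["house_number_extension"] = parts["house_number_extension"] or None
    return parts


for example in (
    "teststreet 1C, 1234 AZ New York, Australia",
    "Jh. teststreet 1B, 9870 GH Amsterdam, Canada",
    "teststreet 45, 9867 HJ Rotterdam, Germany",
):
    print(parse_address(example))
```

All values come back as strings, so `house_number` would still need an `int()` conversion if the database column is numeric.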