
I am creating a tool which depends on addresses. For the purposes of testing, I'd like to create a large number of valid US addresses. I have the GeoNames postal code data and I would like to generate some number of real addresses for each of the ~41,000 zip codes in the United States.

I've found sites like FakeAddressGenerator and FakeName which claim to generate random, valid US addresses. How do these sites work? How can I do the same thing without relying on scraping these websites?

Ideally, I'd like to be able to do this in Python; utilizing a web service is fine (it doesn't seem that either FakeAddressGenerator or FakeName provide such a web service).

Thanks!

Joseph
  • Welcome to StackOverflow. Please read and follow the posting guidelines in the help documentation, as suggested when you created this account. [On topic](http://stackoverflow.com/help/on-topic) and [how to ask](http://stackoverflow.com/help/how-to-ask) apply here. StackOverflow is not a design, coding, research, or tutorial service. – Prune Apr 03 '18 at 20:57
  • Seems like "a practical, answerable problem that is unique to software development" to me. – Joseph Apr 03 '18 at 21:01
  • You should use the faker library – Phd. Burak Öztürk Apr 26 '18 at 23:43
  • @BurakÖztürk the problem with the faker library is that it does not guarantee that the addresses will be real. – Lynx-Lab Apr 27 '18 at 10:47
  • What is the need for a valid address in testing? Can't you mock data or responses? – Tarun Lalwani Apr 27 '18 at 11:08
  • I am indeed looking to create mock data, but I want the addresses in it to be valid. I am hoping that I and other users of this mock data (e.g., those who are learning web development) can integrate or mash it up with Google Maps and have it actually show up properly. – Joseph Apr 27 '18 at 19:56
  • There are online sites and services that have addressed (pun intended) this problem such as http://www.fakenamegenerator.com/ – Uncle Long Hair May 02 '18 at 17:03

2 Answers


Googling your issue, I found two links of interest:

  1. https://github.com/EthanRBrown/rrad, which provides approximately 3200 real, anonymised addresses.
  2. https://openaddresses.io, which also links to their open-source GitHub repository with the complete data set.

I don't recommend scraping the fake address generators, as they do not guarantee that the addresses exist. I would not go sampling in Google Maps either, as you will surely get blacklisted.

Extracting data from the downloads in option 2 is easy: they are zip files containing CSV files with full address, zip code, lat, lon, etc.
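As a minimal sketch of that extraction step, assuming the download is a zip archive holding one or more UTF-8 CSV files with a header row, something like the following reads every row using only the standard library. The function name `iter_address_rows` is made up for illustration, and the column names you get back depend on the actual data set.

```python
import csv
import io
import zipfile


def iter_address_rows(zip_path):
    """Yield one dict per CSV row from every .csv file inside the archive.

    `zip_path` may be a filesystem path or a binary file-like object.
    Assumes each CSV is UTF-8 encoded and starts with a header row.
    """
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip READMEs, licences, and other non-CSV members
            with archive.open(name) as raw:
                # archive.open() yields bytes; the csv module needs text
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                for row in csv.DictReader(text):
                    yield row
```

Because it is a generator, it never holds a whole file in memory, which matters for the larger regional downloads.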

The two data sets above "guarantee" the existence of the address. I don't know how strict your other conditions are, namely having at least one valid address for each of the 41k zip codes. If this is a hard constraint, I doubt you will find such an open-source data set.
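To see how close a given data set comes to that constraint, here is a rough sketch that groups address rows by zip code, draws up to a few per code, and reports the codes with no coverage. It assumes each row is a dict and that the zip code lives in a field called `POSTCODE`; that field name and the helper name `sample_per_zip` are assumptions, so adjust them to your data.

```python
import random
from collections import defaultdict


def sample_per_zip(rows, zip_codes, per_zip=3, zip_field="POSTCODE", seed=None):
    """Return (sampled, missing).

    sampled: dict mapping zip code -> list of up to `per_zip` address rows
    missing: list of requested zip codes with no address in `rows`
    """
    rng = random.Random(seed)
    by_zip = defaultdict(list)
    for row in rows:
        # Keep only the first five characters so ZIP+4 values still match
        code = (row.get(zip_field) or "").strip()[:5]
        by_zip[code].append(row)

    sampled, missing = {}, []
    for code in zip_codes:
        candidates = by_zip.get(code, [])
        if not candidates:
            missing.append(code)
            continue
        sampled[code] = rng.sample(candidates, min(per_zip, len(candidates)))
    return sampled, missing
```

The `missing` list tells you which zip codes still need another source.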


EDIT:

If you have a list of all postcodes in the US, a fully automatable solution is to use OpenStreetMap's Nominatim service (subject to their terms of use!).

1) Get the lat, lon (centre point or default address) of each postcode:

https://nominatim.openstreetmap.org/search/?format=xml&addressdetails=1&limit=1&country_codes=us&postalcode=35051

2) Get the address nearest to this lat, lon:

https://nominatim.openstreetmap.org/reverse?format=xml&lat=33.178764&lon=-86.619038&zoom=18&addressdetails=1

Trying this example for Columbiana, Alabama (postcode 35051) yields 397 West College Street.

Nominatim documentation is at: https://wiki.openstreetmap.org/wiki/Nominatim
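The two steps above can be sketched in Python with only the standard library. This is an illustration under a few assumptions, not a tested client: it asks for `format=json` instead of XML to keep parsing short, the function names and the `USER_AGENT` string are placeholders, and the one-second pause plus identifying User-Agent reflect my understanding of the public server's usage policy, so check the current policy before running it over 41k postcodes.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "https://nominatim.openstreetmap.org"
# Placeholder: replace with something that identifies your application
USER_AGENT = "address-test-data-sketch/0.1 (you@example.com)"


def search_url(postal_code):
    """Step 1: URL that looks up the centre point of a US postcode."""
    params = {
        "format": "json",
        "addressdetails": 1,
        "limit": 1,
        "countrycodes": "us",  # documented spelling, to my knowledge
        "postalcode": postal_code,
    }
    return f"{BASE_URL}/search?" + urllib.parse.urlencode(params)


def reverse_url(lat, lon):
    """Step 2: URL that reverse-geocodes a lat/lon to the nearest address."""
    params = {"format": "json", "lat": lat, "lon": lon, "zoom": 18, "addressdetails": 1}
    return f"{BASE_URL}/reverse?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Fetch a URL and decode the JSON body."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def address_for_postal_code(postal_code, fetch=fetch_json, delay=1.0):
    """Return the address dict for one postcode, or None if nothing is found.

    `fetch` is injectable so the logic can be exercised without network access.
    """
    hits = fetch(search_url(postal_code))
    if not hits:
        return None
    time.sleep(delay)  # be polite between the two requests
    place = fetch(reverse_url(hits[0]["lat"], hits[0]["lon"]))
    return place.get("address")
```

Calling `address_for_postal_code("35051")` would perform the same two requests as the URLs above. Note that the result is whatever OpenStreetMap has nearest the centre point, which may lack a house number in sparsely mapped areas.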

Lynx-Lab
  • I won't have a chance to try this out for a couple days, but it looks like you've found the perfect solution to my problem with nominatim! I will follow up and award the bounty in a few days. – Joseph Apr 27 '18 at 19:59

You can install random-address:

pip install random-address

And then use random_address.real_random_address_by_postal_code:

>>> import random_address
>>> random_address.real_random_address_by_postal_code('32409')
{'address1': '711 Tashanna Lane', 'address2': '', 'city': 'Southport', 'state': 'FL', 'postalCode': '32409', 'coordinates': {'lat': 30.41437699999999, 'lng': -85.676568}}
neosergio
  • Heads up: this does not work for most states/zip codes. The PyPI page lists the set of states and locations this works for: https://pypi.org/project/random-address/ – SulfoCyaNate Nov 12 '21 at 23:21