2

I've got a problem i have been trying to work out I have googled a few things that are similar to what i want to do but can't work out exactly how to do it,

I have around 250M ip address's and i want to look that up against the maxmind geolite2 data so that i can tell what country each IP-address originates from,

I have imported all the data into my Redshift cluster with talend,

table a has ID and 'ipaddress' ie 10.0.0.5
table b (maxmind) has country name and IP range as expressed as 10.0.0.0/24

how could i use Redshift SQL to match these two considering the size of my source data source?

edit: heres the link to the geolite2 data https://dev.maxmind.com/geoip/geoip2/geolite2/

Dillon Wright
  • 151
  • 10
  • Don't know if redshift does, but Postgres supports the "contains" operator for the `inet` data type: http://www.postgresql.org/docs/current/static/functions-net.html that operator can also make use of an index on the columns –  May 11 '16 at 06:21
  • I tried this to no avail turns out redshift does not support either of these – Dillon Wright May 12 '16 at 23:04

1 Answers1

4

You could try using Amazon Redshift's ability to Import Custom Python Library Modules to load the netaddr library. Then, you could use the library within a User Defined Function written in Python.

See also: IP Range to CIDR conversion in Python?

John Rotenstein
  • 241,921
  • 22
  • 380
  • 470