1

I have a dataframe with ip address in one of the column and I want to add a new column called "country" from another dataframe based on the location of ip address that lies between lower and upper ip address.

Two dataframes

import numpy as np
import pandas as pd

df1 = pd.DataFrame({'ip': [0.1,2.5,3.5]})

df2 = pd.DataFrame({'low_ip': [3,2,7,10],
                   'high_ip': [5,3,9,11],
                   'country': ['A','B','A','C']})

print(df1)
    ip
0  0.1
1  2.5
2  3.5

print(df2)
   low_ip  high_ip country
0       3        5       A
1       2        3       B
2       7        9       A
3      10       11       C

Required

ip    country
0.1   NA
2.5   B   because: 2 <= 2.5 <= 3
3.5   A   because: 3 <= 3.5 <= 5
SdahlSean
  • 573
  • 3
  • 13
BhishanPoudel
  • 15,974
  • 21
  • 108
  • 169
  • why are you using a pd Dataframe and not a dictionary? – SdahlSean Dec 28 '19 at 15:41
  • This is just an example, actually dataframes contains lots of other columns as well. – BhishanPoudel Dec 28 '19 at 15:42
  • 1
    I haven't tested performance or anything, but [this answer](https://stackoverflow.com/a/44601120/2715819) to a pretty much identical question gives another method that might be better for large data sets, although it does re-index `df1` – RishiG Dec 28 '19 at 16:19
  • @RishiG Thank for very useful link. Upvoted the given link. – BhishanPoudel Dec 28 '19 at 17:01

1 Answers1

1

Quick and dirty way:

countries = []
for i in range(len(df1)):
    ip = df1.loc[i, 'ip']
    country = df2.query("low_ip <= @ip <= high_ip")['country'].to_numpy()

    if len(country) > 0:
        countries.append(country[0])
    else:
        countries.append('NA')

df1['country'] = countries

print(df1)

    ip country
0  0.1      NA
1  2.5       B
2  3.5       A

BhishanPoudel
  • 15,974
  • 21
  • 108
  • 169