0

I currently have a dataframe (df1) with one columns being a list of numbers. I want to look up those numbers in another dataframe (df2) that has two integer columns and see if the number from df1 falls in between the range of those two columns and get the data from the matching row. Below is my current approach, is there a better way of doing this?

for index, row in df1.iterrows():
    print df2[(df2['start'] <=  row['num']) & (df2['end'] >= row['num'])]['data'].iloc[0]

Here is what the head of df1 looks like:

           num
0  1216942535
1  1220432129
2  1501931542

head of df2:

                organization_name       start         end  
0                Service 2000 Srl  1478947232  1478947239  
1  Autolinee F Lli Bucci Urbino P  1478947240  1478947247  
2         S.M.S. DISTRIBUTION SRL  1478947248  1478947255  
3                    ALTOPACK SRL  1478947256  1478947263  
4                   COPYWORLD SRL  1478947264  1478947271  
user1775500
  • 2,263
  • 3
  • 18
  • 17
  • do you want to print the values? or store them in some way? also, it would be helpful if you could post a sample of `df1` and `df2` – DJK Aug 02 '17 at 21:20
  • I would like to either store the results back into the original dataframe df1, a new dataframe, or any sort of iterable object. I also added the heads of the dataframes. – user1775500 Aug 02 '17 at 22:28

1 Answers1

2

Basic use of .loc and boolean array logic :

# parentheses are mandatory here
result = df2.loc[(df1.num < df2.end) & (df1.num > df2.start), "organization_name"]

Test with Minimal Wirking Example :

df1 = pd.DataFrame(np.random.randint(0, 10, 5))

df2 = pd.DataFrame({
        "orgname": [str(i) for i in range(5)], 
        "start": np.random.randint(-5, 5, 5), 
        "end": np.random.randint(5, 15, 5)
    })[["orgname", "start", "end"]]
df2.loc[(df1[0] < df2.end) & (df1[0] > df2.start), "orgname"]
Jacquot
  • 1,750
  • 15
  • 25
  • What about if I want to get multiple columns from D2 and not just organization name – user1775500 Aug 02 '17 at 23:01
  • Also this gives me an error: ValueError: Series lengths must match to compare – user1775500 Aug 03 '17 at 00:17
  • 1
    yeah, I assumed you wanted a comparison row by row. You want, for each row from `df1`, _all_ the rows from df2 where the row from df1 is between the two row values from df2 ? – Jacquot Aug 03 '17 at 14:48