I have two tables in pandas:
df1: Containing User IDs and IP_Addresses for 150K users.
|---------------|---------------|
| User_ID | IP_Address |
|---------------|---------------|
| U1 | 732758368.8 |
| U2 | 350311387.9 |
| U3 | 2621473820 |
|---------------|---------------|
df2: Containing IP Address range and country it belongs to, 139K records
|---------------|-----------------|------------------|
| Country | Lower_Bound_IP | Upper_Bound_IP |
|---------------|-----------------|------------------|
| Australia | 1023787008 | 1023791103 |
| USA | 3638734848 | 3638738943 |
| Australia | 3224798976 | 3224799231 |
| Poland | 1539721728 | 1539721983 |
|---------------|-----------------|------------------|
My objective is to create a country column in df1 such that IP_Address of df1 lies between the range of Lower_Bound_IP and Upper_Bound_IP of that country in df2.
|---------------|---------------|---------------|
| User_ID | IP_Address | Country |
|---------------|---------------|---------------|
| U1 | 732758368.8 | Indonesia |
| U2 | 350311387.9 | Australia |
| U3 | 2621473820 | Albania |
|---------------|---------------|---------------|
My first approach was to do a cross join (cartesian product) of the two tables and then filter to the relevant records. However, a cross join using pandas.merge() is not feasible, since it will create 21 billion records. The code crashes every time. Could you please suggest an alternative solution which is feasible?