1

I have a dataframe df1 of coordinate values like this:

    lat         lon         E               N
0   48.010258   -6.156909   90089.518711    -201738.245555
1   48.021648   -6.105887   93961.324059    -200676.766517
2   48.033028   -6.054801   97836.321204    -199614.270439
... ...         ...         ...             ...

and another dataframe df2 that associates a climatic value to each (lat, lon) pair:

    lat         lon        val
0   48.010258   -6.156909  17.11
1   48.021648   -6.105887  22.23
2   48.033028   -6.054801  39.86
... ...         ...        ...

I want to create a new column, df1['corr_pos'], where each row is given the index of df2 corresponding to the (lat, lon) pair in df1. It is like using VLOOKUP in Excel, but using two values to identify the correct index instead of using only one. The two values are the coordinate pair.

The output would be:

    lat         lon         E               N               corr_pos
0   48.010258   -6.156909   90089.518711    -201738.245555  0
1   48.021648   -6.105887   93961.324059    -200676.766517  3
2   48.033028   -6.054801   97836.321204    -199614.270439  8
... ...         ...         ...             ...             ...

The dataframes df1 and df2 do not have the same order. How could I implement this in pandas?

FaCoffee
  • 7,609
  • 28
  • 99
  • 174

1 Answers1

1

I think you need merge with reset_index to create a new column from index:

print (df2)
          lat       lon    val
7   48.010258 -6.156909  17.11
10  48.021648 -6.105887  22.23
12  48.033028 -6.054801  39.86
df = pd.merge(df1, 
              df2.reset_index().drop('val', axis=1).rename(columns={'index':'corr_pos'}), 
              on=['lat','lon'], 
              how='left')
print (df)
         lat       lon             E              N  corr_pos
0  48.010258 -6.156909  90089.518711 -201738.245555         7
1  48.021648 -6.105887  93961.324059 -200676.766517        10
2  48.033028 -6.054801  97836.321204 -199614.270439        12

If df2 has many columns, it is better to use subset as merge will delete them:

df = pd.merge(df1, 
              df2.reset_index()[['lat','lon', 'index']].rename(columns={'index':'corr_pos'}),
              on=['lat','lon'], 
              how='left')
print (df)
         lat       lon             E              N  corr_pos
0  48.010258 -6.156909  90089.518711 -201738.245555         7
1  48.021648 -6.105887  93961.324059 -200676.766517        10
2  48.033028 -6.054801  97836.321204 -199614.270439        12
FaCoffee
  • 7,609
  • 28
  • 99
  • 174
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • What if I want to add the `corr_pos` column to `df1`, instead of creating a new dataframe? Of course I don't want to lose the correspondences. – FaCoffee Feb 01 '17 at 13:41
  • Hmmm, there is problem you need join by `2 columns`, so cannot use [map solution](http://stackoverflow.com/questions/38291908/excel-vlookup-equivalent-in-pandas/38291973#38291973). I have no idea it is possible without `merge`, maybe `join` - all methods return new dataframe.. – jezrael Feb 01 '17 at 13:45
  • But I can create a new `df` and then use `map` to add the new column to `df1`. – FaCoffee Feb 01 '17 at 13:46
  • 1
    Yes, but map need one column which is mapped to another values. If check [this solution](http://stackoverflow.com/a/40239323/2901002) there is only one column in `on` parameter, so `map` can be used. But if have more as 1 joined columns `on=['val1','val2',...]` then `map` solution is impossible. – jezrael Feb 01 '17 at 13:49
  • On second thoughts this doesn't work. It creates a `corr_pos_x` and `corr_pos_y` that I do not want. – FaCoffee Feb 01 '17 at 14:21
  • Ok, so solution does not work? You get `corr_pos_x` and `corr_pos_y` if in first and second dataframe is same column `corr_pos`. In sample is not this column. What is your code? How is modified my solution? – jezrael Feb 01 '17 at 16:27