1

In R I need to perform a similar function to index-match in Excel which returns the value just greater than the look up value.

Data Set A

Country     GNI2009           
Ukraine     6604
Egypt       5937
Morocco     5307
Philippines 4707
Indonesia   4148
India       3677
Viet Nam    3180
Pakistan    2760
Nigeria     2699

Data Set B

GNI2004 s1  s2  s3  s4
6649    295 33  59  3
6021    260 30  50  3
5418    226 27  42  2
4846    193 23  35  2
4311    162 20  29  2
3813    134 16  23  1
3356    109 13  19  1
2976    89  10  15  1
2578    68  7   11  0
2248    51  5   8   0
2199    48  5   8   0

At the 2009 level GNI for each country (data set A) I would like to find out which GNI2004 is just greater than or equal to GNI2009 and then return the corresponding sales values (s1,s2...) at that row (data set B). I would like to repeat this for each and every Country-gni row for 2009 in table A.

For example: Nigeria with a GNI2009 of 2698 in data set A would return:

GNI2004 s1  s2  s3  s4
2976    89  10  15  1

In Excel I guess this would be something like Index and Match where the match condition would be match(look up value, look uparray,-1)

pnuts
  • 58,317
  • 11
  • 87
  • 139
user36176
  • 339
  • 1
  • 2
  • 11

1 Answers1

2

You could try data.tables rolling join which designed to achieve just that

library(data.table) # V1.9.6+
indx <- setDT(DataB)[setDT(DataA), roll = -Inf, on = c(GNI2004 = "GNI2009"), which = TRUE]
DataA[, names(DataB) := DataB[indx]]
DataA  
#        Country GNI2009 GNI2004  s1 s2 s3 s4
# 1:     Ukraine    6604    6649 295 33 59  3
# 2:       Egypt    5937    6021 260 30 50  3
# 3:     Morocco    5307    5418 226 27 42  2
# 4: Philippines    4707    4846 193 23 35  2
# 5:   Indonesia    4148    4311 162 20 29  2
# 6:       India    3677    3813 134 16 23  1
# 7:    Viet Nam    3180    3356 109 13 19  1
# 8:    Pakistan    2760    2976  89 10 15  1
# 9:     Nigeria    2699    2976  89 10 15  1

The idea here is per each row in GNI2009 find the closest equal/bigger value in GNI2004, get the row index and subset. Then we update DataA with the result.


See here for more information.

David Arenburg
  • 91,361
  • 17
  • 137
  • 196
  • thanks. but befor i get to that, the moment i try to run library(data.table) it says there is no such package even though i have installed it using install.packages. – user36176 Nov 26 '15 at 20:44
  • Says some dependencies are unavailable , initially it tried a number of mirrors though – user36176 Nov 26 '15 at 20:51
  • 1
    Try `install.packages("data.table", dependencies = TRUE)` or `install.packages("data.table", dependencies = "Depends")`. See `?install.packages` for more details. – David Arenburg Nov 26 '15 at 21:02
  • Works! THanks :) I had tried install.packages("data.table", dep= T, "install path") wonder what was wrong – user36176 Nov 26 '15 at 21:07
  • Hey thanks for the help works like a charm :) just to understand if I were to pick up the value lesser than the look up value instead of the value greater how wud the code change ? – user36176 Nov 27 '15 at 08:25
  • 1
    Use `Inf` instead of `-Inf` – David Arenburg Nov 27 '15 at 12:14