Consider this matrix Y.

> (Y <- matrix(c(rep(1:4, each=2), rnorm(8)), 8))
     [,1]       [,2]
[1,]    1 -0.2452812
[2,]    1  1.3988440
[3,]    2  0.1558103
[4,]    2  0.2677039
[5,]    3  0.4716238
[6,]    3 -0.4442094
[7,]    4  1.9262647
[8,]    4 -0.9932708

I want to replace the values in column one of matrix Y with the corresponding values from column two of matrix X.

> (X <- matrix(c(1:4, 4, letters[c(2, 4, 3, 1, 1)]), 5))
     [,1] [,2]
[1,] "1"  "b" 
[2,] "2"  "d" 
[3,] "3"  "c" 
[4,] "4"  "a" 
[5,] "4"  "a"

I have this code that technically works both with this example and with my real data.

> cbind(Y, sapply(Y[, 1], function(x) unique(X[X[, 1] == x, 2])))
     [,1] [,2]                 [,3]
[1,] "1"  "-0.245281227293266" "b" 
[2,] "1"  "1.39884404912828"   "b" 
[3,] "2"  "0.155810319624089"  "d" 
[4,] "2"  "0.267703920057734"  "d" 
[5,] "3"  "0.471623773960787"  "c" 
[6,] "3"  "-0.444209371984632" "c" 
[7,] "4"  "1.92626472214693"   "a" 
[8,] "4"  "-0.993270770582955" "a" 

However, my real data is much larger, and this solution is slow: my real Y is a 260244 x 10 data frame, and the process takes more than 12 seconds.

Is there a faster, more general base R solution for recoding the values of a data frame Y with the corresponding values of a data frame X?
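
For reference, one common base R idiom for this kind of recoding is a vectorised lookup with `match()`. The following is only a minimal sketch on toy data shaped like the example above; the column names `id`, `value`, and `code` are hypothetical and not taken from the real data.

```r
# Toy data mirroring the example; column names are assumed for illustration.
set.seed(42)
Y <- data.frame(id = rep(1:4, each = 2), value = rnorm(8))
X <- data.frame(id = c(1:4, 4),
                code = c("b", "d", "c", "a", "a"),
                stringsAsFactors = FALSE)

# match() gives, for every Y$id, the position of its FIRST occurrence in X$id.
# Duplicated rows in X are therefore harmless as long as they agree.
idx <- match(Y$id, X$id)

# Index the lookup column once instead of subsetting X per row as sapply() does.
Y$code <- X$code[idx]
Y
```

Because `match()` is evaluated once over the whole column, this avoids the per-element subsetting of `X` that the `sapply()` version performs. Keys in `Y` that are missing from `X` would yield `NA`.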

jay.sf
  • `merge(Y, X, 1)` – pogibas Jan 20 '19 at 14:38
  • @PoGibas Thanks, the code works in the blink of an eye! I had already tried `merge`, but specifying `by` was the decisive part. My merge columns actually differ in the original data, so I needed to specify `by.x` and `by.y` as explained in the other answer. Because `X` contains duplicate rows, I also had to take `unique()` of a subset of `X` to avoid duplicated rows and unneeded columns in the result. – jay.sf Jan 20 '19 at 15:11
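
The approach described in the comments above might look roughly like this. It is a sketch under assumptions, not the exact code that was run: the column names `id`, `key`, `code`, and `extra` are hypothetical, and the key columns are assumed to have different names in the two data frames.

```r
# Toy data; all column names here are assumed for illustration only.
set.seed(1)
Y <- data.frame(id = rep(1:4, each = 2), value = rnorm(8))
X <- data.frame(key   = c(1:4, 4),
                code  = c("b", "d", "c", "a", "a"),
                extra = c(10, 20, 30, 40, 40),
                stringsAsFactors = FALSE)

# Keep only the key and the recode column, then drop duplicated rows,
# so the merge neither multiplies rows of Y nor drags in unneeded columns.
lookup <- unique(X[, c("key", "code")])

# by.x / by.y name the join column in Y and in the lookup table respectively.
res <- merge(Y, lookup, by.x = "id", by.y = "key")
res
```

Note that `merge()` sorts the result by the key column by default and does not promise to preserve the original row order of `Y`.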
