0

I would like to use library(data.table) in r to efficiently replace values of datatable1 against another datatable2. Both datatables have 2mil+ records.

For example (simplified form): datatable 1 looks like-

Source  Target  Col1    Col2
277 266 a   b
260 277 b   s
276 277 d   c
265 258 a   v
259 258 d   s
262 262 d   x
270 274 e   q
273 277 f   w

datatable 2:

ToBeMapStr  LookUpValue
256 1
257 2
258 3
259 4
260 5
261 6
262 7
263 8
264 9
265 10
266 11
267 12
268 13
269 14
270 15
271 16
272 17
273 18
274 19
275 20
276 21
277 22

The desired output in data.table1 should be:

Source  Target  Col1    Col2
22  11  a   b
5   22  b   s
21  22  d   c
10  3   a   v
4   3   d   s
7   7   d   x
15  19  e   q
18  22  f   w
Serene
  • 1
  • 1
  • Hi and welcome to SO. Did you try anything and have problems? – talat Feb 15 '17 at 08:45
  • What you would need to do is an inner join on source and then an inner join on the target.https://rstudio-pubs-static.s3.amazonaws.com/52230_5ae0d25125b544caab32f75f0360e775.html – ArunK Feb 15 '17 at 08:49
  • Another option is to convert to long format, replace the values in a join, then (optionally) reshape back to wide format – talat Feb 15 '17 at 09:44

1 Answers1

0

You can just use R Base for this using the function match.

generate example data

str1 <- 'Source  Target  Col1    Col2
277 266 a   b
260 277 b   s
276 277 d   c
265 258 a   v
259 258 d   s
262 262 d   x
270 274 e   q
273 277 f   w'

str2 <- 'ToBeMapStr  LookUpValue
256 1
257 2
258 3
259 4
260 5
261 6
262 7
263 8
264 9
265 10
266 11
267 12
268 13
269 14
270 15
271 16
272 17
273 18
274 19
275 20
276 21
277 22
'

file1 <- textConnection(str1)
file2 <- textConnection(str2)
dt1 <- read.table(file1, header = T)
dt2 <- read.table(file2, header = T)

replace values

dt1$Source <- dt2[match(dt1$Source,dt2$ToBeMapStr),'LookUpValue']
dt1$Target <- dt2[match(dt1$Target,dt2$ToBeMapStr),'LookUpValue']

dt1

result

  Source Target Col1 Col2
1     22     11    a    b
2      5     22    b    s
3     21     22    d    c
4     10      3    a    v
5      4      3    d    s
6      7      7    d    x
7     15     19    e    q
8     18     22    f    w
Wietze314
  • 5,942
  • 2
  • 21
  • 40