
I have a dataset df1.


I'd like to replace each occurrence of "One + one", "Two ; one", etc. with the corresponding number from the lookup table df2.


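Desired output: a new column in df1 in which each label is replaced by its number, e.g. "One + one > Two ; one > Three ; two" becomes "1.1 > 2.1 > 3.2".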

Any idea how to do this? This is a follow-up to my original question, "How to replace string values in a column based on a lookup table".

I tried the following but it doesn't work. Thanks in advance!

 df1$New <- gsubfn::gsubfn("[A-z]+,;", as.list(setNames(df2$Node,df2$Label)), df1$Node)

Data:

df1 <- data.frame(ID = 1:5, Node = c("One + one > Two ; one > Three ; two", "One + two > Two ; two > Three ; one", "One + one > Two ; two > Three ; one", "One + two > Two ; one > Three ; two", "One + one > Two ; two > Three ; two"), stringsAsFactors = FALSE)

df2 <- data.frame(Label =  c("One + one", "One + two", "Two ; one", "Two ; two", "Three ; one", "Three ; two"), Node = c("1.1", "1.2", "2.1", "2.2", "3.1", "3.2"), stringsAsFactors = FALSE)

UPDATED DATA:

df1 <- data.frame(ID = 1:5, Node = c("AO Ales + Bitter > Brown and Stout > Premium && Super Premium", "Lager > Dry, Premium Strength, Style, Traditional > Mainstream & Value", "AO Ales + Bitter > Dry, Premium Strength, Style, Traditional > Mainstream & Value", "Lager > Brown and Stout > Dry, Premium Strength, Style, Traditional", "AO Ales + Bitter > Dry, Premium Strength, Style, Traditional > Premium && Super Premium"), stringsAsFactors = FALSE)

df2 <- data.frame(Label = c("AO Ales + Bitter", "Lager", "Brown and Stout", "Dry, Premium Strength, Style, Traditional", "Mainstream & Value", "Premium && Super Premium"), Node = c("1.1", "1.2", "2.1", "2.2", "3.1", "3.2"), stringsAsFactors = FALSE)

Ketty
  • Could you share code to reproduce your data, instead of (or in addition to, if you like) screenshots? That makes it much easier in general to help you. – Benjamin May 23 '19 at 20:54
  • @Benjamin. Just added. Thanks! – Ketty May 23 '19 at 20:55
  • When you say it doesn't work, what happened? Did you get an error message, and if so what was it? – Benjamin May 23 '19 at 20:57
  • I think the pattern is not right. You may need `([A-Za-z]+ \\+ [A-Za-z]+)` – akrun May 23 '19 at 21:00
  • @Benjamin, df1$new is the same as df1$Node -- not what I expect. – Ketty May 23 '19 at 21:02
  • @Ketty here, `gsubfn::gsubfn("([A-Za-z]+ \\+ [A-Za-z]+)", as.list(setNames(df2$Node,df2$Label)), df1$Node)` replaces based on the key/value pairs, i.e. only the part such as `One + one` that the pattern matches in the second dataset. If you need to replace the other parts, you have to create a second key/value pair – akrun May 23 '19 at 21:03
  • @akrun, this pattern doesn't work either. It replaces "One + one" with "1.1", but it doesn't replace "Two ; two" or "Three ; one." – Ketty May 23 '19 at 21:12
  • @Ketty it is because of the match. I posted an easier solution for this – akrun May 23 '19 at 21:13
  • You could also use `stringr` and `dplyr`, and you wouldn't even need a regular expression: `df1 <- mutate(df1, New = str_replace_all(New, setNames(df2$Node, df2$Label)))`. Your question specified use of `gsubfn`, or I'd offer this as an answer. ;) – Benjamin May 23 '19 at 21:16
  • @Benjamin, thanks so much! Really appreciate it :) – Ketty May 23 '19 at 22:02

1 Answer


We can do this more easily by converting the number words to digits:

library(gsubfn)
library(english)
gsubfn("([a-z]+)", as.list(setNames(1:9, as.character(as.english(1:9)))), 
                tolower(gsub("\\s*[+;]\\s*", ".", df1$Node)))
#[1] "1.1 > 2.1 > 3.2" "1.2 > 2.2 > 3.1" "1.1 > 2.2 > 3.1" 
#[4] "1.2 > 2.1 > 3.2" "1.1 > 2.2 > 3.2"
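
For reference, the attempt in the question leaves the strings unchanged because the pattern `[A-z]+,;` requires letters immediately followed by the literal characters `,;`, and that sequence never occurs in `df1$Node`. A minimal base R sketch that avoids regular expressions altogether is to loop over the lookup table and replace each label as a fixed string. The name `out` is only illustrative, and the sketch assumes that no label is a substring of another label (otherwise the order of the rows in `df2` would matter).

```r
# Original example data from the question
df1 <- data.frame(ID = 1:5,
                  Node = c("One + one > Two ; one > Three ; two",
                           "One + two > Two ; two > Three ; one",
                           "One + one > Two ; two > Three ; one",
                           "One + two > Two ; one > Three ; two",
                           "One + one > Two ; two > Three ; two"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Label = c("One + one", "One + two", "Two ; one",
                            "Two ; two", "Three ; one", "Three ; two"),
                  Node = c("1.1", "1.2", "2.1", "2.2", "3.1", "3.2"),
                  stringsAsFactors = FALSE)

out <- df1$Node
for (i in seq_len(nrow(df2))) {
  # fixed = TRUE treats "+" and ";" as literal characters, not regex syntax
  out <- gsub(df2$Label[i], df2$Node[i], out, fixed = TRUE)
}
out
```

Because every replacement is literal, this should also cope with labels that contain characters such as `+` or `&`.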

Update

Based on the new example, we can do this in base R by splitting each string on " > " and looking up each piece in a named vector:

nm1 <- setNames(df2$Node, df2$Label)
sapply(strsplit(df1$Node, " > "), function(x) paste(nm1[x], collapse = " > "))
#[1] "1.1 > 2.1 > 3.2" "1.2 > 2.2 > 3.1" "1.1 > 2.2 > 3.1" 
#[4] "1.2 > 2.1 > 2.2" "1.1 > 2.2 > 3.2"
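
A possible extension of the lookup above, as a sketch only: indexing a named vector with a label that is not in `df2` returns `NA`, so a real data set with unexpected labels would produce "NA" in the pasted result. The helper `replace_labels` below is a hypothetical name, not part of any package, and keeping the original text for unknown labels is an assumed policy rather than something the question asks for.

```r
# Updated example data from the question
df1 <- data.frame(ID = 1:5,
                  Node = c("AO Ales + Bitter > Brown and Stout > Premium && Super Premium",
                           "Lager > Dry, Premium Strength, Style, Traditional > Mainstream & Value",
                           "AO Ales + Bitter > Dry, Premium Strength, Style, Traditional > Mainstream & Value",
                           "Lager > Brown and Stout > Dry, Premium Strength, Style, Traditional",
                           "AO Ales + Bitter > Dry, Premium Strength, Style, Traditional > Premium && Super Premium"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Label = c("AO Ales + Bitter", "Lager", "Brown and Stout",
                            "Dry, Premium Strength, Style, Traditional",
                            "Mainstream & Value", "Premium && Super Premium"),
                  Node = c("1.1", "1.2", "2.1", "2.2", "3.1", "3.2"),
                  stringsAsFactors = FALSE)

lookup <- setNames(df2$Node, df2$Label)

# Hypothetical helper: translate one " > "-separated path via the lookup
replace_labels <- function(path, lookup, sep = " > ") {
  parts <- strsplit(path, sep, fixed = TRUE)[[1]]
  codes <- unname(lookup[parts])
  # Assumed policy: labels missing from the lookup are left as they are
  missing <- is.na(codes)
  codes[missing] <- parts[missing]
  paste(codes, collapse = sep)
}

# Store the result as a new column, as the question asks
df1$New <- vapply(df1$Node, replace_labels, character(1),
                  lookup = lookup, USE.NAMES = FALSE)
df1$New
```

Using `vapply` with `character(1)` rather than `sapply` guarantees a character vector of the same length as `df1$Node`, which makes the column assignment safe.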
akrun
  • I updated my question with some real-world data. The ones I posted earlier are just simple examples. In real data, users can put any text as the node label for 1.1. for example "AO Ales + Bitter". – Ketty May 23 '19 at 21:30
  • Beautiful. Thanks so much, everyone! – Ketty May 23 '19 at 22:02