I have the following data.table with two columns:

library(data.table)
dt1 <- as.data.table(data.frame(
    relative.1 = c("up", "down", "up", "down", "down", 
      "up", "up", "up", "down", "down"), 
    color.1 = c(
        "<span style=     color: red !important; >0.00239377213823793</span>", 
        "<span style=     color: red !important; >0.0189475913373258</span>", 
        "<span style=     color: red !important; >0.000944874682014027</span>", 
        "<span style=     color: red !important; >0.00115563834695583</span>", 
        "<span style=     color: red !important; >0.00190146895689528</span>", 
        "<span style=     color: red !important; >0.00905363339565874</span>", 
        "<span style=     color: red !important; >0.00786719465124788</span>", 
        "<span style=     color: red !important; >0.0021806607355806</span>", 
        "<span style=     color: black !important; >0.0677967189492317</span>", 
        "<span style=     color: black !important; >0.0643565809998716</span>"
    ), stringsAsFactors = FALSE))

I would like to replace the number between ">" and "<" in each color.1 value with the string in the corresponding row of the column "relative.1". For example, in the first row, I'd like to replace "0.00239377213823793" with "up".

I'd appreciate any pointers.

CSJCampbell
akh22

2 Answers

The data.table package provides the update-by-reference operator := so that columns can be updated efficiently in place. Inside the brackets you can refer to other columns of the data.table by name. There are various ways of editing strings, and although regular expressions are not suitable for parsing HTML in general, the following pattern works for the example you have here: it replaces the first run of digits and dots in each color.1 value with the relative.1 value from the same row.

dt1[, color.1 := stringr::str_replace(
    string = color.1, 
    pattern = "[0-9.]+", 
    replacement = relative.1)]
dt1
#     relative.1                                                color.1
#  1:         up     <span style=     color: red !important; >up</span>
#  2:       down   <span style=     color: red !important; >down</span>
#  3:         up     <span style=     color: red !important; >up</span>
#  4:       down   <span style=     color: red !important; >down</span>
#  5:       down   <span style=     color: red !important; >down</span>
#  6:         up     <span style=     color: red !important; >up</span>
#  7:         up     <span style=     color: red !important; >up</span>
#  8:         up     <span style=     color: red !important; >up</span>
#  9:       down <span style=     color: black !important; >down</span>
# 10:       down <span style=     color: black !important; >down</span>
CSJCampbell
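As a rough illustration of what "update in place" means in the answer above, the sketch below uses a small made-up table (the values are assumptions, not the question's data) in the same shape as dt1. It shows that := changes the table without reassignment, and that data.table's copy() is needed if the original values should be kept.

```r
library(data.table)

# Small made-up table in the same shape as dt1 (illustrative values only).
dt <- data.table(
    relative.1 = c("up", "down"),
    color.1 = c("<span>0.0024</span>", "<span>0.0189</span>"))

# copy() takes a real copy; a plain `original <- dt` would only be
# another name for the same table and would change along with it.
original <- copy(dt)

# := modifies color.1 by reference, so no `dt <- ...` is needed.
# str_replace() replaces the first match in each string and is
# vectorised over the replacement, so each row gets its own label.
dt[, color.1 := stringr::str_replace(color.1, "[0-9.]+", relative.1)]

dt$color.1
# [1] "<span>up</span>"   "<span>down</span>"
original$color.1
# [1] "<span>0.0024</span>" "<span>0.0189</span>"
```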
  • One quick question on pattern = "[0-9.]+". Could you explain how "[0-9.]+" selects just the numeric characters in this case? Thanks. – akh22 Feb 26 '21 at 21:05
  • If you take a look at `?regex` there is a primer on regular expressions. They are a deep and powerful tool, but https://xkcd.com/1171/. `[]` defines a set of characters to choose from. `0-9` means any character in the range 0 to 9, and the `.` inside the brackets is a literal dot. `+` means one or more of these in a row, with no other characters in between. – CSJCampbell Mar 01 '21 at 08:13
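To see what the character class discussed in these comments picks up, a small sketch in base R may help. The second string is made up for illustration and is not part of the question's data. It shows that the pattern takes the first run of digits and dots wherever it occurs, so it works on the question's data only because nothing earlier in each span contains a digit or a dot.

```r
# One value from the question, plus a made-up string (an assumption for
# illustration) that has a digit before the part we care about.
x <- c(
    "<span style=     color: red !important; >0.00239377213823793</span>",
    "<span id=a1 >0.5</span>")

# regexpr() locates the first match in each string;
# regmatches() extracts the matched text.
first_match <- regmatches(x, regexpr("[0-9.]+", x))
first_match
# [1] "0.00239377213823793" "1"
```

If the markup could contain digits elsewhere, anchoring the pattern to the surrounding ">" and "<" is the safer choice.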
0
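# sub() takes a single replacement string, so group by relative.1:
# within each group relative.1 is one value ("up" or "down").
# The lookarounds match a decimal number only between ">" and "<".
dt1[, color.1 := sub('(?<=>)[0-9]+\\.[0-9]+(?=<)', relative.1, color.1, perl = TRUE),
    by = relative.1]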

#     relative.1                                                color.1
#  1:         up     <span style=     color: red !important; >up</span>
#  2:       down   <span style=     color: red !important; >down</span>
#  3:         up     <span style=     color: red !important; >up</span>
#  4:       down   <span style=     color: red !important; >down</span>
#  5:       down   <span style=     color: red !important; >down</span>
#  6:         up     <span style=     color: red !important; >up</span>
#  7:         up     <span style=     color: red !important; >up</span>
#  8:         up     <span style=     color: red !important; >up</span>
#  9:       down <span style=     color: black !important; >down</span>
# 10:       down <span style=     color: black !important; >down</span>
s_baldur
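The grouping in the second answer matters because base R's sub() uses a single replacement string. A minimal sketch with made-up values (not the question's data) shows what happens without grouping, next to regmatches<-, a base R alternative that accepts one replacement per matched string.

```r
# Illustrative values only.
labels  <- c("up", "down")
cells   <- c("<span>0.0024</span>", "<span>0.0189</span>")
pattern <- "(?<=>)[0-9]+\\.[0-9]+(?=<)"

# With a replacement vector, sub() warns and uses only the first element.
naive <- suppressWarnings(sub(pattern, labels, cells, perl = TRUE))
naive
# [1] "<span>up</span>" "<span>up</span>"

# regmatches<- assigns one replacement per matched string.
fixed <- cells
regmatches(fixed, regexpr(pattern, fixed, perl = TRUE)) <- labels
fixed
# [1] "<span>up</span>"   "<span>down</span>"
```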