I have a tibble (built with tribble) with a chr column that contains the Unicode escapes for emojis. I want to split these strings into two columns whenever a string contains more than one backslash, i.e. split at the 2nd backslash. It would also be enough to just delete everything from the 2nd backslash onward.

Here is what I tried:

df <- tibble::tribble(
  ~RUser, ~REmoji,
  "User1", "\U0001f64f\U0001f3fb",
  "User2", "\U0001f64f",
  "User2", "\U0001f64f\U0001f3fc"
)

df %>% mutate(newcol = gsub("\\\\*", "", REmoji))

I found the solution in "Replace single backslash in R", but in my case I have only a single backslash, and I don't understand how to split the column here.

The result should look like this output:

df2 <- tibble::tribble(
  ~RUser, ~REmoji1, ~newcol,
  "User1", "\U0001f64f", "\U0001f3fb",
  "User2", "\U0001f64f", "", # This field is empty, since there was no emoji modifier
  "User2", "\U0001f64f", "\U0001f3fc"
)

Thanks a lot!

Konrad Rudolph
Alex_

2 Answers


We could also use substring from base R, which here keeps everything from the second character on:

df$newcol <- substring(df$REmoji, 2)
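
As a minimal sketch (not necessarily what was intended above), both target columns can be built in base R by pairing substring with substr. The data frame below is a plain data.frame stand-in for the tibble in the question, and the column names REmoji1 and newcol are taken from the desired df2.

```r
# Stand-in for the question's tibble, built with base R only
df <- data.frame(
  RUser  = c("User1", "User2", "User2"),
  REmoji = c("\U0001f64f\U0001f3fb", "\U0001f64f", "\U0001f64f\U0001f3fc"),
  stringsAsFactors = FALSE
)

# First character (code point) only: the base emoji
df$REmoji1 <- substr(df$REmoji, 1, 1)

# Everything from the second character on: the modifier, or "" if none
df$newcol <- substring(df$REmoji, 2)
```

Both functions index by character rather than by byte for these strings, so the multi-byte emoji are not cut in half.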
akrun
  • I think they want to extract the first character, not remove it: `substr(df$REmoji, 1, 1)` – GKi Jul 01 '21 at 06:13

Note these \U... are single Unicode code points, not just a backslash + digits/letters.
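
A quick way to check this, sketched with base R only (the variable names are just for illustration): the string has two characters, contains no literal backslash, and decodes to two code points.

```r
x <- "\U0001f64f\U0001f3fb"

n_chars       <- nchar(x)                      # number of characters (code points)
has_backslash <- grepl("\\", x, fixed = TRUE)  # is there a literal backslash?
code_points   <- utf8ToInt(x)                  # integer code points
labels        <- sprintf("U+%X", code_points)  # conventional U+ notation

print(n_chars)        # 2
print(has_backslash)  # FALSE
print(labels)         # "U+1F64F" "U+1F3FB"
```

The backslashes only exist in the source code (and in how R prints the string), which is why a regex looking for "\\\\" has nothing to match.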

Using the PCRE regex ^. with sub removes the first code point and provides the expected results:

> df %>% mutate(newcol = sub("^.", "", REmoji, perl=TRUE))
# A tibble: 3 x 3
  RUser REmoji                 newcol      
  <chr> <chr>                  <chr>       
1 User1 "\U0001f64f\U0001f3fb" "\U0001f3fb"
2 User2 "\U0001f64f"           ""          
3 User2 "\U0001f64f\U0001f3fc" "\U0001f3fc"

Make sure you pass the perl=TRUE argument.

And in order to do the reverse, i.e. keep the first code point only, you can use:

df %>% mutate(newcol = sub("^(.).+", "\\1", REmoji, perl=TRUE))
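
Putting the two calls together, a sketch of a pipeline that reproduces the df2 layout from the question might look like this. It assumes dplyr and tibble are installed; the column names REmoji1 and newcol come from df2, and the name result is only illustrative.

```r
library(dplyr)

df <- tibble::tribble(
  ~RUser, ~REmoji,
  "User1", "\U0001f64f\U0001f3fb",
  "User2", "\U0001f64f",
  "User2", "\U0001f64f\U0001f3fc"
)

result <- df %>%
  mutate(
    # keep only the first code point (strings of length 1 are left unchanged)
    REmoji1 = sub("^(.).+", "\\1", REmoji, perl = TRUE),
    # drop the first code point, leaving the modifier or ""
    newcol  = sub("^.", "", REmoji, perl = TRUE)
  ) %>%
  select(RUser, REmoji1, newcol)
```

When a string holds a single code point, "^(.).+" does not match and sub returns the input as is, which is already the desired first part.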
Wiktor Stribiżew
  • Thank you! That correctly outputs the second part of the string in its own column. But what would be the smartest way to get only the front part? In the end I don't need the second part, only the front one. – Alex_ Jun 30 '21 at 13:40
  • @Alex_ `df %>% mutate(newcol = sub("^(.).+", "\\1", REmoji, perl=TRUE))` will keep the first code point. – Wiktor Stribiżew Jun 30 '21 at 13:41
  • @Alex_ Does it work as expected now? Note my output is exactly as you asked for in df2. – Wiktor Stribiżew Jun 30 '21 at 13:48