
I have a dataset like the one below. How can I remove the '#number' parts?

df>
terms                             year
5;#Remote Production;#10;         2021
53;#=Product-Category:Routing     2021
30;#HDR;#5;#Remote Production     2020
...

I need it to be like this:

df>
terms                          year
#Remote Production             2021
#Product-Category:Routing      2021
#HDR;#Remote Production        2020
...

The number at the beginning (the one without a #) also needs to be removed.

TylerH
kvjing

1 Answer


An option with `str_remove_all`

library(stringr)
library(dplyr)
df %>%
   mutate(terms = str_c('#', str_remove_all(terms, "^\\d+;#\\=?|#\\d+;")))

-output

#                     terms year
#1       #Remote Production; 2021
#2 #Product-Category:Routing 2021
#3   #HDR;#Remote Production 2020
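
Note that row 1 above keeps a trailing `;`, while the desired output in the question has none. A minimal sketch of one way to drop it, assuming a leftover `;` at the very end of a string is never wanted: wrap the result in an extra `str_remove(..., ";$")` before the `#` is pasted back. The name `cleaned` is only for illustration.

```r
library(stringr)
library(dplyr)

# Same example data as in the answer
df <- data.frame(
  terms = c("5;#Remote Production;#10;",
            "53;#=Product-Category:Routing",
            "30;#HDR;#5;#Remote Production"),
  year = c(2021L, 2021L, 2020L),
  stringsAsFactors = FALSE
)

cleaned <- df %>%
  mutate(
    # 1. remove the leading "number;#" (plus optional "=") and every "#number;"
    terms = str_remove_all(terms, "^\\d+;#\\=?|#\\d+;"),
    # 2. assumption: a ';' left at the end of the string is unwanted
    terms = str_remove(terms, ";$"),
    # 3. put back the leading '#' that step 1 stripped
    terms = str_c("#", terms)
  )

cleaned$terms
```

With the three example rows this should give `#Remote Production`, `#Product-Category:Routing` and `#HDR;#Remote Production`.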

data

df <- structure(list(terms = c("5;#Remote Production;#10;", "53;#=Product-Category:Routing", 
"30;#HDR;#5;#Remote Production"), year = c(2021L, 2021L, 2020L
)), class = "data.frame", row.names = c(NA, -3L))
akrun
  • Why do you use the `str_c` function? – kvjing Feb 10 '21 at 21:26
  • @kvjing Because the pattern was removing the leading `#`, so I reinserted it at the beginning by pasting with `str_c` – akrun Feb 10 '21 at 21:32
  • It gives an error: Error: unexpected '=' in: "df %>% mutate(terms=" – kvjing Feb 10 '21 at 21:37
  • should I run this piece of code first? df <- structure(list(terms = c("5;#Remote Production;#10;", "53;#=Product-Category:Routing", "30;#HDR;#5;#Remote Production"), year = c(2021L, 2021L, 2020L )), class = "data.frame", row.names = c(NA, -3L)) – kvjing Feb 10 '21 at 21:42
  • @kvjing have you loaded the library i.e. `library(dplyr)` – akrun Feb 10 '21 at 21:46
  • @kvjing it works for me - updated the output I got – akrun Feb 10 '21 at 21:46
  • @kvjing I am not getting the error. The `data` showed in my post is to show a reproducible example. You can get that structure from `dput(yourdata)`, but that is not needed – akrun Feb 10 '21 at 21:49
  • OK, it works for me too. There is another problem: in some entries, "terms" is empty. After running this code, a "#" is added to those entries, which is not what I want. Do you know how to avoid that? – kvjing Feb 10 '21 at 22:04
  • @kvjing Do you have `NA` or `""` for those elements? If it is `""`, then we can use `df %>% mutate(terms = case_when(terms != "" ~ str_c('#', str_remove_all(terms, "^\\d+;#\\=?|#\\d+;")), TRUE ~ terms))`, and if it is `NA`, use `!is.na(terms) ~` in `case_when` – akrun Feb 10 '21 at 22:16
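
The `case_when` suggestion in the last comment can be laid out as a runnable sketch. The data frame `df2` and the result name `res` are hypothetical, made up here to include one empty string and one `NA`; both conditions are combined so either kind of empty entry is left untouched.

```r
library(stringr)
library(dplyr)

# Hypothetical data: a normal entry, an empty string, and a missing value
df2 <- data.frame(
  terms = c("5;#Remote Production;#10;", "", NA),
  year = c(2021L, 2021L, 2020L),
  stringsAsFactors = FALSE
)

res <- df2 %>%
  mutate(terms = case_when(
    # only rewrite entries that are neither NA nor ""
    !is.na(terms) & terms != "" ~
      str_c("#", str_remove_all(terms, "^\\d+;#\\=?|#\\d+;")),
    # everything else is returned unchanged
    TRUE ~ terms
  ))

res$terms
```

Since `str_c` returns `NA` when any of its inputs is `NA`, the `!is.na(terms)` check mainly makes the intent explicit; the `terms != ""` check is what prevents a lone `#` from appearing.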