
I am trying to extract the last digits from a unique code in each row. I do not know how to do this in R using only the tidyverse.

Here is an example:

structure(list(`CCGCode` = c("E38000232", "E38000237", 
"E38000004", "E38000240", "E38000006", "E38000007"), Total = c(17, 
27, 27, 43, 30, 42)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

Looking at CCGCode in the data frame above, you'll see the code in each row. I want to create a new column, ccg, that holds only the last three digits of that code. For example, for E38000006 in CCGCode the new column ccg should contain only 006, because those are the last digits. How can I do this?

GaB
  • I don't know the tidyverse solution, but you can use `gsub('(...)$|.', '\\1', CCGCode)` or, if all the codes are 9 characters long, `substr(CCGCode, 7, 9)` – rawr Jul 31 '20 at 18:11
  • Another alternative `df %>% mutate(code = stringr::str_trunc(CCGCode, width = 3, side = "left", ellipsis = ""))` – user12728748 Jul 31 '20 at 18:27
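The fixed-position `substr()` idea from the comment above only works when every code has the same length. As a minimal base R sketch, assuming the codes may vary in length (the vector `codes` and the name `last3` are illustrative, not from the original post), the start position can be computed from `nchar()`:

```r
# Example codes taken from the question's data
codes <- c("E38000232", "E38000004", "E38000006")

# Length of each code, so the last three characters can be located
n <- nchar(codes)

# Take characters n-2 through n, i.e. the last three of each string
last3 <- substr(codes, n - 2, n)
last3
# [1] "232" "004" "006"
```

The result stays a character vector, so leading zeros such as "004" are preserved.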

4 Answers


Here are a couple of ways to do this that produce slightly different end results. Hopefully one of them is along the lines of what you're looking for.

Method 1

df %>% 
  separate(CCGCode, c("CCGCode", "Last_3"), sep = -3)

Method 2

str_sub(df$CCGCode, -3) %>%
  as_tibble() %>% 
  bind_cols(df) %>%
  select(CCGCode, "Last_3" = value, Total)
Chad S

If you want the number at the end of CCGCode, you could use gsub() from base R. If you need it to fit a tidyverse workflow, you can call it through the %>% pipe. Try this:

#Assigning your example to df
df <-structure(list(`CCGCode` = c("E38000232", "E38000237", 
"E38000004", "E38000240", "E38000006", "E38000007"), Total = c(17, 
27, 27, 43, 30, 42)), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

#gsub removes the fixed "E38000" prefix; the remainder is assigned to df$new_col
df$new_col <- df$CCGCode %>% gsub("E38000", "", .)
SEAnalyst
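The gsub() call above relies on every code starting with the same E38000 prefix. As a minimal sketch, assuming codes might carry other prefixes (the vector `codes` below is hypothetical and not from the question), a capture group can keep only the trailing three digits regardless of what precedes them:

```r
# Hypothetical codes with differing prefixes (an assumption for illustration)
codes <- c("E38000232", "E39000004", "W11000006")

# Match the whole string, capture the final three digits, and keep only the capture
last3 <- sub("^.*(\\d{3})$", "\\1", codes)
last3
# [1] "232" "004" "006"
```

Strings that do not end in three digits do not match the pattern and are returned unchanged, so it may be worth checking the result if the input is not clean.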

Try str_sub() in stringr

library(dplyr)
library(stringr)

df %>%
  mutate(code = str_sub(CCGCode, -3))

# # A tibble: 6 x 3
#   CCGCode   Total code 
#   <chr>     <dbl> <chr>
# 1 E38000232    17 232  
# 2 E38000237    27 237  
# 3 E38000004    27 004  
# 4 E38000240    43 240  
# 5 E38000006    30 006  
# 6 E38000007    42 007  

or using word() (also in stringr)

df %>%
  mutate(code = word(CCGCode, -3, -1, sep = "(?<=.)(?=.)"))
Darren Tsai
  • I am sorry, the previous person had already answered, but your code is more elegant. I accepted Chad S's answer because he was first, he is new to Stack Overflow, and his answer did work. I also want to encourage him as he is new. However, I appreciate your input. – GaB Jul 31 '20 at 18:55

You can try this approach using tidyverse

library(tidyverse)
df <- data.frame(CCGCode = c("E38000232", "E38000237", "E38000004", "E38000240", "E38000006", "E38000007"), 
                 Total = c(17, 27, 27, 43, 30, 42))

df2 <- df %>% 
  mutate(CCG = str_extract(CCGCode, regex("\\d{3}$")))
#     CCGCode Total CCG
# 1 E38000232    17 232
# 2 E38000237    27 237
# 3 E38000004    27 004
# 4 E38000240    43 240
# 5 E38000006    30 006
# 6 E38000007    42 007
Tho Vu
  • Thank you, it worked. I suppose I should be fair to the first person who answered. Yours is really nice, as it is applicable to so many other problems! – GaB Jul 31 '20 at 18:56
  • Yeah. It's my pleasure! We have a bunch of solutions. Just use whatever serves you best. – Tho Vu Jul 31 '20 at 18:59
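
The stringr answers above give the same result on the question's data, but they can differ on codes that do not end in three digits. A minimal sketch, assuming a malformed code such as "E3800X" could appear (that value is made up for illustration): `str_sub()` takes the last three characters whatever they are, while `str_extract()` returns NA when the pattern does not match.

```r
library(stringr)

# Two codes from the question plus one hypothetical malformed code
codes <- c("E38000232", "E38000006", "E3800X")

# Positional: always the last three characters
by_position <- str_sub(codes, -3)
by_position
# [1] "232" "006" "00X"

# Pattern-based: only exactly three digits at the end, otherwise NA
by_pattern <- str_extract(codes, "\\d{3}$")
by_pattern
# [1] "232" "006" NA
```

Which behaviour is preferable depends on whether malformed codes should pass through silently or show up as missing values.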