
I have a string that I want to split up, replacing the last 2 numbers with letters. For example, the string "1-1-2-2" would become "1-1-B-B". I have included a snippet of my data and my attempt so far, which hopefully makes this a bit clearer.

> df
num
1-1-26-2
1-2-2-4
1-2-4-5
1-3-25-1

I have tried splitting the num column with strsplit(num, '-'), but I'm unsure how to replace the last 2 numbers with the corresponding characters from the replacement data frame below:

> replacement_df
character    num
A            1
B            2
D            4
E            5
Y            25
Z            26
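The answers below index R's built-in `LETTERS` vector, which works here because every pair in `replacement_df` is an alphabet position (A = 1, B = 2, ..., Z = 26). If a real lookup table is not alphabetical, one option is to `match()` each number against `replacement_df$num`. This is a minimal base R sketch, assuming the columns are named `character` and `num` as printed above; `replace_with_lookup()` is a hypothetical helper name, and both data frames are rebuilt so the example runs on its own.

```r
# Rebuild the example data (column names assumed from the printout above)
df <- data.frame(num = c("1-1-26-2", "1-2-2-4", "1-2-4-5", "1-3-25-1"),
                 stringsAsFactors = FALSE)
replacement_df <- data.frame(
  character = c("A", "B", "D", "E", "Y", "Z"),
  num       = c(1, 2, 4, 5, 25, 26),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace the last n numbers of each string via the lookup table
replace_with_lookup <- function(x, lookup, n = 2) {
  vapply(strsplit(x, "-"), function(parts) {
    k <- length(parts)
    idx <- seq.int(k - n + 1, k)                       # positions of the last n numbers
    pos <- match(as.numeric(parts[idx]), lookup$num)   # row of each number in the lookup
    parts[idx] <- lookup$character[pos]                # a number absent from lookup becomes "NA"
    paste(parts, collapse = "-")
  }, character(1))
}

df$num_new <- replace_with_lookup(df$num, replacement_df)
df
```

Unlike `LETTERS[...]`, this keeps working if the table later maps, say, 3 to something other than "C".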
Darren Tsai
Joe
  • Have you seen this? https://stackoverflow.com/questions/37239715/convert-letters-to-numbers – David Jul 10 '23 at 14:05

3 Answers


Something like this?

replace_nums <- function(x, n = 2) {
    x_split <- unlist(strsplit(x, "-"))

    x_tail <- tail(x_split, n)

    paste(c(
        head(x_split, -n),
        LETTERS[as.integer(x_tail)]
    ), collapse = "-")
}

x <- c("1-1-2-2")
replace_nums(x)
# [1] "1-1-B-B"
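To see what each step of `replace_nums()` does, here is the same pipeline broken into intermediate values for `"1-1-26-2"`. A negative `n` in `head()` drops that many elements from the end, `tail()` keeps the last `n`, and `LETTERS` is R's built-in vector of upper-case letters, so `LETTERS[26]` is `"Z"`.

```r
x_split <- unlist(strsplit("1-1-26-2", "-"))   # -> c("1", "1", "26", "2")

kept    <- head(x_split, -2)                    # all but the last 2 -> c("1", "1")
last2   <- tail(x_split, 2)                     # only the last 2    -> c("26", "2")
letters2 <- LETTERS[as.integer(last2)]          # positions 26 and 2 -> c("Z", "B")

paste(c(kept, letters2), collapse = "-")        # -> "1-1-Z-B"
```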

Or for a vectorised version:

replace_nums_df <- function(x, n = 2) {
    x_split <- strsplit(x, "-")

    x_tail <- lapply(x_split, \(x) tail(x, n))

    # unlist() so the result is a character vector rather than a list column
    unlist(Map(\(split_str, tail_str) {
        paste(c(
            head(split_str, -n),
            LETTERS[as.integer(tail_str)]
        ), collapse = "-")
    }, x_split, x_tail))
}

df$replaced <- replace_nums_df(df$num)
df
#        num replaced
# 1 1-1-26-2  1-1-Z-B
# 2  1-2-2-4  1-2-B-D
# 3  1-2-4-5  1-2-D-E
# 4 1-3-25-1  1-3-Y-A
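A possible simplification: the intermediate `x_tail` list isn't strictly needed, and doing the split, `head()` and `tail()` inside a single `vapply()` call with a declared `character(1)` result type guarantees a plain character vector. This is a sketch; `replace_nums_vec` is just an illustrative name.

```r
# Hypothetical helper: same logic as replace_nums_df(), one pass per string
replace_nums_vec <- function(x, n = 2) {
  vapply(strsplit(x, "-"), function(parts) {
    # keep everything but the last n parts, convert the last n to letters
    paste(c(head(parts, -n), LETTERS[as.integer(tail(parts, n))]),
          collapse = "-")
  }, character(1))
}

df <- data.frame(num = c("1-1-26-2", "1-2-2-4", "1-2-4-5", "1-3-25-1"),
                 stringsAsFactors = FALSE)
df$replaced <- replace_nums_vec(df$num)
df
```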
SamR
  • I ended up with almost the same. You can simplify this slightly by using just `head(x_split, -n)` to exclude the last `n` items. – Robert Hacken Jul 10 '23 at 14:13
  • @RobertHacken thanks! I knew there was something with a unary `-` but couldn't remember where to use it. I've updated the answer. – SamR Jul 10 '23 at 14:16

1. stringr solution

Supply a custom function to str_replace_all(); it converts the match of the last 2 numbers into letters.

library(dplyr)
library(stringr)

df %>%
  mutate(num_new = str_replace_all(num, "\\d+-\\d+$", \(x) {
    str_c(LETTERS[as.integer(str_split_1(x, '-'))], collapse = '-')
  }))
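If you'd rather not load stringr, a base R sketch of the same idea: `regexpr()` finds the trailing `number-number` match in each string, and the replacement form `regmatches<-` writes the converted letters back in place.

```r
df <- data.frame(num = c("1-1-26-2", "1-2-2-4", "1-2-4-5", "1-3-25-1"),
                 stringsAsFactors = FALSE)

x <- df$num
m <- regexpr("[0-9]+-[0-9]+$", x)   # locate the last two numbers
tails <- regmatches(x, m)            # e.g. "26-2" for the first string

# Convert each matched "a-b" into letters, e.g. "26-2" -> "Z-B", and write it back
regmatches(x, m) <- vapply(strsplit(tails, "-"), function(p) {
  paste(LETTERS[as.integer(p)], collapse = "-")
}, character(1))

df$num_new <- x
df
```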

2. tidyr solution

separate_wider_regex() + unite()

library(dplyr)
library(tidyr)

df %>%
  separate_wider_regex(
    num,
    patterns = c(col1 = ".+", "-", col2 = "\\d+", "-", col3 = "\\d+"),
    cols_remove = FALSE
  ) %>%
  mutate(across(col2:col3, ~ LETTERS[as.integer(.x)])) %>%
  unite(num_new, col1:col3, sep = "-")
Output
# # A tibble: 4 × 2
#   num      num_new
#   <chr>    <chr>  
# 1 1-1-26-2 1-1-Z-B
# 2 1-2-2-4  1-2-B-D
# 3 1-2-4-5  1-2-D-E
# 4 1-3-25-1 1-3-Y-A
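The three-part pattern passed to `separate_wider_regex()` can be mirrored in base R as a sketch: `sub()` with three capture groups pulls out the same `col1`, `col2` and `col3` pieces, and `paste()` plays the role of `unite()`. As with the tidyr version, this assumes every string has at least three numbers; shorter strings won't match the pattern.

```r
df <- data.frame(num = c("1-1-26-2", "1-2-2-4", "1-2-4-5", "1-3-25-1"),
                 stringsAsFactors = FALSE)

# Everything before, then the second-to-last number, then the last number
pattern <- "^(.+)-([0-9]+)-([0-9]+)$"
col1 <- sub(pattern, "\\1", df$num)
col2 <- sub(pattern, "\\2", df$num)
col3 <- sub(pattern, "\\3", df$num)

# Convert the last two pieces to letters and glue everything back with "-"
df$num_new <- paste(col1,
                    LETTERS[as.integer(col2)],
                    LETTERS[as.integer(col3)],
                    sep = "-")
df
```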

For a more general case, where not every string in the column has the same number of elements, e.g.:

df <- data.frame(num = c("1-2-3", "1-2-3-4", "1-2-3-4-5"))

Both solutions above can deal with this:

#         num   num_new
# 1     1-2-3     1-B-C
# 2   1-2-3-4   1-2-C-D
# 3 1-2-3-4-5 1-2-3-D-E
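If you need the last `n` numbers rather than exactly 2, one option is to build the regex from `n`. This is a base R sketch; `replace_last_n()` is a hypothetical name. Strings with fewer than `n` numbers don't match and are returned unchanged.

```r
# Hypothetical helper: convert the last n numbers, with the regex built from n
replace_last_n <- function(x, n = 2) {
  pattern <- paste0("([0-9]+-){", n - 1, "}[0-9]+$")  # exactly n numbers at the end
  m <- regexpr(pattern, x)
  regmatches(x, m) <- vapply(strsplit(regmatches(x, m), "-"), function(p) {
    paste(LETTERS[as.integer(p)], collapse = "-")
  }, character(1))
  x
}

x <- c("1-2-3", "1-2-3-4", "1-2-3-4-5")
replace_last_n(x)          # n = 2 -> "1-B-C" "1-2-C-D" "1-2-3-D-E"
replace_last_n(x, n = 3)   # n = 3 -> "A-B-C" "1-B-C-D" "1-2-C-D-E"
```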
Darren Tsai

Alternatively, try the code below. I assume that replacement_df follows the alphabet, so it matches the built-in LETTERS vector (1 = A, 2 = B, and so on).

Here I use the separate() and unite() functions:

library(tidyverse)


# find the maximum number of parts in any string
len <- max(lengths(strsplit(df$num, '-')))

# create the variable names
nam <- paste0('l', seq_len(len))

# select the last 2 names
nam2 <- nam[(len - 1):len]

# fill = 'left' pads shorter strings with NA on the left,
# so the last 2 columns always hold the last 2 numbers
df %>% separate(num, into = nam, sep = '\\-', remove = FALSE, fill = 'left') %>% 
  mutate(across(all_of(nam2), ~ LETTERS[as.numeric(.x)])) %>% 
  unite(num_new, all_of(nam), sep = '-', na.rm = TRUE)

Created on 2023-07-10 with reprex v2.0.2

        num   num_new
1     1-2-3     1-B-C
2   1-2-3-4   1-2-C-D
3 1-2-3-4-5 1-2-3-D-E
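The key trick here is `fill = 'left'`: shorter strings are padded with `NA` on the left, so the last two columns always hold the last two numbers, and `unite(na.rm = TRUE)` removes the padding again. A base R sketch of the same padding idea, using a character matrix in place of the separated columns:

```r
df <- data.frame(num = c("1-2-3", "1-2-3-4", "1-2-3-4-5"),
                 stringsAsFactors = FALSE)

parts <- strsplit(df$num, "-")
len <- max(lengths(parts))

# Pad each split on the left with NA so every row has `len` entries
# (the equivalent of separate(..., fill = "left"))
mat <- t(vapply(parts, function(p) c(rep(NA_character_, len - length(p)), p),
                character(len)))

# The last two columns now always hold the last two numbers
last2 <- c(len - 1, len)
mat[, last2] <- LETTERS[as.integer(mat[, last2])]

# Paste each row back together, skipping the NA padding (like unite(na.rm = TRUE))
df$num_new <- apply(mat, 1, function(r) paste(r[!is.na(r)], collapse = "-"))
df
```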

jkatam
  • Good idea (+1) ! But you have assumed all of the strings are composed of 4 numbers. Maybe you could make it more flexible to deal with strings of arbitrary amount of numbers, e.g. "1-2-3-4-5-6" to "1-2-3-4-E-F". – Darren Tsai Jul 11 '23 at 07:26
  • Thank you @DarrenTsai, I updated the code to make it more robust as per your suggestion. – jkatam Jul 11 '23 at 07:40
  • Nice work! I mean that **not** all strings in the column have equal amounts of numbers. E.g. `df <- data.frame(num = c("1-2-3", "1-2-3-4", "1-2-3-4-5"))`, and an ideal output should be `"1-B-C"`, `"1-2-C-D"`, `"1-2-3-D-E"`. Maybe your code can be adapted for this generalized case. – Darren Tsai Jul 11 '23 at 08:12
  • Thank you @DarrenTsai, I updated my code further to work with the sample data you provided and it works. Thanks for pushing me to make my code efficient. – jkatam Jul 11 '23 at 15:47