-2

I want to paste a number and some letters together to index them. The columns of my dataframe are as follows;

enter image description here

When CNTR is NA, i want it to be the booking number + an index, so for booking 202653 for example, I want it to be 202653A and 202653B. I already achieved pasting the booking numbers into the CNTR column when its empty with;

dfUNIT$CNTR <- ifelse(is.na(dfUNIT$CNTR), dfUNIT$BOOKING, dfUNIT$CNTR)

which gives me the following table;

enter image description here

But as I said, I need unique CNTR values. My dataframe contains thousands of rows and changes frequently, is there a way to 'index' them the way I want (A, B, C etc)? Thank you in advance

BillyBouw
  • 314
  • 2
  • 10

2 Answers2

2

I'll make up some data,

dat <- data.frame(B=c(202658,202654,202653,202653),C=c("TCLU","KOCU",NA,NA))

dplyr

library(dplyr)
dat %>%
  group_by(B) %>%
  mutate(C = if_else(is.na(C), paste0(B, LETTERS[row_number()]), C))
# # A tibble: 4 x 2
# # Groups:   B [3]
#        B C      
#    <dbl> <chr>  
# 1 202658 TCLU   
# 2 202654 KOCU   
# 3 202653 202653A
# 4 202653 202653B

A fundamental risk in this is if you ever have more than 26 rows for a booking, in which case the letter-suffix will fail. An alternative is to append a number instead (e.g., paste0(B, "_", row_number()) or add some other safeguards.

base R alternatives

do.call(rbind, by(dat, dat[,"B",drop=FALSE],
                  FUN = function(z) transform(z,
                    C = ifelse(is.na(C), paste0(B, LETTERS[seq_along(z$C)]), C)
                  )
))

or

append <- ave(dat$C, dat$B, FUN = function(z) ifelse(is.na(z), LETTERS[seq_along(z)], ""))
append
# [1] ""  ""  "A" "B"
dat$C <- paste0(ifelse(is.na(dat$C), dat$B, dat$C), append)
dat
#        B       C
# 1 202658    TCLU
# 2 202654    KOCU
# 3 202653 202653A
# 4 202653 202653B
r2evans
  • 141,215
  • 6
  • 77
  • 149
0

If you don't insist on using letters to index the transformations, here's arough and ready dplyr solution based on rleid from the data.table package:

library(dplyr)
library(data.table)
df %>%
  group_by(grp = rleid(B)) %>%
  mutate(CNTR_new = if_else(is.na(CNTR), paste0(B, "_", grp), CNTR))
# A tibble: 7 x 4
# Groups:   grp [5]
      B CNTR    grp CNTR_new
  <dbl> <chr> <int> <chr>   
1    12 TCU       1 TCU     
2    13 NA        2 13_2    
3    13 NA        2 13_2    
4    15 NA        3 15_3    
5     1 PVDU      4 PVDU    
6     1 NA        4 1_4     
7     5 NA        5 5_5 

Data:

df <- data.frame(
  B = c(12,13,13,15,1,1,5),
  CNTR = c("TCU", NA, NA, NA, "PVDU", NA, NA)
)
Chris Ruehlemann
  • 20,321
  • 4
  • 12
  • 34
  • 1
    If you don't want to bring in `data.table` just for that function, you can use `my_rleid <- function(x) { yy <- rle(x); rep(seq_along(yy$lengths), yy$lengths); }` (from https://stackoverflow.com/a/61596776/3358272). – r2evans May 14 '21 at 12:57