
I am trying to replace the last three zeros (000) with K in a column of a data frame.

For example:

data <- data.frame(abc = c(1000, 100000, 450000))

If abc is 1000, then abc should become 1K.

If abc is 100000, then abc should become 100K.

gsub with a plain pattern replaces the first three zeros instead of the last three.

I tried this:

lapply(data$abc, gsub, pattern = "000", replacement = "K", fixed = TRUE)

Also, how can I make it work on an interval like:

data <- data.frame(abc = c("150000-250000", "100000-150000", "250000K+"))
Bruce Wayne

2 Answers

One option is to use integer division (%/%) by 1000 and paste "K" onto the result:

library(dplyr)
library(stringr)
data %>%
   mutate(abc = str_c(abc %/% 1000, "K"))
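The same integer-division idea can be sketched in base R without dplyr or stringr. The column name abc_k is hypothetical, chosen here only so the original column is kept for comparison. Note the assumption that every value is an exact multiple of 1000, because %/% silently drops any remainder.

```r
# Base R sketch of the integer-division approach (no packages needed).
data <- data.frame(abc = c(1000, 100000, 450000))

# %/% discards the remainder, so this assumes each value is a multiple
# of 1000; a value such as 1500 would come out as "1K".
# abc_k is a hypothetical column name used for illustration.
data$abc_k <- paste0(data$abc %/% 1000, "K")

data$abc_k
# [1] "1K"   "100K" "450K"
```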

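Or, using sub, match the three zeros at the end ($) of the string and replace them with "K". Setting scipen keeps a value such as 100000 from being converted to "1e+05" when the numeric column is coerced to character.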

options(scipen = 999)
sub("0{3}$", "K", data$abc)
#[1] "1K"   "100K" "450K"

If the strings contain intervals, change the pattern to match three zeros either at the end ($) or before a -, and replace them with "K". The lookahead (?=...) requires perl = TRUE.

gsub("0{3}(?=-|$)", "K", "150000-250000", perl = TRUE)
#[1] "150K-250K"
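As a rough illustration, the same lookahead pattern can be applied to the whole interval column from the question. The column name abc_k is hypothetical. The third value already ends in "K+", so its zeros are followed by neither a - nor the end of the string, and this pattern leaves it unchanged.

```r
# Sketch: apply the lookahead pattern to every element of the column.
data <- data.frame(abc = c("150000-250000", "100000-150000", "250000K+"),
                   stringsAsFactors = FALSE)

# gsub is vectorized over its third argument, so no lapply is needed.
# abc_k is a hypothetical column name used for illustration.
data$abc_k <- gsub("0{3}(?=-|$)", "K", data$abc, perl = TRUE)

data$abc_k
# [1] "150K-250K" "100K-150K" "250000K+"
```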
akrun

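Here is a slight modification of your code. format turns off scientific notation; it also pads the values to a common width, which is why the first result below has leading spaces. gsub is vectorized, so calling it directly on the column (rather than through lapply) returns a character vector. The pattern 000$ matches three zeros only at the end of the string.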

data <- data.frame(abc = c(1000, 100000, 450000))

data$abc <- format(data$abc, scientific = FALSE)

gsub(pattern = "000$", replacement = "K", data$abc)
# [1] "  1K" "100K" "450K"
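If the leading spaces in "  1K" are unwanted, one possible variation is to pass trim = TRUE to format so the values are not padded to a common width. This is a sketch of that variation, not part of the answer's original code; x and x_chr are hypothetical names.

```r
# Sketch: same approach, but without the padding that format adds by default.
x <- c(1000, 100000, 450000)

# trim = TRUE suppresses the padding to a common width;
# scientific = FALSE avoids strings such as "1e+05".
x_chr <- format(x, scientific = FALSE, trim = TRUE)

result <- sub("000$", "K", x_chr)
result
# [1] "1K"   "100K" "450K"
```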
www