In a data table, all the cells are numeric, and what I want to do is replace each number with a string, like this:

Numbers in [0,2]: replace them with the string "Bad"

Numbers in [3,4]: replace them with the string "Good"

Numbers > 4 : replace them with the string "Excellent"

Here's an example of my original table, called "data.active": [image]

My attempt to do that is this:

x <- c("churches","resorts","beaches","parks","Theatres",.....)
for(i in x){
  data.active$i <- as.character(data.active$i)
  data.active$i[data.active$i <= 2] <- "Bad"
  data.active$i[data.active$i >2 && data.active$i <=4] <- "Good"
  data.active$i[data.active$i >4] <- "Excellent"
}

But it doesn't work. Is there any other way to do this?

EDIT

Here's the link to my dataset, GoogleReviews_Dataset, and here's how I got the table in the image above:

library(FactoMineR)
library(factoextra)
data<-read.csv2(file.choose())
data.active <- data[1:10, 4:8]
hamza saber
    The function `cut` is for breaking continuous numeric vectors into discrete factors. You'd be better off posting a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with more clear detail than "it doesn't work" – camille Feb 06 '19 at 17:29
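
A likely diagnosis of the loop in the question, sketched in base R under the assumption that the columns are plain numeric vectors: `data.active$i` looks for a column literally named `i` instead of the column whose name is stored in `i`; `&&` tests a single value rather than working element-wise (recent R versions raise an error for longer vectors); and converting to character before comparing turns the comparisons into string comparisons. The data frame `toy` and the helper `label_scores` are hypothetical names used only for illustration.

```r
# Hypothetical stand-in for data.active; assumes numeric columns.
toy <- data.frame(churches = c(0, 3, 5), resorts = c(2, 4, 4.5))

# Hypothetical helper: map a numeric vector to the three labels.
label_scores <- function(v) {
  out <- rep(NA_character_, length(v))
  out[v <= 2] <- "Bad"
  out[v > 2 & v <= 4] <- "Good"   # `&` is element-wise; `&&` is not
  out[v > 4] <- "Excellent"
  out
}

for (i in c("churches", "resorts")) {
  # `[[i]]` uses the name stored in i; `$i` would look for a column called "i"
  toy[[i]] <- label_scores(toy[[i]])
}
```

The comparisons run on the numeric values first, and only the result is stored as character.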

2 Answers

You can use the tidyverse's mutate-across combination to condition on the ranges:

library(tidyverse)

df <- tibble(
  x = 1:5, 
  y = c(1L, 2L, 2L, 2L, 3L), 
  z = c(1L,3L, 3L, 3L, 2L),
  a = c(1L, 5L, 6L, 4L, 8L),
  b = c(1L, 3L, 4L, 7L, 1L)
)

df %>% mutate(
  across(
    .cols = everything(),
    .fns = ~ case_when(
      .x <= 2              ~ 'Bad',
      (.x > 2) & (.x <= 4) ~ 'Good',
      (.x > 4)             ~ 'Excellent',
      TRUE                 ~ as.character(.x)
    )
  )
)

The .x above refers to the column being transformed (purrr-style lambda syntax). This results in

# A tibble: 5 x 5
  x         y     z     a         b        
  <chr>     <chr> <chr> <chr>     <chr>    
1 Bad       Bad   Bad   Bad       Bad      
2 Bad       Bad   Good  Excellent Good     
3 Good      Bad   Good  Excellent Good     
4 Good      Bad   Good  Good      Excellent
5 Excellent Good  Bad   Excellent Bad      

For changing only select columns, use a selection in your .cols parameter for across:

df %>% mutate(
  across(
    .cols = c('a', 'x', 'b'),
    .fns = ~ case_when(
      .x <= 2              ~ 'Bad',
      (.x > 2) & (.x <= 4) ~ 'Good',
      (.x > 4)             ~ 'Excellent',
      TRUE                 ~ as.character(.x)
    )
  )
)

This yields

# A tibble: 5 x 5
  x             y     z a         b        
  <chr>     <int> <int> <chr>     <chr>    
1 Bad           1     1 Bad       Bad      
2 Bad           2     3 Excellent Good     
3 Good          2     3 Excellent Good     
4 Good          2     3 Good      Excellent
5 Excellent     3     2 Excellent Bad      
Werner
  • This works fine with the data you've given in the code, but it doesn't work with my dataset. I've checked the typeof my table and it is "list", just like your "df", but it doesn't work – hamza saber Feb 06 '19 at 19:18
  • @hamzasaber: Okay. Create a data set we can work with and we can manage fixing the code... – Werner Feb 06 '19 at 19:20
  • These are some of the warnings it gives me: Warning messages: 1: In Ops.factor(beaches, 2.7) : ‘<=’ not meaningful for factors 2: In Ops.factor(beaches, 2.7) : ‘>’ not meaningful for factors 3: In Ops.factor(beaches, 4.1) : ‘<=’ not meaningful for factors – hamza saber Feb 06 '19 at 19:22
  • @hamzasaber: Ahhh, I see. Your columns contain *characters*, not numbers. Perhaps use `as.numeric(.)` wherever I have `.`. – Werner Feb 06 '19 at 19:26
  • as.numeric(.) does remove the warnings, BUT my whole table is now filled with the string "Excellent". Apparently as.numeric(.) changes everything to numbers greater than 100, and that's why it replaces the values with "Excellent": with as.numeric(.), every value is greater than 4 – hamza saber Feb 06 '19 at 19:39
  • Another thing: I have checked the typeof my values in the table. They're all "integers" – hamza saber Feb 06 '19 at 19:42
  • It seems I have to use as.numeric(as.character(.)) even if they are integers. Thank you so much for your help; I will mark your answer as the solution to this problem – hamza saber Feb 06 '19 at 19:50
  • @hamzasaber: I've added a method to download and process the data directly using the `tidyverse` at the bottom of my answer. Not sure why you're selecting columns 1:10 *and* 4:8 since the former already includes the latter. Perhaps you just want some duplicates in your data? – Werner Feb 06 '19 at 19:58
  • I'm choosing 10 rows and columns 4 to 8 just to test things. The goal is to use multiple correspondence analysis on the dataset, so I need 3 levels of values. – hamza saber Feb 06 '19 at 20:13
  • @Werner Thank you so much for this easy to apply answer. I definitely need to learn more about using tidyverse and get better at writing my own functions, but this got me out of a stuck spot. – The_Tams Jan 11 '22 at 20:56
  • @The_Tams: I've updated the answer to use a more current `tidyverse` syntax (`mutate` and `across`). – Werner Jan 12 '22 at 17:50
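
The behaviour reported in the comments above is consistent with the columns having been read as factors: `typeof()` of a factor is "integer", and `as.numeric()` on a factor returns the internal level codes rather than the printed values, so the values have to pass through `as.character()` first. A minimal sketch, where `f` is a made-up factor and not the asker's data:

```r
# Made-up factor whose labels look like numbers (an assumption, not the real data)
f <- factor(c("2.7", "4.1", "0.5"))

codes  <- as.numeric(f)                # level codes: position of each label among the sorted levels
values <- as.numeric(as.character(f))  # the numbers that are actually displayed

kind <- typeof(f)                      # "integer", even though f prints like decimals
```

With many distinct values in a column, the level codes can easily exceed 4, which would explain every cell turning into "Excellent".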
x<-c('x','y','z')
# cut() bins each column: (-Inf,2] -> Bad, (2,4] -> Good, (4,Inf] -> Excellent
df[,x] <- lapply(df[,x], function(v) 
                         cut(v, breaks=c(-Inf,2,4,Inf), labels=c('Bad','Good','Excellent')))

Data

df<-structure(list(x = 1:5, y = c(1L, 2L, 2L, 2L, 3L), z = c(1L,3L, 3L, 3L, 2L), 
a = c(1L, 5L, 6L, 4L, 8L),b = c(1L, 3L, 4L, 7L, 1L)), 
class = "data.frame", row.names = c(NA, -5L))
A. Suliman
  • How do I access columns whose names are longer strings rather than single characters? For example: x<-c("xxx","yyy","zzz"); those are the names of my columns. – hamza saber Feb 06 '19 at 19:19
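
On the question in the comment above: a character vector of column names can be used directly for indexing, so names such as "xxx" behave the same way as 'x'. A minimal sketch using the names from the comment on invented values; the data frame `scores` and the vector `cols` are hypothetical:

```r
# Hypothetical data using the column names from the comment
scores <- data.frame(xxx = c(1, 3, 5), yyy = c(2, 4, 6), zzz = c(0, 2.5, 4.1))

cols <- c("xxx", "yyy")   # only these columns are recoded; zzz is left alone
scores[cols] <- lapply(scores[cols], function(v) {
  # cut() returns a factor; as.character() converts the labels to plain strings
  as.character(cut(v, breaks = c(-Inf, 2, 4, Inf),
                   labels = c("Bad", "Good", "Excellent")))
})
```

Dropping the `as.character()` call keeps the columns as factors, which may be preferable if the next step expects categorical variables.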