
I want to recode all the columns in my dataframe that contain the string "calcium" anywhere in the column name. So I'm trying to combine grepl with mutate from dplyr, but I get an error.

Any idea what I'm doing wrong? I hope this is possible!

Below is the code I've tried, using dplyr:

# Make the dataframe
library(dplyr)
library(reshape2) # for dcast()
fake <- data.frame(id = c(1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3),
                   time = c(rep("Time1",9), rep("Time2",9)),
                   test = rep(c("calcium","magnesium","zinc"), 6),
                   score = rnorm(18))
df <- dcast(fake, id ~ time + test)

# My first attempt
df <- df %>% mutate(category=cut(df[,grepl("calcium", colnames(df))], breaks=c(-Inf, 1.2, 6, 12, Inf), labels=c(0,1,2,3)))
# Error: 'x' must be numeric

# My second attempt
df <- df %>% mutate_at(vars(contains('calcium')), cut(breaks=c(-Inf, 1.2, 6, 12, Inf), labels=c(0,1,2,3)))
# Error: argument "x" is missing, with no default
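Both errors come from how `cut()` is being called. In the first attempt, `df[, grepl("calcium", colnames(df))]` matches two columns (`Time1_calcium` and `Time2_calcium`), so it returns a data frame rather than the single numeric vector `cut()` expects. In the second attempt, `cut(breaks = ...)` is evaluated immediately, before `mutate_at()` has supplied any column, so `x` is missing. The sketch below reproduces both points on a hypothetical three-row table standing in for the `dcast()` output; the helper name `bin_calcium` is made up for illustration.

```r
library(dplyr)

# Hypothetical small wide table standing in for the dcast() result
df <- data.frame(id = 1:3,
                 Time1_calcium = c(0.5, 2.0, 7.5),
                 Time1_zinc    = c(0.1, 0.2, 0.3),
                 Time2_calcium = c(13, -1, 1.2))

# Attempt 1: this selects TWO columns, i.e. a data frame, not a vector,
# which is why cut() reports that 'x' must be numeric
calcium_cols <- grepl("calcium", colnames(df))
class(df[, calcium_cols])

# Attempt 2: cut(breaks = ...) runs right away with no x. Passing a function
# (or a ~ lambda) instead defers the call until mutate_at() hands over each
# matching column in turn.
bin_calcium <- function(x) {
  cut(x, breaks = c(-Inf, 1.2, 6, 12, Inf), labels = c(0, 1, 2, 3))
}
df_binned <- df %>% mutate_at(vars(contains("calcium")), bin_calcium)
```

Note that `cut()` uses right-closed intervals by default, so a score of exactly 1.2 falls in the first bin (label `0`).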
Question by CineyEveryday, edited by Z.Lin

1 Answer


Is this what you are after?

library(tidyverse)
library(reshape2) # I added this for your dcast

fake <-data.frame(id=c(1,1,1,2,2,2,3,3,3,1,1,1,2,2,2,3,3,3),              
                  time=c(rep("Time1",9), rep("Time2",9)), 
                  test=c("calcium","magnesium","zinc","calcium","magnesium","zinc", 
                         "calcium","magnesium","zinc","calcium","magnesium","zinc",
                         "calcium","magnesium","zinc","calcium","magnesium","zinc"), 
                  score=rnorm(18))
df <- dcast(fake, id ~ time + test)
df <- as_tibble(df) #added this

# Bin every column whose name contains "calcium" into categories 0-3
df <- df %>% 
  mutate_at(vars(contains('calcium')), 
            ~cut(., 
                 breaks=c(-Inf, 1.2, 6, 12, Inf), 
                 labels=c(0, 1, 2, 3))) %>%
  # cut() returns a factor; go through as.character() so that the labels 0-3,
  # not the factor's internal level codes 1-4, become the numeric values
  mutate_at(vars(ends_with("_calcium")), ~as.numeric(as.character(.)))

Which produces this:

# A tibble: 3 x 7
     id Time1_calcium Time1_magnesium Time1_zinc Time2_calcium Time2_magnesium
  <dbl>         <dbl>           <dbl>      <dbl>         <dbl>           <dbl>
1     1             1          -0.256      0.303             0          0.144 
2     2             1           2.18       0.417             0          0.0650
3     3             0           0.863     -2.32              0          0.163 
# ... with 1 more variable: Time2_zinc <dbl>

Based on this: https://suzan.rbind.io/2018/02/dplyr-tutorial-2/#mutate-at-to-change-specific-columns
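One detail about converting the binned columns back to numbers: `cut()` returns a factor, and calling `as.numeric()` directly on a factor gives its internal level codes (1 to 4 here), not the labels 0 to 3. A quick check on made-up scores, one per bin:

```r
# Made-up scores, one in each bin: (-Inf,1.2], (1.2,6], (6,12], (12,Inf]
scores <- c(0.3, 2.5, 8, 20)
binned <- cut(scores, breaks = c(-Inf, 1.2, 6, 12, Inf), labels = c(0, 1, 2, 3))

codes        <- as.numeric(binned)                # internal level codes
label_values <- as.numeric(as.character(binned))  # the labels themselves

codes         # 1 2 3 4
label_values  # 0 1 2 3

# Alternative: ask cut() for integer bin codes directly, then shift to start at 0
shifted <- cut(scores, breaks = c(-Inf, 1.2, 6, 12, Inf), labels = FALSE) - 1
```

The `labels = FALSE` route skips the factor entirely, which may be simpler when the categories are just consecutive integers.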

Answer by william3031
  • May I also ask how you would add this as a new column to the data rather than overwriting the original column? – CineyEveryday Jun 06 '19 at 01:28
  • Possibly some variation of this: https://stackoverflow.com/questions/45947787/create-new-variables-with-mutate-at-while-keeping-the-original-ones – william3031 Jun 06 '19 at 01:41
  • Ah, also, do you know how I can keep it numeric rather than having it become a factor? I tried modifying your code to mutate_at(as.numeric(as.character...) and also tried ...as.numeric(as.character(~cut(...) but I get errors :( – CineyEveryday Jun 06 '19 at 19:47
  • I converted the df to a tibble, then added the last bit. It didn't seem to work as a data.frame, but did as tibble. Not sure why. I always work in tibbles anyway. It does what you want it to. – william3031 Jun 06 '19 at 23:00
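The two follow-up questions in the comments (keeping the original columns and getting numeric rather than factor output) can be handled in one step. The sketch below assumes dplyr 1.0.0 or later, where `across()` and its `.names` argument are available; the `_cat` suffix and the small example table are hypothetical choices for illustration. With older dplyr, passing a named list such as `list(cat = ~ ...)` to `mutate_at()` similarly adds new suffixed columns instead of overwriting, as in the linked question.

```r
library(dplyr)

# Hypothetical small wide table standing in for the dcast() result
df <- tibble(id = 1:3,
             Time1_calcium = c(0.5, 2.0, 7.5),
             Time2_calcium = c(13, -1, 1.2))

df_new <- df %>%
  mutate(across(contains("calcium"),
                # cut() gives a factor; as.character() then as.numeric()
                # turns its labels (not its level codes) into numbers
                ~ as.numeric(as.character(cut(.x,
                    breaks = c(-Inf, 1.2, 6, 12, Inf),
                    labels = c(0, 1, 2, 3)))),
                # write results to new columns, e.g. Time1_calcium_cat,
                # leaving the original calcium columns untouched
                .names = "{.col}_cat"))
```

This should work the same whether `df` is a tibble or a plain data.frame, since `mutate()` and `across()` accept both.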