
How to use lapply with the mutate function

Hello, I'm trying to use lapply with the mutate function. I'm dealing with nested list data.

Let's take an example. given is a nested list with two elements. Each element is a 10*2 list.

given<-replicate(2,list(matrix(unlist(replicate(10,sample(c(0.2,0.3,0.4,0.1),2,replace=FALSE),simplify=FALSE)),ncol=2)))
colnames(given[[1]])<-c('a','b')
colnames(given[[2]])<-c('a','b')
given

I want to convert 0.1 and 0.2 to 'low', 0.3 to 'middle', and 0.4 to 'high'. I used the lapply, mutate and if_else functions.

new_given<-lapply(seq_along(given), function(x){
  mutate(x,
         given[[x]][['new']] = if_else(given[[x]][['a']] %in% c(0.1,0.2),'low',
                                      if_else(given[[x]][['I12']] %in% c(0.3),'middle','high')))})

However, an error occurred. It said there was an 'unexpected ')'', even though the brackets are correctly paired.

> new_given<-lapply(seq_along(given), function(x){
+   mutate(x,
+          given[[x]][['new']] = if_else(given[[x]][['a']] %in% c(0.1,0.2),'low',
Error: unexpected '=' in:
"  mutate(x,
         given[[x]][['new']] ="
>                                       if_else(given[[x]][['I12']] %in% c(0.3),'middle','high')))})
Error: unexpected ')' in "                                      if_else(given[[x]][['I12']] %in% c(0.3),'middle','high'))"
> 

Could you tell me what the problem is and how to solve it?

*Additional information: I read the article "Using lapply with mutate in R". However, it used a data.frame rather than list data, so the approach seemed different.

zx8754
ESKim
  • It's not typically a good idea to try to diagnose the last error before figuring out why the first error occurs. – Dason Jul 12 '19 at 08:50

2 Answers


First of all, you have a list of matrices, not data frames. Also, you can call lapply directly over given here instead of using seq_along.

library(dplyr)

lapply(given, function(x) {  
   data.frame(x) %>%
     mutate(new = if_else(a %in% c(0.1,0.2),'low',
                             if_else(a %in% c(0.3),'middle','high')))})


#[[1]]
#     a   b    new
#1  0.2 0.1    low
#2  0.1 0.2    low
#3  0.4 0.4   high
#4  0.3 0.2 middle
#5  0.1 0.3    low
#6  0.3 0.1 middle
#7  0.4 0.2   high
#8  0.1 0.3    low
#9  0.3 0.1 middle
#10 0.4 0.3   high

#[[2]]
#     a   b    new
#1  0.3 0.1 middle
#2  0.1 0.3    low
#3  0.3 0.1 middle
#4  0.2 0.3    low
#5  0.1 0.4    low
#6  0.4 0.1   high
#7  0.1 0.2    low
#8  0.2 0.3    low
#9  0.4 0.4   high
#10 0.3 0.1 middle

Moreover, a better way is to keep the approaches separate. So a pure base R solution would be

lapply(given, function(x) 
      transform(data.frame(x), 
       new = ifelse(a %in% c(0.1,0.2),'low',ifelse(a %in% c(0.3),'middle','high'))))

while if you prefer tidyverse

library(purrr)  # provides map()

map(given, ~ data.frame(.) %>%
             mutate(new = if_else(a %in% c(0.1,0.2),'low',
                               if_else(a %in% c(0.3),'middle','high'))))
Ronak Shah
  • He is using doubles. You cannot expect that his values are exactly what they appear to be. He should use `cut`. – January Jul 12 '19 at 08:40
  • Thanks for answering. I tried Ronak Shah's `lapply` approaches and both worked successfully. However, what is the problem with not using `cut`? – ESKim Jul 12 '19 at 08:47
  • I am explaining it in a separate answer, but try `3 * 0.1 / 3 * 10 == 1`. What answer is correct? What do you get? – January Jul 12 '19 at 08:49
  • To Ronak Shah: So it seems the `mutate` and `transform` functions only work with `data.frame` objects, not a `matrix`. Right? – ESKim Jul 12 '19 at 08:49
  • @ESKim yes, correct! Also, @January's claim about floating point comparison is correct. `%in%` or `==` might not be the best way to compare them. Read https://stackoverflow.com/questions/2769510/numeric-comparison-difficulty-in-r – Ronak Shah Jul 12 '19 at 08:53

There are numerous problems with your approach. First, the error you are getting is only a side effect of copying the rest of the line after the first error (unexpected '=') occurred.

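The first error has a different cause. Inside a function call, the left-hand side of = must be a plain argument name, so R cannot parse given[[x]][['new']] = ... at all. Beyond that, you are applying mutate to x, which is a numeric vector of length 1 (an index produced by seq_along), not your data. And mutate works only on data frames (not even matrices!). You could convert your matrices to data frames first, though (as Ronak suggests in the other answer).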
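To see where the 'unexpected =' comes from, here is a minimal base R sketch. It assumes nothing beyond base R; the names f, lst, bad, ok and m are made up for illustration, and the call text is only parsed, never evaluated.

```r
# Hypothetical call text: an indexing expression on the left of '='.
bad <- "f(lst[[1]][['new']] = 1)"
res <- tryCatch(parse(text = bad), error = function(e) conditionMessage(e))
is.character(res)    # TRUE: parse() failed and the error message was captured

# A plain argument name on the left of '=' parses without complaint.
ok <- parse(text = "f(new = 1)")
is.expression(ok)    # TRUE

# A matrix with column names is still not a data frame.
m <- matrix(c(0.1, 0.3, 0.4, 0.2), ncol = 2,
            dimnames = list(NULL, c("a", "b")))
is.data.frame(m)     # FALSE
df <- as.data.frame(m)
is.data.frame(df)    # TRUE
df$a                 # columns can now be referred to by name
```

This is why the data.frame(x) conversion has to happen before mutate or transform can refer to a column called a.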

Finally, your matrices are doubles. Your approach might work most of the time, but it is not guaranteed to work always, because even if a number looks like 0.3, it might be in reality 0.3000000000000000001, in which case %in% 0.3 will return FALSE. It might not look likely right now, but trust me, sooner or later this approach will hurt you and you won't see it coming. I speak from experience.
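A quick way to convince yourself, as a sketch in base R only: 0.1 + 0.2 prints as 0.3 but is not stored as the same double as the literal 0.3. The tolerance 1e-8 below is an arbitrary choice for illustration.

```r
x <- 0.1 + 0.2
x                            # prints 0.3
x == 0.3                     # FALSE
x %in% 0.3                   # FALSE, %in% uses exact matching too
isTRUE(all.equal(x, 0.3))    # TRUE: all.equal() compares within a tolerance
abs(x - 0.3) < 1e-8          # TRUE: explicit tolerance (1e-8 is arbitrary)
format(x, digits = 17)       # shows digits hidden by default printing
```

Values read from a file or produced by sample() from literals will usually compare fine; the trouble starts as soon as the numbers come out of arithmetic.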

Let us first create a function that takes a matrix and, based on its first column, decides whether each row should be 'low', 'middle' or 'high'.

The cut function takes a vector of breaks and assigns each number a factor level denoting the interval it falls into:

cut(given[[1]][,1], c(-Inf, 0.2, 0.3, Inf))

result:

 [1] (0.3, Inf] (-Inf,0.2] (-Inf,0.2] (0.3, Inf] (-Inf,0.2] (0.3, Inf]
 [7] (-Inf,0.2] (-Inf,0.2] (0.3, Inf] (-Inf,0.2]
Levels: (-Inf,0.2] (0.2,0.3] (0.3, Inf]

We can directly assign labels to the result:

cut(given[[1]][,1], c(-Inf, 0.2, 0.3, Inf), labels=c("low", "mid", "high"))

We can make it into a function:

mklevels <- function(mtx) {
  cut(mtx[,1], c(-Inf, 0.2, 0.3, Inf), labels=c("low", "mid", "high"))
}

Rather than converting matrices to data frames and adding a new column, why not create a new data frame with one column per matrix:

data.frame(sapply(given, mklevels))

This has the advantage that if the matrices are large and used for other computational purposes, changing them into data frames is not an efficient approach.
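One caveat worth sketching: with a break placed exactly at 0.3, a value that is a hair above 0.3 still lands in the top bin. A possible variant, assuming the data are only ever meant to be 0.1, 0.2, 0.3 or 0.4, puts the breaks halfway between the expected values. The function mklevels_mid and the small fixed matrix m are hypothetical, made up for this example.

```r
# Hypothetical variant of mklevels: breaks sit halfway between expected values.
mklevels_mid <- function(mtx) {
  cut(mtx[, 1], breaks = c(-Inf, 0.25, 0.35, Inf),
      labels = c("low", "middle", "high"))
}

# Small fixed example; 0.1 + 0.2 is slightly larger than the literal 0.3.
m <- matrix(c(0.1, 0.2, 0.1 + 0.2, 0.4,
              0.4, 0.3, 0.2, 0.1), ncol = 2,
            dimnames = list(NULL, c("a", "b")))

as.character(mklevels_mid(m))
# "low" "low" "middle" "high"

# With a break exactly at 0.3, the third value falls into the top bin.
as.character(cut(m[, 1], c(-Inf, 0.2, 0.3, Inf),
                 labels = c("low", "mid", "high")))
# "low" "low" "high" "high"

# lapply() keeps each result as a factor; as.data.frame() binds them as columns.
lst <- list(m, m[4:1, ])
res <- as.data.frame(setNames(lapply(lst, mklevels_mid), c("m1", "m2")))
```

Whether midpoints are appropriate depends on the data; if values between 0.2 and 0.3 are legitimate, the breaks have to reflect the real intervals instead.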

If you really, really want to work with %in%, then convert the data to factors. That way you will be able to inspect the factor levels and see whether there is a problem. For example:

x <- c(0.3, 0.2, 0.3 + 1e-11, 0.1)
x

Looks innocent enough:

> x
[1] 0.3 0.2 0.3 0.1

However, x[3] %in% .3 returns FALSE. But convert it to a factor and look at the levels:

factor(x)

[1] 0.3           0.2           0.30000000001 0.1          
Levels: 0.1 0.2 0.3 0.30000000001

Once you have converted your data to factors and checked the levels, you can safely take Ronak's approach. But I would never try it with numeric vectors!
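As a rough sketch of what that could look like in base R: after conversion, the comparison is made against the level labels, which are strings. The helper label_level is a hypothetical name, and it uses ifelse rather than dplyr's if_else to stay dependency-free.

```r
x <- c(0.3, 0.2, 0.3 + 1e-11, 0.1)
f <- factor(x)

levels(f)        # four levels instead of three: the stray value is visible
f %in% "0.3"     # compare against level labels (strings), not numbers
# TRUE FALSE FALSE FALSE

# Hypothetical helper: same nested logic as the if_else() answer, on labels.
label_level <- function(f) {
  ifelse(f %in% c("0.1", "0.2"), "low",
         ifelse(f %in% "0.3", "middle", "high"))
}
label_level(f)
# "middle" "low" "high" "low"
```

Note that the stray value still ends up as "high" here; the factor does not fix it, it only makes the problem visible in levels(f) so you can deal with it before labelling.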

January