
I have a data frame df with special columns:

df<- data.frame(w= 1:3, x=3:5, y=6:8, z = I(list(1:2, 1:3, 1:4)))
df <- as.data.frame(do.call(cbind, lapply(df[1:3], function(x) Map("*", 
         df$z, x))))

>df

           w                x                  y
        1, 2             3, 6              6, 12
     2, 4, 6         4, 8, 12          7, 14, 21
 3, 6, 9, 12    5, 10, 15, 20      8, 16, 24, 32

I want to replace any number in df which has a value less than 6 with the number 6 and every value greater than 8 with the number 8. I do not want to touch the numbers in between and I want to maintain the data frame structure.

To achieve this, I have written a function, transfo:

transfo<- function(x){
  x <- unlist(x)
  if (x < 6){ x <- 6}
  if (x > 8){ x <- 8}
  x 
}

When I run the following code:

transformed <- as.data.frame(sapply(df, transfo))

I get 10 warning messages of this form:

1: In if (x < 6) { :
  the condition has length > 1 and only the first element will be used

...and I do not get the required output.

My expected output is

>transformed 

               w                x                  y
            6, 6             6, 6               6, 8
         6, 6, 6          6, 8, 8            7, 8, 8
      6, 6, 8, 8       6, 8, 8, 8         8, 8, 8, 8

I will be very grateful for a hint on the fastest way to replace all elements of the data frame df with 6 if they are less than 6 and with 8 if they are greater than 8 since I work with a large data set with 3000 rows.

Thanks in advance.

akrun
Charles

2 Answers


Assuming that the columns are lists of vectors, the OP got the warning because if expects a condition of length 1, whereas each comparison here (e.g. x < 6) returns a logical vector with more than one element. Instead of if/else, we can use the vectorized ifelse, if_else, or case_when within mutate_all (as we need to change all the columns), looping through each list column with map:

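library(tidyverse)
# Note: funs() is deprecated as of dplyr 0.8.0, so newer dplyr versions may warn here
# For every list column, clamp each vector: values < 6 become 6, values > 8 become 8
out <- df %>%
         mutate_all(funs(map(., ~ case_when(.x < 6 ~ 6,
                                             .x > 8 ~ 8,
                                              TRUE ~ as.numeric(.x)))))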
out
#           w          x          y
#1       6, 6       6, 6       6, 8
#2    6, 6, 6    6, 8, 8    7, 8, 8
#3 6, 6, 8, 8 6, 8, 8, 8 8, 8, 8, 8
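To illustrate why the scalar if fails while a vectorized function works, here is a minimal base R sketch on a single vector from the data. The helper name clamp_vec is hypothetical and not part of the answer above. Note also that from R 4.2.0 onward, a condition of length greater than one in if is an error rather than a warning.

```r
# clamp_vec is a hypothetical helper name used only for this illustration.
clamp_vec <- function(v, lower = 6, upper = 8) {
  # ifelse() tests every element, unlike if (), which needs a single TRUE/FALSE
  ifelse(v < lower, lower, ifelse(v > upper, upper, v))
}

v <- c(3, 6, 9, 12)     # third row of column w in the question's df
cond <- v < 6           # a logical vector of length 4, too long for if ()
length(cond)            # 4
clamp_vec(v)            # 6 6 8 8
```

The same helper could be applied to each element of a list column with lapply or map.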

Or using pmin/pmax

df %>% 
    mutate_all(funs(map(., ~pmax(.x, 6) %>%
                                    pmin(8))))
#           w          x          y
#1       6, 6       6, 6       6, 8
#2    6, 6, 6    6, 8, 8    7, 8, 8
#3 6, 6, 8, 8 6, 8, 8, 8 8, 8, 8, 8
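As a rough base R sketch of the pmin/pmax idea, without any tidyverse dependency: pmax(v, 6) lifts everything below 6 up to 6, and pmin(., 8) then caps everything above 8. The list z below is an assumption that mirrors column w of the question's df.

```r
# z mirrors column w of the question's data frame (an assumption for this sketch)
z <- list(c(1, 2), c(2, 4, 6), c(3, 6, 9, 12))

# Clamp every vector in the list to the range [6, 8]
clamped <- lapply(z, function(v) pmin(pmax(v, 6), 8))

clamped[[3]]   # 6 6 8 8
```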

Instead of applying the function to each element of the nested list separately, we could unlist the column, clamp the flat vector once, and then relist it back to the original structure

df %>% 
    mutate_all(funs(relist(pmin(pmax(unlist(.), 6), 8), skeleton = .)))

Or the same logic in base R

df[] <- lapply(df, function(x) relist(pmin(pmax(unlist(x), 6), 8), skeleton = x))
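The unlist/relist round trip can be sketched step by step on a single list column. This is only an illustration in base R; the variable names (col, flat, restored) are made up for the example.

```r
# One list column, mirroring column w of the question's df (illustrative data)
col <- list(c(1, 2), c(2, 4, 6), c(3, 6, 9, 12))

flat <- unlist(col)                      # one flat vector of all 9 values
flat_clamped <- pmin(pmax(flat, 6), 8)   # a single vectorized clamp over everything

# relist() cuts the flat vector back into pieces shaped like the skeleton
restored <- utils::relist(flat_clamped, skeleton = col)

lengths(restored)   # 2 3 4, the same shape as col
```

The clamp runs once per column rather than once per row, which is the likely reason this variant is faster on many rows.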

Or in data.table

library(data.table)
setDT(df)[, lapply(.SD,  function(x) relist(pmin(pmax(unlist(x), 6), 8), 
               skeleton = x))]

Benchmarks

I created a larger dataset (15,000 rows) by replicating the rows of df 5000 times

df1 <- df[rep(seq_len(nrow(df)), 5000),]

system.time({
  df1 %>% 
    mutate_all(funs(map(., ~pmax(.x, 6) %>%
                                    pmin(8))))
})
# user  system elapsed 
# 6.116   0.017   6.159 

system.time({
  df1 %>% 
    mutate_all(funs(relist(pmin(pmax(unlist(.), 6), 8), skeleton = .)))
})
#  user  system elapsed 
#  0.389   0.000   0.389 

The data.table and base R (lapply) methods show timings similar to the dplyr version that uses the modified code with relist
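For anyone who wants to repeat the comparison without extra packages, a minimal base R sketch is given below. It checks that the per-element and the unlist/relist variants agree and times both; the data and variable names are assumptions for illustration, and timings will differ by machine, so none are shown.

```r
# 15,000 list elements, built by repeating three illustrative vectors 5000 times
col <- rep(list(c(1, 2), c(2, 4, 6), c(3, 6, 9, 12)), 5000)

# Variant 1: clamp each vector separately
t_each <- system.time(
  per_element <- lapply(col, function(v) pmin(pmax(v, 6), 8))
)

# Variant 2: clamp one flat vector, then restore the list shape
t_flat <- system.time(
  flattened <- utils::relist(pmin(pmax(unlist(col), 6), 8), skeleton = col)
)

# Both variants should give the same values in the same shape
all.equal(per_element, flattened)
```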

akrun
  • df %>% mutate_all(funs(map(., ~pmax(.x, 6) %>%pmin(8)))) works fine but it takes a considerable amount of time when applied to a large data set of 3000 rows and 5 columns. Is it possible to provide me with a more time effective hint? – Charles Jul 02 '18 at 16:37
  • @Charles Can you test whether the updated one works faster? – akrun Jul 02 '18 at 17:03
  • I tested all of the modified code except the data.table option, and it still could not finish within 30 minutes. Could you please hint at a possibility to interface with C++? – Charles Jul 03 '18 at 14:37
  • @Charles Could you please ask that as a new question? – akrun Jul 03 '18 at 18:01

This also works:

> out <- as.data.frame(do.call(cbind, lapply(df, function(i){
     lapply(i, function(j){
         ifelse((j < 6), 6, ifelse((j > 8), 8, j))
     })
 })))
> out
           w          x          y
1       6, 6       6, 6       6, 8
2    6, 6, 6    6, 8, 8    7, 8, 8
3 6, 6, 8, 8 6, 8, 8, 8 8, 8, 8, 8
Charles