100

I want to use the apply function on a dataframe, but only apply the function to the last 5 columns.

B<- by(wifi,(wifi$Room),FUN=function(y){apply(y, 2, A)})

This applies A to all the columns of y

B<- by(wifi,(wifi$Room),FUN=function(y){apply(y[4:9], 2, A)})

This applies A only to columns 4-9 of y, but the total return of B strips off the first 3 columns... I still want those, I just don't want A applied to them.

wifi[,1:3]+B 

also does not do what I expected/wanted.

A5C1D2H2I1M1N2O1R2T1
skmathur
  • The 'by' call is complicating this question. If it's relevant, you should rewrite the question to clarify (what is wifi$Room?). I've ignored by in my answer below. – leif Aug 29 '13 at 06:32
  • You could `cbind(y[1:3], ...)` to the result you are getting. – IRTFM Aug 29 '13 at 06:46

6 Answers

121

lapply is probably a better choice than apply here, as apply first coerces your data.frame to an array which means all the columns must have the same type. Depending on your context, this could have unintended consequences.
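To make the coercion concrete, here is a minimal sketch using a made-up mixed-type data frame (`df`, `id`, and `x` are illustrative names, not from the question). It only shows the type behaviour; it is not meant as the fix itself.

```r
# Hypothetical mixed-type data frame: one character column, one numeric column
df <- data.frame(id = c("a", "b"), x = c(1.5, 2.5), stringsAsFactors = FALSE)

# apply() converts the data frame to a matrix first, so the numeric
# column is turned into character along with everything else
m <- apply(df, 2, function(col) col)
is.character(m)

# lapply() visits each column as it is, so the original types survive
cls <- vapply(lapply(df, function(col) col), class, character(1))
cls
```

With `apply`, a function that expects numbers would receive strings for column `x`; with `lapply`, it receives the numeric vector.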

The pattern is:

df[cols] <- lapply(df[cols], FUN)

The 'cols' vector can be variable names or indices. I prefer to use names whenever possible (it's robust to column reordering). So in your case this might be:

wifi[4:9] <- lapply(wifi[4:9], A)

An example of using column names:

wifi <- data.frame(A=1:4, B=runif(4), C=5:8)
wifi[c("B", "C")] <- lapply(wifi[c("B", "C")], function(x) -1 * x)
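If the per-group processing from the question still matters, the same pattern can be used inside each group. The following is only a sketch under assumptions: the data, the column names `s1` and `s2`, and the function `A` are all made up for illustration, and `split` plus `do.call(rbind, ...)` is swapped in for `by` so that the result is an ordinary data.frame.

```r
# Hypothetical data: Room is the grouping column, s1 and s2 are the columns to transform
wifi <- data.frame(Room = c("a", "a", "b", "b"),
                   s1 = c(1, 3, 10, 20),
                   s2 = c(2, 4, 30, 50),
                   stringsAsFactors = FALSE)

# Hypothetical A: centre a vector on its mean
A <- function(x) x - mean(x)

cols <- c("s1", "s2")

# Process each room separately, replacing only the chosen columns
pieces <- lapply(split(wifi, wifi$Room), function(y) {
  y[cols] <- lapply(y[cols], A)
  y  # return the whole group, with the other columns untouched
})

# Stack the groups back into one data frame
result <- do.call(rbind, pieces)
result
```

Because each group is returned whole, the columns that `A` was not applied to are still present in `result`.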
madx
leif
70

Using an example data.frame and example function (just +1 to all values)

A <- function(x) x + 1
wifi <- data.frame(replicate(9,1:4))
wifi

#  X1 X2 X3 X4 X5 X6 X7 X8 X9
#1  1  1  1  1  1  1  1  1  1
#2  2  2  2  2  2  2  2  2  2
#3  3  3  3  3  3  3  3  3  3
#4  4  4  4  4  4  4  4  4  4

data.frame(wifi[1:3], apply(wifi[4:9],2, A) )
#or
cbind(wifi[1:3], apply(wifi[4:9],2, A) )

#  X1 X2 X3 X4 X5 X6 X7 X8 X9
#1  1  1  1  2  2  2  2  2  2
#2  2  2  2  3  3  3  3  3  3
#3  3  3  3  4  4  4  4  4  4
#4  4  4  4  5  5  5  5  5  5

Or even:

data.frame(wifi[1:3], lapply(wifi[4:9], A) )
#or
cbind(wifi[1:3], lapply(wifi[4:9], A) )

#  X1 X2 X3 X4 X5 X6 X7 X8 X9
#1  1  1  1  2  2  2  2  2  2
#2  2  2  2  3  3  3  3  3  3
#3  3  3  3  4  4  4  4  4  4
#4  4  4  4  5  5  5  5  5  5
thelatemail
  • Is there a way to do this using `$` to index a certain column by name instead of using `[ : ]` to index by column number? I tried adding colnames: `colnames(wifi) = c("a", "b", "c", "d", "e", "f", "g", "h" ,"i")` but any attempt at using lapply(wifi$e, 2, X) wasn't happening. – santeko Apr 21 '15 at 23:27
  • @skotturi - you can do this like `wifi[c("a","b","c")]` to index multiple columns by name. – thelatemail Apr 21 '15 at 23:35
  • @thelatemail, in `apply(wifi[4:9], 2, A)`, `wifi[4:9]` is a `data.frame`, and `apply` can only be used on an array or matrix. Why does your answer work? – kittygirl May 22 '20 at 18:18
  • @kittygirl - that's because apply *can* be used on a data.frame. The data.frame will be coerced to a matrix as part of the function when apply is used. – thelatemail May 22 '20 at 18:25
  • @thelatemail, will this lose row name or column name information? – kittygirl May 22 '20 at 18:28
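
On the last question about names, a quick check on the example data suggests they are kept when the function returns a vector of the same length as its input. This is a sketch, not an exhaustive test; the explicit row names `r1` to `r4` are added here purely for illustration.

```r
A <- function(x) x + 1
wifi <- data.frame(replicate(9, 1:4))
rownames(wifi) <- c("r1", "r2", "r3", "r4")  # hypothetical explicit row names

# apply() returns a matrix here, not a data.frame
m <- apply(wifi[4:9], 2, A)
is.matrix(m)

# Inspect the dimnames carried over from the data frame
colnames(m)
rownames(m)
```

Note that the result is a matrix, so wrapping it in `data.frame()` or `cbind()` with the other columns, as in the answer, is what turns it back into a data.frame.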
5

This task is easily achieved with the dplyr package's across functionality.

Borrowing the data structure suggested by thelatemail:

A <- function(x) x + 1
wifi <- data.frame(replicate(9,1:4))

We can indicate the columns we wish to apply the function to either by index like this:

library(dplyr)
wifi %>% 
   mutate(across(4:9, A))
#  X1 X2 X3 X4 X5 X6 X7 X8 X9
#1  1  1  1  2  2  2  2  2  2
#2  2  2  2  3  3  3  3  3  3
#3  3  3  3  4  4  4  4  4  4
#4  4  4  4  5  5  5  5  5  5

Or by name:

wifi %>% 
   mutate(across(X4:X9, A))
#  X1 X2 X3 X4 X5 X6 X7 X8 X9
#1  1  1  1  2  2  2  2  2  2
#2  2  2  2  3  3  3  3  3  3
#3  3  3  3  4  4  4  4  4  4
#4  4  4  4  5  5  5  5  5  5
Ian Campbell
1

As mentioned, you simply want the standard R apply function applied to columns (MARGIN=2):

wifi[,4:9] <- apply(wifi[,4:9], MARGIN=2, FUN=A)

Or, for short:

wifi[,4:9] <- apply(wifi[,4:9], 2, A)

This updates columns 4:9 in place using the A() function. Now, let's assume that na.rm is an argument to A(), which it probably should be. We can pass na.rm=TRUE to remove NA values from the computation like so (TRUE is safer than T, which is an ordinary variable that can be reassigned):

wifi[,4:9] <- apply(wifi[,4:9], MARGIN=2, FUN=A, na.rm=TRUE)

The same is true for any other arguments you want to pass to your custom function.
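
As a small self-contained sketch of passing an extra argument through `apply`: the data and this particular `A` are invented for illustration, since the question does not show what `A` does.

```r
# Hypothetical A that accepts na.rm: centre a vector on its mean
A <- function(x, na.rm = FALSE) x - mean(x, na.rm = na.rm)

# Hypothetical data with one missing value
wifi <- data.frame(id = 1:3, s1 = c(1, NA, 3), s2 = c(2, 4, 6))

# Arguments after FUN are forwarded to A for every column
wifi[, 2:3] <- apply(wifi[, 2:3], MARGIN = 2, FUN = A, na.rm = TRUE)
wifi
```

Without `na.rm = TRUE`, the mean of `s1` would be `NA` and the whole column would become `NA`; with it, only the originally missing entry stays `NA`.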

Adam Erickson
0

The easiest way is to use the mutate function:

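library(dplyr)

# Placeholder names: replace data, columnToUseFunctionOn and someFunction with your own
dataFunctionUsed <- data %>% 
  mutate(columnToUseFunctionOn = someFunction(columnToUseFunctionOn))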
Suraj Rao
-2

I think what you want is mapply. You could apply the function to all columns, and then just drop the columns you don't want. However, if you are applying different functions to different columns, it seems likely what you want is mutate, from the dplyr package.
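
For the case of different functions on different columns, one possible base R reading of the mapply suggestion is sketched below. The data and the per-column functions in `funs` are made up for illustration.

```r
# Hypothetical data and per-column functions (named after the columns they apply to)
wifi <- data.frame(a = 1:3, b = c(2, 4, 6), c = c(10, 20, 30))
funs <- list(b = function(x) x / 2, c = sqrt)

cols <- names(funs)

# Pair each function with its column; SIMPLIFY = FALSE keeps the result as a list
wifi[cols] <- mapply(function(f, x) f(x), funs, wifi[cols], SIMPLIFY = FALSE)
wifi
```

Column `a` is not listed in `funs`, so it is left as it was.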

Mox