I'm trying to learn how to use apply (or any other member of the apply family) to loop over the variables in a data.frame.

For example: say I have the following data.frame

    df_long <- data.frame(id=c(1,1,1,1,2,2,2,2,3,3,3,3), 
             country=c('a','a','a','a','b','b','b','b','c','c','c','c'),
             year=c(1,2,3,4,1,2,3,4,1,2,3,4),
             amt = c(3,4,23,5,76,5,2,3,5,4,6,2))

and I want to loop through all the variables such that if a variable is numeric, I add one to it, and otherwise I do nothing. I want the result to be a data.frame. This is what I have so far, but it doesn't work:

    apply(df_long, 2, function(x) x = ifelse(is.numeric(x), x+1, x))

Any insights on this question or in general how to loop through variables in a data.frame using apply and/or other methods would be greatly appreciated.

Amazonian
    This could be of help to understand how to apply a function to specific columns https://stackoverflow.com/questions/18503177/r-apply-function-on-specific-dataframe-columns – Ronak Shah Jul 10 '18 at 04:23

2 Answers

I would first find the numeric columns using is.numeric and then add 1 to only those columns. sapply loops over each column and returns TRUE or FALSE depending on whether the column is numeric. We then use those logical indices (col_ind) to subset the data frame and add 1 to the selected columns.

    col_ind <- sapply(df_long, is.numeric)
    df_long[col_ind] <- df_long[col_ind] + 1
    df_long

    #   id country year amt
    #1   2       a    2   4
    #2   2       a    3   5
    #3   2       a    4  24
    #4   2       a    5   6
    #5   3       b    2  77
    #6   3       b    3   6
    #7   3       b    4   3
    #8   3       b    5   4
    #9   4       c    2   6
    #10  4       c    3   5
    #11  4       c    4   7
    #12  4       c    5   3

Possibly a simpler approach is a one-liner with dplyr.

    library(dplyr)
    df_long %>%
      mutate_if(is.numeric, ~ . + 1)  # formula lambda; funs() is deprecated since dplyr 0.8.0
Ronak Shah

I tried sapply and apply, following the approach you originally asked for, but the challenge is that both try to coerce the result into a matrix. That either forces all variables to be returned as characters, or it converts the country variable to numeric, turning a into 1, b into 2, and so on.
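To make the coercion visible, here is a minimal sketch using the data from the question. `stringsAsFactors = FALSE` is set explicitly (an assumption, so that the result does not depend on the R version), and the variable names `col_classes_apply`, `col_classes_sapply`, and `n_returned` are just illustrative.

```r
# Example data from the question; stringsAsFactors is fixed explicitly
# so that country is a character column in every R version.
df_long <- data.frame(id = c(1,1,1,1,2,2,2,2,3,3,3,3),
                      country = c('a','a','a','a','b','b','b','b','c','c','c','c'),
                      year = c(1,2,3,4,1,2,3,4,1,2,3,4),
                      amt = c(3,4,23,5,76,5,2,3,5,4,6,2),
                      stringsAsFactors = FALSE)

# apply() converts the data frame with as.matrix() first. A matrix holds a
# single type, so one character column turns every column into character.
col_classes_apply <- apply(df_long, 2, class)

# sapply()/lapply() visit the columns one at a time, so each keeps its class.
col_classes_sapply <- sapply(df_long, class)

# Separate issue in the original attempt: ifelse() returns a value shaped
# like its test, and is.numeric(x) is a single TRUE/FALSE, so only one
# element comes back per column.
n_returned <- length(ifelse(is.numeric(df_long$amt), df_long$amt + 1, df_long$amt))

print(col_classes_apply)
print(col_classes_sapply)
print(n_returned)
```

Inside `apply`, `is.numeric(x)` is therefore never TRUE for this data frame, which is why a plain `if`/`else` over the columns with `lapply` is the safer route.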

If you prefer a single line of code using one of the apply functions then I recommend using lapply. lapply will return the result as a list, which can then be converted to a dataframe. A solution is below:

    as.data.frame(
      lapply(
        df_long,
        function(col)
          if (is.numeric(col)) {col + 1} else {col}))

The result is:

       id country year amt
    1   2       a    2   4
    2   2       a    3   5
    3   2       a    4  24
    4   2       a    5   6
    5   3       b    2  77
    6   3       b    3   6
    7   3       b    4   3
    8   3       b    5   4
    9   4       c    2   6
    10  4       c    3   5
    11  4       c    4   7
    12  4       c    5   3
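A possible variation on the same lapply idea, sketched here as an alternative rather than a correction: assigning the list back with `df[] <- lapply(...)` keeps the existing data frame (its column names and row names) instead of rebuilding it with as.data.frame. The helper name `add_one_if_numeric` and the copy `df_plus` are hypothetical names introduced for this example, and `stringsAsFactors = FALSE` is an assumption to make the result independent of the R version.

```r
# Example data from the question.
df_long <- data.frame(id = c(1,1,1,1,2,2,2,2,3,3,3,3),
                      country = c('a','a','a','a','b','b','b','b','c','c','c','c'),
                      year = c(1,2,3,4,1,2,3,4,1,2,3,4),
                      amt = c(3,4,23,5,76,5,2,3,5,4,6,2),
                      stringsAsFactors = FALSE)

# Hypothetical helper: add 1 to numeric columns, return others unchanged.
add_one_if_numeric <- function(col) {
  if (is.numeric(col)) col + 1 else col
}

# Work on a copy so the original stays available for comparison.
df_plus <- df_long

# The empty brackets replace the columns in place, so the object
# remains a data.frame and no conversion step is needed.
df_plus[] <- lapply(df_plus, add_one_if_numeric)

print(df_plus)
```

The printed values should match the result shown above; only the way the data frame is reassembled differs.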
Rachit Kinger