If I am doing

lapply(dataframe, function(x) {
    column.name <- #insert code here
})

How would I be able to access the name of the column that the lapply function is currently processing? I want to assign the name of the column to a variable, column.name, as indicated in the code. Just to clarify, yes, column.name WILL change with each iteration of the lapply.

tonytonov
user41912
  • The column name should be the name of each list element returned by `lapply()`. Does that resolve your issue? – TARehman Jul 16 '14 at 18:01
  • You could `lapply(seq_along(dataframe), function(i) names(dataframe)[i])`, but it might be more convenient to just use a "for" loop since you, also, want to modify your "dataframe". – alexis_laz Jul 16 '14 at 18:01
  • I prefer to write the function so that it works on the names themselves -- that way the output will be a named list. Something like `lapply(names(dataframe), function(x) { dataframe[x] })` – AndrewMacDonald Jul 16 '14 at 18:04
  • @TARehman No, I know that I will get the column names when 'lapply()' returns. I need the column name in the function. I think I will just have to settle by using suggestions from the other two commenters. – user41912 Jul 16 '14 at 18:08
  • I don't think you _can_ get the column name in the way you're talking about. I'm pretty sure `lapply()` breaks the data frame into vectors for each column, which don't generally have names. You'd have to change your function to do this. – TARehman Jul 16 '14 at 18:10
  • What is your final aim? If you are trying to modify values for each column, why not just overwrite the dataframe after the lapply? – James Jul 16 '14 at 18:16
  • So I need to iterate through each column, and need the column name because the column name will be used to identify which file to read in. Then the function will do some process on the file. – user41912 Jul 16 '14 at 18:18
  • @alexis_laz This way works and provides what I need. Thanks~ – user41912 Jul 16 '14 at 18:19
  • A simple for loop would honestly solve all problems, and I am a Java programmer and that comes naturally, but after having used immensely slow for loops in R, I try to stay away from them. I know that the R apply functions are faster, and thus am trying to learn them. – user41912 Jul 16 '14 at 18:22
  • @TARehman In R, anything is possible :) See my answer. – tonytonov Jul 17 '14 at 07:25

3 Answers

There is a way, actually.

df <- data.frame(a = 1:2, b = 3:4, c = 5:6)
lapply(df, function(x) names(df)[substitute(x)[[3]]])
#> $a
#> [1] "a"
#>
#> $b
#> [1] "b"
#>
#> $c
#> [1] "c"

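But that should be used as a last resort: it relies on how lapply() builds its internal call, which is an implementation detail. Instead, use something like the following (another option is given in the comments):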

lapply(seq_along(df), function(x) names(df[x]))
#> [[1]]
#> [1] "a"
#>
#> [[2]]
#> [1] "b"
#>
#> [[3]]
#> [1] "c"
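
To tie this back to the question, here is a minimal sketch of iterating over the column names themselves, so that column.name is available inside the function. The ".csv" file naming is only an assumption for illustration (the question does not say how column names map to files), and no file is actually read here.

```r
df <- data.frame(a = 1:2, b = 3:4, c = 5:6)

# Iterate over the names rather than the columns. setNames() gives the
# character vector names, so the result is a named list.
result <- lapply(setNames(names(df), names(df)), function(column.name) {
  column <- df[[column.name]]                # the column's values
  file.name <- paste0(column.name, ".csv")   # hypothetical file naming
  list(file = file.name, total = sum(column))
})

result$b$file
result$c$total
```

Because the function receives the name, it can look up the column with df[[column.name]] whenever it needs the values.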
tonytonov
  • Ah, that's pretty intuitive. But I guess the second way works just fine too. – user41912 Jul 17 '14 at 11:29
  • Using this to get the index is very interesting: `function(x) names(df)[substitute(x)[[3]]]`; the second, I personally think, is not as good as using names to locate the columns. – cloudscomputes May 30 '23 at 01:43
You can iterate over an index, but this is not very R-like code. A more direct route is to use Map, the multivariate version of lapply, which iterates a function of appropriate arity in parallel across whatever parameters are passed to it:

Map(function(value, name){paste(name, sum(value), sep = ": ")}, 
    Formaldehyde, 
    names(Formaldehyde))
#> $carb
#> [1] "carb: 3.1"
#> 
#> $optden
#> [1] "optden: 2.747"

If using the tidyverse, purrr::imap is a similar convenience version of purrr::map2 that automatically uses the names of the first parameter as a second parameter:

purrr::imap(Formaldehyde, ~paste(.y, sum(.x), sep = ": "))
#> $carb
#> [1] "carb: 3.1"
#> 
#> $optden
#> [1] "optden: 2.747"

Versions of each that simplify the result are available. For Map, use mapply, a multivariate sapply (Map is technically just a wrapper around mapply with SIMPLIFY = FALSE). For imap, use the variants whose suffix names the type to simplify to, e.g. imap_chr.
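
As a rough sketch of the simplifying form in base R, the Map call above can be written with mapply, which should return a named character vector instead of a list. The variable name col_summary is just an illustrative choice.

```r
# Same function and inputs as the Map example, but mapply simplifies the
# result to a character vector named after the columns.
col_summary <- mapply(
  function(value, name) paste(name, sum(value), sep = ": "),
  Formaldehyde,
  names(Formaldehyde)
)

col_summary
```

With purrr, imap_chr plays the same role for the imap example.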

alistaire
Here is how to pass an extra variable into the function while using lapply.

This is an lapply call on a function with two arguments, so I don't have to keep rewriting the function for each state.

library(tidycensus)

# mykey is assumed to hold your Census API key
get_Census <- function(x, y) {
  get_decennial(geography = "block group",
                variables = "P001001",
                sumfile = "sf1",
                key = mykey,
                state = x, county = y, year = 2000,
                geometry = FALSE)
}

CO <- c("067", "073", "113")
# The state is fixed via x = "06"; each county code in CO is passed as y
lapply(CO, get_Census, x = "06")
Mox
  • This lets you set a variable to a single value for all iterations, but wouldn't work for iterating over multiple states. A simple option is to use `Map`, the multivariate version of `lapply`, which you can pass a vector of states and a corresponding vector of counties. If you're using the tidyverse, `purrr::map2` does the same thing. – alistaire May 07 '18 at 20:29
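
A minimal sketch of the Map approach described in the comment above, pairing each state with its county. To keep it runnable without a Census API key, it uses a hypothetical stand-in function, fake_get_census, rather than get_Census, and the state and county codes are illustrative values only.

```r
# Hypothetical stand-in for get_Census(): it only pastes the codes together,
# so the example runs without network access or an API key.
fake_get_census <- function(state, county) paste0(state, county)

states   <- c("06", "06", "08")
counties <- c("067", "073", "001")

# Map walks both vectors in parallel: (states[1], counties[1]), and so on.
fips <- Map(fake_get_census, states, counties)

unname(unlist(fips))
```

Swapping fake_get_census for get_Census should give one result per state and county pair.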