I often want to map over a vector of column names in a data frame and keep track of the output using the .id argument. But getting the column name for each map iteration into that .id column seems to require doubling up the names in the input vector, that is, naming each element with its own value. If I don't name the elements, .id just stores the index of the iteration.

This is expected behavior, per the purrr::map docs:

.id
Either a string or NULL. If a string, the output will contain a variable with that name, storing either the name (if .x is named) or the index (if .x is unnamed) of the input.

But my approach feels a little clunky, so I imagine I'm missing something. Is there a better way to get a list of the columns I'm iterating over, that doesn't require writing each column name twice in the input vector? Any suggestions would be much appreciated!

Here's an example to work with:

library(rlang)
library(tidyverse)

tb <- tibble(foo = rnorm(10), bar = rnorm(10))

cols_once <- c("foo", "bar")
cols_once %>% map_dfr(~ tb %>% summarise(avg = mean(!!sym(.x))), .id="var")
# A tibble: 2 x 2
  var       avg   <-- var stores only the iteration index
  <chr>   <dbl>
1 1     -0.0519
2 2      0.204 

cols_twice <- c("foo" = "foo", "bar" = "bar")
cols_twice %>% map_dfr(~ tb %>% summarise(avg = mean(!!sym(.x))), .id="var")
# A tibble: 2 x 2
  var       avg   <-- var stores the column names
  <chr>   <dbl>
1 foo   -0.0519
2 bar    0.204 
andrew_reece

2 Answers

3

Here's an alternative solution for your specific scenario using summarize_at and gather:

tb %>% summarize_at( cols_once, mean ) %>% gather( var, avg )
# # A tibble: 2 x 2
#   var      avg
#   <chr>  <dbl>
# 1 foo   0.374 
# 2 bar   0.0397
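
In more recent tidyverse releases, summarize_at and gather are superseded by across() and pivot_longer(). A minimal sketch of the same idea, assuming dplyr >= 1.0 and tidyr >= 1.0 are installed (the seed and the name res_wide_long are illustrative, not from the original post):

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
tb <- tibble(foo = rnorm(10), bar = rnorm(10))
cols_once <- c("foo", "bar")

# Summarise the selected columns in one pass, then reshape so that
# each column name lands in `var` and its mean in `avg`.
res_wide_long <- tb %>%
  summarise(across(all_of(cols_once), mean)) %>%
  pivot_longer(everything(), names_to = "var", values_to = "avg")

res_wide_long
```

Because the column names come from the data frame itself, cols_once never needs to be named here.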

In a more general scenario, I don't think there's a way around naming your cols_once when working with map_dfr, because of the expected behavior you pointed out in your question. However, you can do it more elegantly with set_names(), the "snake case" wrapper for setNames(), which by default names a vector with its own values:

cols_once %>% set_names %>% 
  map_dfr(~ tb %>% summarise(avg = mean(!!sym(.x))), .id="var")
# # A tibble: 2 x 2
#   var      avg
#   <chr>  <dbl>
# 1 foo   0.374 
# 2 bar   0.0397
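
The same pattern should also cover the case where only some of the columns are used. The sketch below assumes purrr >= 1.0, where map_dfr is superseded and map() followed by list_rbind(names_to = ...) plays the role of .id; the third column taz, the seed, and the names named_cols and res are illustrative only:

```r
library(dplyr)
library(purrr)

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
tb <- tibble(foo = rnorm(10), bar = rnorm(10), taz = rnorm(10))
cols_once <- c("foo", "bar")  # a subset of the columns, each written once

# set_names() with no second argument names each element after its own value.
named_cols <- set_names(cols_once)

# list_rbind(names_to = "var") stores the list names in a `var` column,
# much like .id = "var" does in map_dfr.
res <- named_cols %>%
  map(~ summarise(tb, avg = mean(.data[[.x]]))) %>%
  list_rbind(names_to = "var")

res
```

Here .data[[.x]] is used in place of !!sym(.x); both look up a column by its string name, so this is a style choice rather than a requirement.
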
Artem Sokolov
  • Thanks very much for this, I didn't know about the `set_names` wrapper, I like that approach. I accepted prosoitos' answer as it essentially provided the setNames answer first, but I've definitely learned from your answer. Much appreciated. – andrew_reece Nov 15 '18 at 17:33
  • I spent a lot of time thinking about this and I also learnt from this answer! – prosoitos Nov 15 '18 at 18:35
1

You could create your input vector easily with:

setNames(names(tb), names(tb))

So your code would be:

setNames(names(tb), names(tb)) %>%
  map_dfr(~ tb %>% summarise(avg = mean(!!sym(.x))), .id="var")

Edit following your comment:

Still not the solution you are hoping for, but when you don't use all the column names, you could still use setNames() and subset the ones you want (or subset out the ones you don't).

tb <- tibble(foo = rnorm(10), bar = rnorm(10), taz = rnorm(10))

setNames(names(tb), names(tb))[-3]
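
As a possible shortcut in base R, setNames() defaults its first argument to nm, so passing only nm = names a vector after itself without repeating it. A small sketch, where tb_names is a hypothetical stand-in for names(tb) and the other variable names are illustrative:

```r
# Stand-in for names(tb) from the three-column example above.
tb_names <- c("foo", "bar", "taz")

# setNames(object = nm, nm): supplying only `nm` uses it for both
# the values and the names.
by_default <- setNames(nm = c("foo", "bar"))

# Dropping a column by name rather than by position.
by_name <- setNames(nm = setdiff(tb_names, "taz"))

by_default
by_name
```

Subsetting by name with setdiff() avoids relying on the column order that [-3] assumes.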
prosoitos
  • Thanks for this. Yes, I do use `setNames` when I'm using all of the columns in a data frame. It's a little less time-saving when I am only using a partial subset. I suppose I should have set up my example data such that I'm only using, say, 2 out of 3 variable names, to establish the more general case. What I'm hoping to find is an answer that doesn't require the doubling of the names themselves, though, which your answer still does (albeit more neatly than the example code I provided). – andrew_reece Nov 15 '18 at 06:11
  • Right. I suspected that this was not what you were hoping for. But since you need a named vector as input, I feel that the solution you are hoping for may not exist. I hope someone will prove me wrong though! – prosoitos Nov 15 '18 at 06:14
  • I edited my answer, though I am sure this is already what you are doing – prosoitos Nov 15 '18 at 06:24