Add new variable to list of data frames with purrr and mutate() from dplyr

Question

I know that there are many related questions here on SO, but I am looking for a purrr solution, please, not one from the apply family of functions or cbind/rbind (I want to take this opportunity to get to know purrr better).

I have a list of dataframes and I would like to add a new column to each dataframe in the list. The value of the column will be the name of the dataframe, i.e. the name of each element in the list.

There is something similar here, but it involves the use of a function and mutate_each(), whereas I need just mutate().

To give you an idea of the list (called comentarios), here is the first line of str() on the first element:

> str(comentarios[1])
List of 1
 $ 166860353356903_661400323902901:'data.frame':    13 obs. of  7 variables:

So I would like my new variable to contain 166860353356903_661400323902901 for 13 lines in the result, as an ID for each dataframe.

What I am trying is:

dff <- map_df(comentarios, 
              ~ mutate(ID = names(comentarios)),
              .id = "Group"
              )

However, mutate() needs the name of the dataframe in order to work:

Error in mutate_(.data, .dots = lazyeval::lazy_dots(...)) : 
  argument ".data" is missing, with no default

It doesn't make sense to put in each name manually; I'd be straying into loop territory and losing the advantages of purrr (and R, more generally). If the list were smaller, I'd use reshape::merge_all(), but it has over 2000 elements. Thanks in advance for any help.

edit: some data to make the problem reproducible, as per alistaire's comments

# install.packages("tidyverse")
library(tidyverse)
df <- tibble(one = rep("hey", 10), two = 1:10, etc = "etc")

list_df <- list(df, df, df, df, df)
names(list_df) <- c("first", "second", "third", "fourth", "fifth")
dfs <- map_df(list_df, 
              ~ mutate(id = names(list_df)),
              .id = "Group"
              )

You need to make your example [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example#5963610) by adding data. — alistaire, Feb 03 '17 at 17:03
I don't think that's necessary here, alistaire, it's a question about syntax more than anything, as Jake's answer showed. — RobertMyles, Feb 03 '17 at 17:09
[It is always necessary](http://stackoverflow.com/help/mcve), or the question will be closed. [ask] — alistaire, Feb 03 '17 at 17:12
Better, though you should show your desired output, as well. Assuming a bit, you can just do `dplyr::bind_rows(list_df, .id = 'id')`. — alistaire, Feb 03 '17 at 17:44
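
The `bind_rows()` suggestion in the last comment can be sketched on toy data like the question's. This is a minimal sketch, assuming the desired output is a single combined data frame with the list names in a column called `id`; `tibble()` is used here in place of the deprecated `data_frame()`.

```r
library(dplyr)
library(tibble)

# Toy data mirroring the question's example
df <- tibble(one = rep("hey", 10), two = 1:10, etc = "etc")
list_df <- list(first = df, second = df, third = df, fourth = df, fifth = df)

# .id adds a column holding the name of the list element each row came from
dfs <- bind_rows(list_df, .id = "id")

unique(dfs$id)
```

Note that this route needs neither purrr nor `mutate()`, so it answers the underlying task rather than the purrr question as asked.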

Jake Kaupp · Accepted Answer · 2017-08-22T09:57:18.087

21

Your issue is that you have to explicitly provide a reference to the data when you're not using mutate() in a pipe. To do this, I'd suggest using map2_df, which maps over the list and its names in parallel:

dff <- map2_df(comentarios, names(comentarios), ~ mutate(.x, ID = .y))
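
To make the idea concrete on reproducible data, here is a minimal sketch using a three-element toy list (an assumption standing in for `comentarios`). It splits the work into `map2()` followed by `bind_rows()` rather than calling `map2_df()` directly, which is a choice of this sketch and not the answer's exact call; the result should be equivalent.

```r
library(purrr)
library(dplyr)
library(tibble)

# Toy stand-in for the real list of data frames
df <- tibble(one = rep("hey", 10), two = 1:10, etc = "etc")
list_df <- list(first = df, second = df, third = df)

# .x is one data frame, .y is its name; mutate() receives .x as its data argument
with_ids <- map2(list_df, names(list_df), ~ mutate(.x, ID = .y))

# Stack the list of data frames into a single data frame
dff <- bind_rows(with_ids)
```

The key difference from the attempt in the question is that `mutate()` is handed `.x` explicitly, and each data frame gets only its own name (`.y`) instead of the whole `names()` vector.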

edited Aug 22 '17 at 09:57

answered Feb 03 '17 at 16:40


It just maps over two arguments. The first argument, `.x`, is the list of dataframes; the second, `.y`, is the vector of dataframe names. – Jake Kaupp Feb 03 '17 at 16:53
Sure, but I wouldn't have thought that I could do it that way, that's what I meant. This is exactly why I asked for a purrr answer, as I want to get to know the package better. Thanks again for your help. – RobertMyles Feb 03 '17 at 17:07
@JakeKaupp - what is the meaning of this part: `.id = "Group"`? When I omit it, the code still works fine. – Tomasz Mikolajczyk Aug 22 '17 at 07:24
It was a typo and a misplaced bracket. The `.id` was an economic variable in OP's question. – Jake Kaupp Aug 22 '17 at 09:58

FYI, starting in *purrr_0.2.3* there is a "short-hand" family of `imap` indexed functions when you want to loop through a list and the names (or indexes) of the list simultaneously. – aosmith Dec 14 '17 at 18:13
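
The `imap` shorthand mentioned in the comment above can be sketched as follows. This is a minimal sketch on a toy list (an assumption, not the OP's real data), using `imap()` plus `bind_rows()`; inside the formula, `.x` is the list element and `.y` is its name.

```r
library(purrr)
library(dplyr)
library(tibble)

# Toy stand-in for the real list of data frames
df <- tibble(one = rep("hey", 10), two = 1:10, etc = "etc")
list_df <- list(first = df, second = df, third = df)

# In imap(), .x is the element and .y is its name (or its position if unnamed)
dff <- imap(list_df, ~ mutate(.x, ID = .y)) %>% bind_rows()
```

Compared with `map2()`, this avoids passing `names(list_df)` as a separate argument.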

score 5 · Answer 2 · answered Nov 29 '17 at 22:11

Using the OP's data, the answer would be:

library(tidyverse)
df <- tibble(one = rep("hey", 10), two = 1:10, etc = "etc")

list_df <- list(df, df, df, df, df)
dfnames <- c("first", "second", "third", "fourth", "fifth")

# .x is each data frame, .y is the matching name
dfs <- list_df %>% map2_df(dfnames, ~ mutate(.x, name = .y))
