Return the names of data frames from a loop into a new data frame as IDs

Question

Suppose I have multiple data frames with the same prefixes and same structure.

mydf_1 <- data.frame('fruit' = 'apples', 'n' = 2)
mydf_2 <- data.frame('fruit' = 'pears', 'n' = 0)
mydf_3 <- data.frame('fruit' = 'oranges', 'n' = 3)

I have a for-loop that grabs all the tables with this prefix, and appends those that match a certain condition.

res <- data.frame()

for(i in mget(apropos("^mydf_"), envir = .GlobalEnv)){
  
  if(sum(i$n) > 0){
    res <- rbind.data.frame(res, data.frame('name' = paste0(i[1]),
                                            'n' = sum(i$n)))
  }
}

res

This works fine, but I want my 'res' table to identify the name of the original data frame itself in the 'name' column, instead of the contents of its first column. My desired result is:

    name n
1 mydf_1 2
2 mydf_3 3

The closest I have gotten to solving this issue is:

'name' = paste0(substitute(i))

instead of

'name' = paste0(i[1])

but it just returns 'i'.

Any simple solution? Base preferred but not essential.

If a data.frame has many kinds of fruit, do you want to discard the fruit names and sum their counts together? – Darren Tsai, May 06 '22 at 03:07
You should really be using lists. Please see this [answer](https://stackoverflow.com/a/24376207/1422451): Don't ever create `d1`, `d2`, `d3`, ..., `dn` in the first place. Create a list `d` with `n` elements. – Parfait, May 06 '22 at 03:11
And I'm still hoping for that `base` approach, which should be reasonably achievable. – Chris, May 06 '22 at 04:31
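
A base R approach along the lines hoped for in the comments might look like the sketch below. It assumes the three example data frames from the question and loops over the object names rather than the objects themselves, so the name is still available when each row is built. The variable names `df_names`, `nm`, and `total` are illustrative and not from the original post.

```r
mydf_1 <- data.frame(fruit = "apples", n = 2)
mydf_2 <- data.frame(fruit = "pears", n = 0)
mydf_3 <- data.frame(fruit = "oranges", n = 3)

# Names of all objects starting with "mydf_" (ls() returns them sorted)
df_names <- ls(pattern = "^mydf_")

res <- data.frame()
for (nm in df_names) {
  # Look the data frame up by name, so `nm` can serve as its ID
  total <- sum(get(nm)$n)
  if (total > 0) {
    res <- rbind(res, data.frame(name = nm, n = total))
  }
}

res
#     name n
# 1 mydf_1 2
# 2 mydf_3 3
```

The original loop iterates over the values returned by `mget()`, which is why the name is lost: inside the loop body `i` is just a data frame, and `substitute(i)` can only return the symbol `i`.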

score 1 · Answer 1 · answered May 06 '22 at 03:20

To bind a list of data frames and store the list names in a new column, set the `.id` argument of `dplyr::bind_rows()`.

library(dplyr)

mget(apropos("^mydf_")) %>%
  bind_rows(.id = "name") %>%
  count(name, wt = n) %>%
  filter(n > 0)

#     name n
# 1 mydf_1 2
# 2 mydf_3 3
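
To see what `.id` contributes before `count()` and `filter()` run, it may help to look at the intermediate result. The sketch below uses a hand-built named list, `dfs`, as a stand-in for `mget(apropos("^mydf_"))`; that list and the name `stacked` are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical named list standing in for mget(apropos("^mydf_"))
dfs <- list(
  mydf_1 = data.frame(fruit = "apples", n = 2),
  mydf_2 = data.frame(fruit = "pears", n = 0),
  mydf_3 = data.frame(fruit = "oranges", n = 3)
)

# .id = "name" copies each list element's name into a new first column
stacked <- bind_rows(dfs, .id = "name")
stacked
#     name   fruit n
# 1 mydf_1  apples 2
# 2 mydf_2   pears 0
# 3 mydf_3 oranges 3
```

From here, `count(name, wt = n)` sums `n` within each `name`, and `filter(n > 0)` drops the data frames whose total is zero.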

AndrewGB · Accepted Answer · 2022-05-06T04:06:05.013

As mentioned in the comments, it is better to put data frames into a list, as it is much easier to handle and manipulate them that way. However, we could still grab the data frames from the global environment, get the sum for each data frame, then bind them together and add the data frame name as a column.

library(tidyverse)

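# mget() already returns a named list, so no do.call("list", ...) is needed
df_list <-
  mget(grep("^mydf_", names(.GlobalEnv), value = TRUE), envir = .GlobalEnv) %>%
  map(~ summarise(.x, n = sum(n))) %>%  # one-row total per data frame
  discard(~ .x$n == 0) %>%              # drop data frames whose total is 0
  bind_rows(.id = "name")               # list names become the "name" column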

Or we could use map_dfr to bind together and summarise, then filter out the 0 values:

map_dfr(mget(ls(pattern = "^mydf_")), ~ c(n = sum(.x$n)), .id = "name") %>%
  filter(n != 0)

Output

    name n
1 mydf_1 2
2 mydf_3 3
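
The list-based layout recommended at the start of this answer can be sketched in base R as follows. It assumes the data frames are created directly as elements of one named list (called `mydf` here, a hypothetical name) instead of as separate `mydf_1`, `mydf_2`, `mydf_3` objects, so no lookup in the global environment is needed.

```r
# Hypothetical named list replacing the separate mydf_* objects
mydf <- list(
  mydf_1 = data.frame(fruit = "apples", n = 2),
  mydf_2 = data.frame(fruit = "pears", n = 0),
  mydf_3 = data.frame(fruit = "oranges", n = 3)
)

# One total per list element; sapply() keeps the element names
totals <- sapply(mydf, function(d) sum(d$n))

# Keep the non-zero totals and turn their names into an ID column
keep <- totals[totals > 0]
res <- data.frame(name = names(keep), n = unname(keep))

res
#     name n
# 1 mydf_1 2
# 2 mydf_3 3
```

Because the names travel with the list, the ID column falls out of `names()` and no pattern matching on object names is required.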

edited May 06 '22 at 04:06

answered May 06 '22 at 03:23

In this case `map_dfr` is more suitable than `map_df` because it has its own arg `.id` and runs `bind_rows` under the hood (so you don't need to call `bind_rows` again). – Darren Tsai May 06 '22 at 03:43
@DarrenTsai Good call! Thanks very much; that's much more concise and more efficient. Definitely still learning about the flexibility of the `map` family. – AndrewGB May 06 '22 at 04:07
