
Suppose I have multiple data frames with the same prefixes and same structure.

mydf_1 <- data.frame('fruit' = 'apples', 'n' = 2)
mydf_2 <- data.frame('fruit' = 'pears', 'n' = 0)
mydf_3 <- data.frame('fruit' = 'oranges', 'n' = 3)

I have a for-loop that grabs all the tables with this prefix, and appends those that match a certain condition.

res <- data.frame()

for(i in mget(apropos("^mydf_"), envir = .GlobalEnv)){
  
  if(sum(i$n) > 0){
    res <- rbind.data.frame(res, data.frame('name' = paste0(i[1]),
                                            'n' = sum(i$n)))
  }
}

res

This works fine, but I want my 'res' table to identify the name of the original data frame itself in the 'name' column, instead of the column name. My desired result is:

[image: table of the desired result]

The closest I have gotten to solving this issue is:

'name' = paste0(substitute(i))

instead of

'name' = paste0(i[1])

but it just returns 'i'.

Any simple solution? Base preferred but not essential.

Darren Tsai
k3b
  • If a data.frame has many kinds of fruit, do you want to discard the fruit names and sum their counts together? – Darren Tsai May 06 '22 at 03:07
  • Yes, I should have mentioned that. @DarrenTsai – k3b May 06 '22 at 03:09
  • You should really be using lists. Please see this [answer](https://stackoverflow.com/a/24376207/1422451): Don't ever create `d1` `d2` `d3`, ..., `dn` in the first place. Create a list `d` with `n` elements. – Parfait May 06 '22 at 03:11
  • And I'm still hoping for that `base` approach, which should reasonably be achievable. – Chris May 06 '22 at 04:31

2 Answers


To bind a list of data.frames and store the list names as a new column, a convenient way is to set the .id argument of dplyr::bind_rows().

library(dplyr)

mget(apropos("^mydf_")) %>%
  bind_rows(.id = "name") %>%
  count(name, wt = n) %>%
  filter(n > 0)

#     name n
# 1 mydf_1 2
# 2 mydf_3 3
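
Since the question prefers base R, the same idea can be sketched without dplyr. This is only a rough equivalent, assuming the three example data frames from the question exist in the current environment; the intermediate names dfs, totals and keep are made up for illustration. The key point is that mget() returns a named list, so the data frame names are available through names() rather than being lost as they are when looping over the values.

```r
# Recreate the example data frames from the question
mydf_1 <- data.frame(fruit = "apples", n = 2)
mydf_2 <- data.frame(fruit = "pears", n = 0)
mydf_3 <- data.frame(fruit = "oranges", n = 3)

# mget() returns a named list, so each data frame keeps its name
dfs <- mget(ls(pattern = "^mydf_"))

# Total n per data frame, as a named numeric vector
totals <- vapply(dfs, function(d) sum(d$n), numeric(1))

# Keep only the positive totals and turn the names into a column
keep <- totals > 0
res <- data.frame(name = names(totals)[keep], n = unname(totals[keep]))
res
```

Filtering on the totals afterwards plays the same role as the if(sum(i$n) > 0) check in the original loop.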
Darren Tsai

As mentioned in the comments, it is better to put dataframes into a list, as it is much easier to handle and manipulate them that way. However, we could still grab the dataframes from the global environment, get the sum for each dataframe, then bind them together and add the dataframe name as a column.

library(tidyverse)

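df_list <-
  # mget() already returns a named list, so do.call("list", ...) is not needed
  mget(grep("^mydf_", names(.GlobalEnv), value = TRUE)) %>%
  map(~ summarise(.x, n = sum(n))) %>%  # one-row total per data frame
  discard(~ .x$n == 0) %>%              # drop data frames whose total is 0
  bind_rows(.id = "name")               # list names become the "name" column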

Or we could use map_dfr to bind together and summarise, then filter out the 0 values:

map_dfr(mget(ls(pattern = "^mydf_")), ~ c(n = sum(.x$n)), .id = "name") %>%
  filter(n != 0)

Output

    name n
1 mydf_1 2
2 mydf_3 3
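
The list-based setup recommended in the comments could look roughly like the sketch below. It assumes the data frames are created inside a named list from the start; the list name fruit_dfs is hypothetical. Looping over names(fruit_dfs) instead of over the values is what makes the data frame name available inside the loop, which is the piece that substitute(i) could not provide.

```r
# Store the data frames in one named list instead of separate variables
fruit_dfs <- list(
  mydf_1 = data.frame(fruit = "apples", n = 2),
  mydf_2 = data.frame(fruit = "pears", n = 0),
  mydf_3 = data.frame(fruit = "oranges", n = 3)
)

res <- data.frame()

# Iterate over the names, then look up each data frame by name
for (nm in names(fruit_dfs)) {
  total <- sum(fruit_dfs[[nm]]$n)
  if (total > 0) {
    res <- rbind(res, data.frame(name = nm, n = total))
  }
}

res
```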
AndrewGB
  • In this case `map_dfr` is more suitable than `map_df` because it has its own arg `.id` and runs `bind_rows` under the hood (so you don't need to call `bind_rows` again). – Darren Tsai May 06 '22 at 03:43
  • @DarrenTsai Good call! Thanks very much; that's much more concise and more efficient. Definitely still learning about the flexibility of the `map` family. – AndrewGB May 06 '22 at 04:07