
I am trying to loop through a list of data frames from the global environment. For each one, I want to extract the variable name, take a substring of that name, filter the data frame (with `filter()` from the tidyverse), and then save the filtered data frame. However, I'm having quite a bit of trouble:

query_loop <- function(df){

    name <- deparse(substitute(df));
    cpt <- paste("cpt_","20", substring(name, 14, 15), sep = "");
    assign(cpt, filter(df, CPT == "12345"));
    write.table(cpt, file = paste(deparse(substitute(cpt)), ".txt", sep =""), row.names = F, sep = "\t");
}

dfs1 <- lapply(dfs, query_loop)

The code fails at the first step of my function. When I try `print(deparse(substitute(df)))`, I get `X[[i]]` once for each data frame. I understand this is because `lapply` hands each data frame to my function as `X[[i]]` rather than under its original name, but I don't know what the correct solution is.

Any help would be greatly appreciated. Thanks!
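To illustrate the `X[[i]]` behaviour, here is a minimal sketch using two hypothetical toy data frames (`df_a`, `df_b`) in a named list. Inside `lapply()`, `substitute()` sees the expression `lapply()` built to pass each element, never the original variable name. Looping over `names()` of a named list instead hands the function the name as an ordinary string, which is the approach suggested in the comments and answer.

```r
# Two hypothetical toy data frames in a named list
dfs <- list(df_a = data.frame(CPT = c("12345", "99999")),
            df_b = data.frame(CPT = c("12345", "12345")))

# What the question's approach sees: the expression lapply() built, not a name
show_expr <- function(df) deparse(substitute(df))
seen <- unlist(lapply(dfs, show_expr))
print(seen)  # each element is the deparsed expression lapply() used, e.g. "X[[i]]"

# Looping over the names gives real strings; index the list with them
seen_names <- vapply(names(dfs), function(nm) {
  df <- dfs[[nm]]
  paste0(nm, ": ", nrow(df), " rows")
}, character(1))
print(seen_names)
```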

  • `names(dfs)` is a character vector, so `df` in your function is a length-1 character vector with the name of the current data frame. Normally one uses `deparse(substitute())` to get a string; you already have a string. – Gregor Thomas Jul 28 '22 at 16:19
  • The _correct_ solution is not to have all your data frames in the global environment, but in a list. How does `names(df)` return the names of data frames in your global environment? If you already have the names of all the data frames, you can use `mget` to obtain all those data frames in a list – Allan Cameron Jul 28 '22 at 16:19
  • Sorry, I posted my code wrong. I am passing `dfs`, which is a list of the data frames, into `lapply`. The rest of my question is correct, i.e. when I try `print(deparse(substitute(df)))`, it prints `X[[i]]` 6 times (the number of data frames in the list). – grapeporcupine Jul 28 '22 at 16:23
  • So... is your `list` named? If so, use the names of the list as your code shows. If not, name the list and use the names of the list as your code shows. – Gregor Thomas Jul 28 '22 at 16:31
  • Otherwise [there's this workaround](https://stackoverflow.com/a/18511080/903061) (and I would close your question as a duplicate of that one), but working with the names seems much easier. – Gregor Thomas Jul 28 '22 at 16:32
  • Is there a way to name the lists programmatically? I created the `dfs` list using this code: `dfs <- Filter(function(x) is(x, "data.frame"), mget(ls()))` – grapeporcupine Jul 28 '22 at 18:21
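As a sketch of how `mget()` and `Filter()` interact with names: `mget()` returns a list named after the variables it fetched, and `Filter()` subsets that list, so the names survive. Here a throwaway environment `e` (hypothetical) stands in for the global environment so the example is self-contained, and base R's `is.data.frame()` is swapped in for `is(x, "data.frame")`. If a list ever does lack names, `names<-` sets them programmatically.

```r
# A throwaway environment stands in for the global environment here
e <- new.env()
assign("df_a", data.frame(CPT = c("12345", "99999")), envir = e)
assign("df_b", data.frame(CPT = "12345"), envir = e)
assign("not_a_df", 1:3, envir = e)

# mget() returns a list named after the variables; Filter() keeps those names
dfs <- Filter(is.data.frame, mget(ls(envir = e), envir = e))
print(names(dfs))

# If a list ever lacks names, they can be assigned programmatically
unnamed <- unname(dfs)
names(unnamed) <- paste0("df_", seq_along(unnamed))
print(names(unnamed))
```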

1 Answer


Suggested simplification (untested, obviously, as there's no data to test on).

## assumption: `dfs` is a named list of data frames

library(dplyr)  # provides filter()

# create a list of filtered data frames with appropriate names
filtered_list = lapply(dfs, filter, CPT == "12345")
names(filtered_list) = paste0("cpt_","20", substring(names(dfs), 14, 15))

# write them to files
lapply(names(filtered_list), function(nm) {
  write.table(
    x = filtered_list[[nm]],
    file = paste0(nm, ".txt"),
    row.names = FALSE, sep = "\t")
})
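For comparison, here is a base-R sketch of the same idea without dplyr, run on hypothetical data whose names (`claims_export21`, `claims_export22`) are made up so that `substring(..., 14, 15)` picks out a two-digit year. `which()` drops `NA` comparisons, roughly matching how `filter()` treats them, and the files go to `tempdir()` rather than the working directory. A `for` loop is used for the writing step because it runs only for its side effect, so nothing is printed.

```r
# Hypothetical input: names follow the pattern substring(, 14, 15) expects
dfs <- list(claims_export21 = data.frame(CPT = c("12345", "99999"), n = 1:2),
            claims_export22 = data.frame(CPT = c("12345", "12345"), n = 3:4))

# Base-R counterpart of lapply(dfs, filter, CPT == "12345")
filtered_list <- lapply(dfs, function(df) df[which(df$CPT == "12345"), , drop = FALSE])
names(filtered_list) <- paste0("cpt_", "20", substring(names(dfs), 14, 15))

# Write each filtered data frame; a for loop returns nothing to print
out_dir <- tempdir()
for (nm in names(filtered_list)) {
  write.table(filtered_list[[nm]],
              file = file.path(out_dir, paste0(nm, ".txt")),
              row.names = FALSE, sep = "\t")
}
```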
Gregor Thomas
  • Thank you - this worked. However, is there a way to programmatically name all the data frame elements of the `dfs` list? I first loaded several data frames into the global environment, and I would like to name the data frame elements of the list using the global environment variable names. I loaded them using `Filter(function(x) is(x, "data.frame"), mget(ls()))`. Also, after I run the code you provided above, the console outputs `[[1]] NULL [[2]] NULL [[3]] NULL [[4]] NULL [[5]] NULL [[6]] NULL`. Is there a reason why? – grapeporcupine Jul 30 '22 at 15:40
  • The NULLs are there because `write.table` returns NULL. You can stop the printing by adding a `invisible()` as the last line of `function(nm)`. As for the naming, I'd strongly suggest loading the data frames directly into a list, not the global environment. [See my answer at How to make a list of data frames?](https://stackoverflow.com/questions/17499013/how-do-i-make-a-list-of-data-frames/24376207#24376207) for examples and discussion. – Gregor Thomas Jul 31 '22 at 01:44
  • But I'm also confused that you don't have names, because when I run your code `x = Filter(function(x) is(x, "data.frame"), mget(ls()))` and look at `names(x)`, `x` does have the names of the global environment variables. – Gregor Thomas Jul 31 '22 at 01:45
  • That's weird. The way I have been loading the data frames is: `temp = list.files(pattern="*.txt")`, then `for (i in 1:length(temp)) assign(temp[i], read.delim(temp[i]))`, then `dfs <- Filter(function(x) is(x, "data.frame"), mget(ls()))`. I will try to load the files into a named list next time. – grapeporcupine Aug 01 '22 at 16:11
  • You should do `temp = list.files(pattern = "\\.txt$"); data_list = lapply(temp, read.delim); names(data_list) = temp` (note that `pattern` is a regular expression, not a glob). – Gregor Thomas Aug 01 '22 at 16:12
  • Ok, I will give it a try next time. Thanks! – grapeporcupine Aug 01 '22 at 16:13
  • Or, if you use `purrr`, you can skip the `names()` step: `data_list = purrr::map(purrr::set_names(temp), read.delim)`. (`purrr::map_dfr()` would instead row-bind all the files into a single data frame.) – Gregor Thomas Aug 01 '22 at 16:14
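Reading the files straight into a named list, as suggested in the comments, can be sketched in base R as follows. Two small hypothetical files (`a.txt`, `b.txt`) are created in a temporary directory so the example runs anywhere. `sapply()` over a character vector uses that vector as the names of its result, and `simplify = FALSE` keeps the result a list; `basename()` strips the directory because `full.names = TRUE` was used.

```r
# Create two small hypothetical .txt files in a temporary directory
dir <- tempfile("txt_demo_")
dir.create(dir)
write.table(data.frame(CPT = c("12345", "99999")), file.path(dir, "a.txt"),
            sep = "\t", row.names = FALSE)
write.table(data.frame(CPT = "12345"), file.path(dir, "b.txt"),
            sep = "\t", row.names = FALSE)

# list.files() takes a regular expression: "\\.txt$" means "ends in .txt"
temp <- list.files(dir, pattern = "\\.txt$", full.names = TRUE)

# sapply() names the result after `temp`; simplify = FALSE keeps it a list
data_list <- sapply(temp, read.delim, simplify = FALSE)
names(data_list) <- basename(names(data_list))
print(names(data_list))
```

From there, the filtering and writing steps in the answer apply to `data_list` unchanged, with no detour through the global environment.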