
I have four lists, each containing multiple data frames.

I need to apply the same function to each of the lists.

How can I do this?

Sample data:

df1 <- data.frame(x = 1:3, y = letters[1:3])
df2 <- data.frame(x = 4:6, y = letters[4:6])
df3 <- data.frame(x = 7:9, y = letters[7:9])
df4 <- data.frame(x = 10:12, y = letters[10:12])
list1 <- list(df1,df2)
list2 <- list(df3,df4)

In my real data I import the files based on a pattern in the filename, so my list elements have names like these (sample data):

names(list1) <- c("./1. Data/df1.csv", "./1. Data/df2.csv")
names(list2) <- c("./1. Data/df3.csv", "./1. Data/df4.csv")    

This is one of the functions I want to run on all the lists:

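library(magrittr)  # provides the %>% pipe

element.name <- function(x) {
  # drop the directory part, e.g. "./1. Data/df1.csv" -> "df1.csv"
  all_filenames <- names(x) %>%
    basename()

  # drop the ".csv" extension and return the cleaned names
  gsub("\\.csv$", "", all_filenames)
}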

which gives the desired output:

names(list1) <- element.name(list1)
names(list1)
[1] "df1" "df2"

I've tried using a for loop, but I end up overwriting my output. Any help is appreciated, since I need to run a lot of functions on my lists.

Louise Sørensen
  • Why don't you read in all csv files into a single `data.frame` or `data.table` and apply your function to it? I'd do something like `library(data.table); rbindlist(lapply(your_csv_paths, fread), idcol="file")`. Also check this related [answer](https://stackoverflow.com/questions/72929492/what-is-the-fastest-way-to-import-many-csv-files-into-r). – ismirsehregal Sep 07 '22 at 13:52
  • @ismirsehregal Because I need to keep the data frames in lists for later purposes – Louise Sørensen Sep 07 '22 at 13:56
  • You can `split()` them up later. Applying a function to a single object will be faster. – ismirsehregal Sep 07 '22 at 13:56
  • That is something to sort out before using `rbindlist` - as part of the csv reading function. – ismirsehregal Sep 07 '22 at 14:03
  • @ismirsehregal That would've made sense if they didn't have completely different column names. I have data from different municipalities, and I need to streamline the data so that in the end I will be able to rbind them. But when I load the data I need to start off with a list – Louise Sørensen Sep 07 '22 at 14:04
  • Ok - hard to tell because this heterogeneity isn't reflected in the dummy data. However, the goal here seems to be to assign the name of the csv file to the corresponding data, right? You could simply use a named list for that. – ismirsehregal Sep 07 '22 at 14:11
  • @ismirsehregal Ah, sorry, I can see how the sample data isn't realistic. But yes, that is the goal of the first function. I'm looking for a smooth way to apply a function to multiple lists, since I have more functions I'll be doing this with – Louise Sørensen Sep 07 '22 at 14:31

2 Answers


You could create a list of your lists and then use lapply to apply the function element.name to every list. You can use setNames to avoid problems linked to assigning names inside the function call. You can then use list2env to get your lists back into the global environment.

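# the native pipe |> requires R >= 4.1.0
setNames(list(list1, list2), c('list1', 'list2')) |>
  lapply(function(x) setNames(x, element.name(x))) |>
  list2env(envir = .GlobalEnv)  # the default envir = NULL would create a new environment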

output

> list1
$df1
  x y
1 1 a
2 2 b
3 3 c

$df2
  x y
1 4 d
2 5 e
3 6 f

> list2
$df3
  x y
1 7 g
2 8 h
3 9 i

$df4
   x y
1 10 j
2 11 k
3 12 l
Maël
  • That works, thanks. But how do I change the names of the data frames in the lists in my environment? I'm not sure what to assign the output of this to – Louise Sørensen Sep 07 '22 at 13:48
  • You can use `list2env` for that purpose. – Maël Sep 07 '22 at 13:50
  • I want to keep the data frames in the lists, but I don't know what to put before the <- – Louise Sørensen Sep 07 '22 at 13:52
  • I'm not sure I understand. You can put what you want before <-, it will be the name of your list. – Maël Sep 07 '22 at 13:54
  • But that will create a list of my lists. How can I output just the lists? Will I need to put list2env at the end every time I do this? With only 4 lists I'm not sure whether it would just be faster to write the same line of code 4 times each time – Louise Sørensen Sep 07 '22 at 13:58
  • `list_of_lists <- lapply(list(list1, list2), function(x) setNames(x, element.name(x)))` followed by `list2env(list_of_lists)`, this is what you mean, right? – Louise Sørensen Sep 07 '22 at 13:59
  • Yes. See edit, that might be better suited to what you're looking for – Maël Sep 07 '22 at 14:03
  • Thanks, but it still isn't working for me. Nothing changes, I just get the output in the console: – Louise Sørensen Sep 07 '22 at 14:08
  • Yes, since your lists are now in the global environment, check your list list1 and list2 – Maël Sep 07 '22 at 14:09
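On the question of calling list2env at the end every time: one option is to keep all lists in one named list of lists while the functions are applied, and unpack it only once at the end. The sketch below assumes this setup; `all_lists` is a hypothetical name. It uses a for loop that writes each result back into its own slot, so nothing is overwritten. It also defines a base-R stand-in for element.name() (using sub() instead of the magrittr pipe) so the block is self-contained. Note that list2env() only assigns into the global environment when envir is given. With the default envir = NULL, it creates and returns a new environment.

```r
# Sample data from the question
df1 <- data.frame(x = 1:3, y = letters[1:3])
df2 <- data.frame(x = 4:6, y = letters[4:6])
df3 <- data.frame(x = 7:9, y = letters[7:9])
df4 <- data.frame(x = 10:12, y = letters[10:12])
list1 <- list(df1, df2)
list2 <- list(df3, df4)
names(list1) <- c("./1. Data/df1.csv", "./1. Data/df2.csv")
names(list2) <- c("./1. Data/df3.csv", "./1. Data/df4.csv")

# Base-R stand-in for element.name(): file name without directory and ".csv"
element.name <- function(x) {
  sub("\\.csv$", "", basename(names(x)))
}

# Keep all lists in one named container (all_lists is a hypothetical name)
all_lists <- list(list1 = list1, list2 = list2)

# A for loop that does not overwrite: each result goes back into its own slot
for (nm in names(all_lists)) {
  names(all_lists[[nm]]) <- element.name(all_lists[[nm]])
}

# Unpack once at the end; envir must be given explicitly
invisible(list2env(all_lists, envir = .GlobalEnv))

# With the default envir = NULL, list2env() fills a *new* environment instead
e <- list2env(list(demo_only = 1))
exists("demo_only", envir = .GlobalEnv, inherits = FALSE)  # FALSE
exists("demo_only", envir = e)                             # TRUE
```

Further functions can be applied to `all_lists` in the same way, and list2env is then only needed once, when the individual lists are required again.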

Here is an approach using data.table::fread:

library(data.table)

# create dummy CSVs -------------------------------------------------------
DT1 <- data.frame(x = 1:3, y = letters[1:3])
DT2 <- data.frame(x = 4:6, y = letters[4:6])
DT3 <- data.frame(x = 7:9, y = letters[7:9])
DT4 <- data.frame(x = 10:12, y = letters[10:12])

invisible(mapply(write.csv, x = list(DT1, DT2, DT3, DT4), file = c("DT1.csv", "DT2.csv", "DT3.csv", "DT4.csv"), MoreArgs = list(row.names = FALSE)))

# read in CSVs ------------------------------------------------------------
csv_paths <- list.files(path = ".", pattern = "\\.csv$")  # escape the dot so it matches a literal "."

# might need to split this into different steps due to different csv formats?
DT_list <- setNames(lapply(csv_paths, fread), tools::file_path_sans_ext(basename(csv_paths)))

# apply a function to each data.table -------------------------------------
lapply(DT_list, function(DT){DT[, test := x*2]})

If you want to stick with the given dummy data, just merge the lists (the data frames have to be converted to data.tables for `:=` to work):

list1 <- list(df1, df2)
list2 <- list(df3, df4)
DT_list <- setNames(lapply(c(list1, list2), as.data.table),
                    tools::file_path_sans_ext(basename(csv_paths)))
ismirsehregal
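The question mentions running many functions over the lists. One possible extension of the lapply() idea used in both answers is to fold a list of functions over each list with Reduce(). This is a minimal base-R sketch. `add_double()`, `upper_y()`, `steps` and `apply_steps()` are hypothetical placeholders for your own processing functions, each taking and returning one data frame:

```r
# Sample data from the question, already with the cleaned names
df1 <- data.frame(x = 1:3, y = letters[1:3])
df2 <- data.frame(x = 4:6, y = letters[4:6])
df3 <- data.frame(x = 7:9, y = letters[7:9])
df4 <- data.frame(x = 10:12, y = letters[10:12])
all_lists <- list(
  list1 = list(df1 = df1, df2 = df2),
  list2 = list(df3 = df3, df4 = df4)
)

# Hypothetical processing steps: each takes and returns one data frame
add_double <- function(df) transform(df, test = x * 2)
upper_y    <- function(df) transform(df, y = toupper(y))
steps <- list(add_double, upper_y)

# Apply every step, in order, to every data frame of one list
apply_steps <- function(lst) {
  Reduce(function(acc, step) lapply(acc, step), steps, init = lst)
}

# Do the same for every list; list and element names are preserved
all_lists <- lapply(all_lists, apply_steps)
```

To add another processing step, append a function to `steps`. The data frames stay inside their lists throughout.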