I have some code that reads all the xlsx files from a directory, imports them into R as a list, and names each element of the list after its file. Each element is stored as a dataframe.

I'm new to R. What is the most sensible way to apply a set of functions to each element of the list? The dataframes are all identical in layout.

I want to filter to a specific area, group by ages, and then extract this information as a new dataframe (bind the rows).

..$ Persons            :'data.frame':   1932 obs. of  36 variables:
  .. ..$ gss_code_borough: chr [1:1932]  ...
  .. ..$ gss_code_ward   : chr [1:1932]  ...
  .. ..$ district        : chr [1:1932]  ...
  .. ..$ ward_name       : chr [1:1932] ...
  .. ..$ age             : chr [1:1932] "total" "0" "1" "2" ...
  .. ..$ 2011            : num [1:1932] 261590 4779 4480 4320 4197 ...
  .. ..$ 2012            : num [1:1932] 263856 4723 4571 4390 4082 ...

The above shows the layout of the first element of the list. I want to filter all tables by a specific area, break down to specific age ranges and sum. I can write the code 6 times, changing the list element each time, but there must be a quicker way?

    Thank you for the description, but it would be easier for others to help you if you create a [reproducible example](https://stackoverflow.com/q/5963269/2572423). Perhaps create a simple "fake" 6-element list that looks/feels similar to what you are encountering -- you can also provide the expected output. Something like: `my_list <- list(df1 = data.frame(x1 = 1:3, x2 = 4:6), df2 = data.frame(x1 = 4:6, x2 = 7:9), ...)` – JasonAizkalns Apr 04 '19 at 14:23

1 Answer

Let's suppose you have a list called dta that holds several data.frames with the structure you have given. The purrr package helps a lot here: map_df() applies the same steps to every element and binds the results into one data frame.

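library(purrr)
library(dplyr)  # provides filter() and %>%
# Filter every data frame in the list, then row-bind; .id records the list name
map_df(dta, ~ .x %>% filter(district == "a1", age == "2"), .id = "dataset")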
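To also break the ages into ranges and sum them, something along these lines might work. This is only a sketch: the fake `dta` list, the `make_df()` and `summarise_area()` helpers, the district codes, and the two age bands are all made up for illustration. It also assumes every `age` value other than "total" converts cleanly to an integer, which may not hold for your real data.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for the real list: small data frames with a subset of
# the columns from the question (all values are invented).
make_df <- function(scale) {
  data.frame(
    district = rep(c("a1", "b2"), each = 5),
    age      = rep(c("total", "0", "1", "2", "3"), times = 2),
    `2011`   = c(100, 10, 20, 30, 40, 50, 5, 10, 15, 20) * scale,
    `2012`   = c(110, 11, 22, 33, 44, 60, 6, 12, 18, 24) * scale,
    check.names = FALSE,        # keep "2011"/"2012" as column names
    stringsAsFactors = FALSE
  )
}
dta <- list(Persons = make_df(1), Females = make_df(2))

# Hypothetical helper: the steps to run on ONE data frame.
summarise_area <- function(df, area) {
  df %>%
    filter(district == area, age != "total") %>%   # one area, drop the total row
    mutate(age_group = if_else(as.integer(age) <= 1L, "0-1", "2+")) %>%
    group_by(age_group) %>%
    summarise(across(where(is.numeric), sum), .groups = "drop")  # sum year columns
}

# Apply the helper to every element and bind the rows;
# .id stores the list names in a "dataset" column.
result <- map_df(dta, ~ summarise_area(.x, area = "a1"), .id = "dataset")
print(result)
```

Putting the per-table steps into one function means you write them once and can test them on a single element, e.g. `summarise_area(dta[[1]], "a1")`, before mapping over the whole list.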
TheRimalaya