
example data

metro_2005_1 <- data.frame(col1 = 1:5, col2 = 6:10)
metro_2006_1 <- data.frame(col1 = 1:3, col2 = 4:6)

I have 20 dataframes, each named in the following format where x is a number 1-9:

metro_20XX_X

I am trying to extract the middle section of the name (the year) into a new column. I wrote a function, addYear, that works when applied to each dataframe individually.

library(dplyr)

addYear <- function(metro) {
  # Recover the name of the object that was passed in, e.g. "metro_2005_1"
  metro_name <- deparse(substitute(metro))
  metro <- metro %>% mutate(Year = substr(metro_name, 7, 10))
  return(metro)
}

example <- addYear(metro_2005_1)

str(example)

'data.frame':   5 obs. of  3 variables:
  $ col1: int  1 2 3 4 5
  $ col2: int  6 7 8 9 10
  $ Year: chr  "2005" "2005" "2005" "2005" ...

I added all 20 of my dataframes to a list called metro_append_year and tried to apply my addYear function to all of them using lapply. However, when I inspect "result", the Year column is created in each dataframe but is empty.

metro_append_year <- list(metro_2005_1, metro_2006_1)

result <- lapply(metro_append_year,addYear)

str(result[[1]])
'data.frame':   5 obs. of  3 variables:
 $ col1: int  1 2 3 4 5
 $ col2: int  6 7 8 9 10
 $ Year: chr  "" "" "" "" ...
anothermh
  • Welcome to Stack Overflow! Please provide a [reproducible example in r](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). The link I provided will tell you how. Moreover, please take the [tour](https://stackoverflow.com/tour) and visit [how to ask](https://stackoverflow.com/help/how-to-ask). Cheers. – M-- Jan 02 '19 at 20:29
  • You are checking one individual data frame (not all). Try `lapply(result, str)` and tell us if *Year* situation occurs across all dfs. – Parfait Jan 02 '19 at 20:38
  • I edited the post to include a reproducible example, borrowing from akrun's answer for the example data. – Alex Talbott Jan 02 '19 at 21:06
  • I checked and the year is missing across all dfs. – Alex Talbott Jan 02 '19 at 21:07

2 Answers


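We can pass the data frame and the name of its list element as two separate arguments. The function then no longer has to recover the name from the object itself, which makes this easier. Note that this requires a named list (see the data section below).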

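library(dplyr)

# Take the data frame and its name as separate arguments
addYear <- function(data, name) {
  data %>%
    mutate(Year = substr(name, 7, 10))
}

# Iterate over the names so each data frame is looked up together with its name
lapply(names(metro_append_year), function(nm) addYear(metro_append_year[[nm]], nm))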

data

metro_2005_1 <- data.frame(col1 = 1:5, col2 = 6:10)
metro_2006_1 <- data.frame(col1 = 1:3, col2 = 4:6)
metro_append_year <- mget(ls(pattern = '^metro_\\d{4}'))
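
The approach above depends on the list having names. The list in the question was built with list(metro_2005_1, metro_2006_1), which carries no names, so names() returns NULL and there is nothing to iterate over. The following is a minimal base R sketch of that difference, using the two example data frames; the variable names unnamed, named and result are only illustrative, and transform stands in for mutate so the snippet runs without dplyr.

```r
metro_2005_1 <- data.frame(col1 = 1:5, col2 = 6:10)
metro_2006_1 <- data.frame(col1 = 1:3, col2 = 4:6)

# A list built this way does not remember the object names
unnamed <- list(metro_2005_1, metro_2006_1)
names(unnamed)

# mget() looks the objects up by name and returns a named list
named <- mget(c("metro_2005_1", "metro_2006_1"))
names(named)

# Iterate over a self-named vector of names so the result keeps the names too
result <- lapply(setNames(nm = names(named)), function(nm) {
  transform(named[[nm]], Year = substr(nm, 7, 10))
})

str(result$metro_2006_1)
```

Using setNames(nm = ...) is a design choice for this sketch: plain lapply(names(x), ...) returns an unnamed list, which may or may not matter for later steps.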
akrun
  • Thank you, this worked. I am still researching as to why. So far it looks like I needed to use lapply in the function(nm) format in order to reference the names of the list elements. From [Advanced R](http://adv-r.had.co.nz/Functionals.html): _three basic ways to use lapply(): lapply(xs, function(x) {}) lapply(seq_along(xs), function(i) {}) lapply(names(xs), function(nm) {}) Typically you’d use the first form because lapply() takes care of saving the output for you. However, if you need to know position or name of the element you’re working with, you should use the 2nd or 3rd form_ – Alex Talbott Jan 02 '19 at 21:17
  • @AlexTalbott There are issues in getting the individual object. In your post, the single argument evaluates the object as well as extracts the substring. Here, it is in a `list` of data.frames. So, indexing the list to extract the object is the easiest way instead of doing some complicated steps – akrun Jan 03 '19 at 05:00
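
As a possible illustration of why the original function returns an empty Year under lapply: deparse(substitute(x)) returns the expression the caller wrote, and lapply does not call the function with the object's name. The sketch below uses a hypothetical helper, show_arg, that is not part of the original post; the exact text lapply passes is an implementation detail of R, so the snippet prints it instead of assuming it.

```r
# Hypothetical helper: report the expression supplied as the argument
show_arg <- function(x) deparse(substitute(x))

metro_2005_1 <- data.frame(col1 = 1:5, col2 = 6:10)

# Called directly, the expression is the object's name
direct <- show_arg(metro_2005_1)

# Called through lapply, the expression is lapply's internal indexing call
via_lapply <- lapply(list(metro_2005_1), show_arg)[[1]]

print(direct)
print(via_lapply)

# Characters 7 to 10 hold the year only in the first case
print(substr(direct, 7, 10))
print(substr(via_lapply, 7, 10))
```

On current R versions the second string is too short to have characters 7 to 10, so substr returns an empty string, which matches the empty Year column in the question.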

Since you are an R newbie, consider a base R solution: extract a list of objects with mget, then iterate elementwise with Map (a wrapper for mapply) over the list names and their corresponding values. Possibly the passing of names for unquoted column aliases is the issue with your dplyr call.

The within and transform functions mirror dplyr::mutate: you assign column(s) in place and the modified object is returned:

# ALL METRO DATA FRAMES (anchored pattern so other objects, e.g. metro_dfs itself, are not picked up)
metro_dfs <- mget(ls(pattern = "^metro_\\d{4}_\\d$"))

metro_dfs <- Map(function(name, df) within(df, Year <- substr(name, 7, 10)),
                 names(metro_dfs), metro_dfs)

Alternatively:

metro_dfs <- mapply(function(name, df) transform(df, Year = substr(name, 7, 10)),
                    names(metro_dfs), metro_dfs, SIMPLIFY = FALSE)
Parfait