
I have a list of data frames that look like this:


crops_1990.tempor <- data.frame(study_unit=c("unit1", "unit2", "unit3"),
                                cropp=c("crop1", "crop2", "crop3"),
                                area=c(1,2,3),
                                year=NA)

crops_1991.tempor <- data.frame(study_unit=c("unit1", "unit2", "unit3"),
                                cropp=c("crop1", "crop2", "crop3"),
                                area=c(4,5,6),
                                year=NA)

crops_1992.tempor <- data.frame(study_unit=c("unit1", "unit2", "unit3"),
                                cropp=c("crop1", "crop2", "crop3"),
                                area=c(7,8,9),
                                year=NA)


df_list <- list(crops_1990.tempor, crops_1991.tempor, crops_1992.tempor)

I would like to fill the column 'year' with the year information that is in the name of each df within the list (1990, 1991 and 1992, respectively in this example).

I thought it would be very easy but I'm struggling a lot!

I've tried stuff like:

df_list <- lapply(df_list, function(x) {x$year <- as.character(x$year); x}) 
 
df_list <- lapply(df_list, function(x) {x$year <- substring(names(df_list), 7,10); x}) # add years from object name in list
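A quick base-R check shows why this attempt cannot work. This is a sketch, and make_df() is a hypothetical helper that only rebuilds the example data. An unnamed list gives substring() nothing to work with. Even after naming the list, every data frame would receive the full vector of years.

```r
# Hypothetical helper: rebuilds one example data frame from the question
make_df <- function(area) {
  data.frame(study_unit = c("unit1", "unit2", "unit3"),
             cropp = c("crop1", "crop2", "crop3"),
             area = area,
             year = NA)
}
df_list <- list(make_df(c(1, 2, 3)), make_df(c(4, 5, 6)), make_df(c(7, 8, 9)))

# 1) list(a, b, c) stores no names, so there is nothing to take a year from
no_names <- names(df_list)                    # NULL
no_years <- substring(names(df_list), 7, 10)  # character(0)

# Assigning a zero-length vector to a column of a 3-row data frame fails
x <- df_list[[1]]
assign_result <- tryCatch({
  x$year <- no_years
  "assigned"
}, error = function(e) "error")

# 2) With names, the function still cannot tell which element it is on,
#    so every data frame receives all three years
named_list <- setNames(df_list, c("crops_1990.tempor", "crops_1991.tempor", "crops_1992.tempor"))
wrong <- lapply(named_list, function(x) {
  x$year <- substring(names(named_list), 7, 10)
  x
})
wrong$crops_1992.tempor$year  # "1990" "1991" "1992", not "1992" three times
```

The second case fails silently rather than with an error. Each data frame happens to have exactly three rows, so the three years fit the column without complaint.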

But nothing seems to work. My expected result is that the data frames within the list look like this:


crops_1990.tempor <- data.frame(study_unit=c("unit1", "unit2", "unit3"),
                                cropp=c("crop1", "crop2", "crop3"),
                                area=c(1,2,3),
                                year=c("1990", "1990", "1990"))

crops_1991.tempor <- data.frame(study_unit=c("unit1", "unit2", "unit3"),
                                cropp=c("crop1", "crop2", "crop3"),
                                area=c(4,5,6),
                                year=c("1991", "1991", "1991"))

crops_1992.tempor <- data.frame(study_unit=c("unit1", "unit2", "unit3"),
                                cropp=c("crop1", "crop2", "crop3"),
                                area=c(7,8,9),
                                year=c("1992", "1992", "1992"))

2 Answers


Using the tidyverse (lst() names the list automatically*), you could do:

library(tidyverse)

lst(crops_1990.tempor, crops_1991.tempor, crops_1992.tempor) |>
  imap(~ .x |> mutate(year = .y |> str_extract("\\d+")))
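If you would rather not load the tidyverse, a rough base-R counterpart is sketched below. It is an assumed equivalent, not a drop-in copy of imap(). Map() walks the list and its names in parallel, and regmatches() with regexpr() plays the role of str_extract(). mk() is a hypothetical helper for the example data.

```r
# Hypothetical helper that rebuilds one example data frame from the question
mk <- function(area) {
  data.frame(study_unit = c("unit1", "unit2", "unit3"),
             cropp = c("crop1", "crop2", "crop3"),
             area = area,
             year = NA)
}

# A named list, which is what lst() produces from the object names
df_list <- list(crops_1990.tempor = mk(c(1, 2, 3)),
                crops_1991.tempor = mk(c(4, 5, 6)),
                crops_1992.tempor = mk(c(7, 8, 9)))

# Map() passes each data frame together with its name, similar to imap();
# the names of the first argument are kept on the result
df_list <- Map(function(df, nm) {
  df$year <- regmatches(nm, regexpr("[0-9]+", nm))  # first run of digits, as a string
  df
}, df_list, names(df_list))

df_list$crops_1991.tempor
```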

Alternatively, you could use mget() and ls() to put every object in your environment whose name contains crops_ into a list. This is handy if you have many data frames, because you don't have to type each name:

mget(ls(pattern = "crops_")) |>
  imap(~ .x |> mutate(year = .y |> str_extract("\\d+")))
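One caveat with ls(pattern = "crops_"): the pattern is a regular expression matched anywhere in the name. It therefore also picks up unrelated objects, such as the hypothetical all_crops_lookup below. Here is a sketch that uses a scratch environment in place of the global one and an anchored pattern:

```r
# A scratch environment standing in for the global environment
env <- new.env()
mk <- function(area) {  # hypothetical helper for the example data
  data.frame(study_unit = c("unit1", "unit2", "unit3"),
             cropp = c("crop1", "crop2", "crop3"),
             area = area, year = NA)
}
assign("crops_1990.tempor", mk(c(1, 2, 3)), envir = env)
assign("crops_1991.tempor", mk(c(4, 5, 6)), envir = env)
assign("crops_1992.tempor", mk(c(7, 8, 9)), envir = env)
# Hypothetical unrelated object whose name also contains "crops_"
assign("all_crops_lookup", data.frame(code = 1:2), envir = env)

# Unanchored: matches "crops_" anywhere, so all_crops_lookup slips in
loose <- ls(envir = env, pattern = "crops_")
# Anchored: only names of the form crops_<4 digits>.tempor
strict <- ls(envir = env, pattern = "^crops_[0-9]{4}\\.tempor$")

# mget() returns a list named after the objects, ready for imap()
df_list <- mget(strict, envir = env)
```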

Output:

$crops_1990.tempor
  study_unit cropp area year
1      unit1 crop1    1 1990
2      unit2 crop2    2 1990
3      unit3 crop3    3 1990

$crops_1991.tempor
  study_unit cropp area year
1      unit1 crop1    4 1991
2      unit2 crop2    5 1991
3      unit3 crop3    6 1991

$crops_1992.tempor
  study_unit cropp area year
1      unit1 crop1    7 1992
2      unit2 crop2    8 1992
3      unit3 crop3    9 1992

NB! You should consider putting your data into a list in the first place, when you load it. For the reasons why, see e.g.: How do I make a list of data frames?
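Here is a minimal sketch of building the list at load time. It assumes the data sit in CSV files named like crops_1990.csv. The folder and file names are hypothetical, and the sketch writes them to tempdir() so that it runs on its own.

```r
# Hypothetical input: three CSV files written to a temporary folder
demo_dir <- file.path(tempdir(), "crops_demo")
dir.create(demo_dir, showWarnings = FALSE)
areas <- list(`1990` = c(1, 2, 3), `1991` = c(4, 5, 6), `1992` = c(7, 8, 9))
for (yr in names(areas)) {
  write.csv(data.frame(study_unit = c("unit1", "unit2", "unit3"),
                       cropp = c("crop1", "crop2", "crop3"),
                       area = areas[[yr]]),
            file.path(demo_dir, paste0("crops_", yr, ".csv")),
            row.names = FALSE)
}

# Read every matching file into a list, adding the year from the file name
files <- list.files(demo_dir, pattern = "^crops_[0-9]{4}\\.csv$", full.names = TRUE)
df_list <- lapply(files, function(f) {
  df <- read.csv(f)
  df$year <- regmatches(basename(f), regexpr("[0-9]{4}", basename(f)))
  df
})
# Name each element after its file, e.g. "crops_1990"
names(df_list) <- tools::file_path_sans_ext(basename(files))
```

Because the year is taken while the file name is still at hand, nothing has to be recovered from object names later.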

(*) One of the reasons why your approach isn't working is that the list is not named.
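Here is a sketch of one way to keep the original lapply() idea, assuming the list is named as in the question. Loop over the names rather than over the data frames, so that each call knows which name it belongs to. The substring(..., 7, 10) approach still relies on the year sitting at characters 7 to 10. mk() is a hypothetical helper for the example data.

```r
mk <- function(area) {  # hypothetical helper for the example data
  data.frame(study_unit = c("unit1", "unit2", "unit3"),
             cropp = c("crop1", "crop2", "crop3"),
             area = area, year = NA)
}
df_list <- list(crops_1990.tempor = mk(c(1, 2, 3)),
                crops_1991.tempor = mk(c(4, 5, 6)),
                crops_1992.tempor = mk(c(7, 8, 9)))

# Iterate over the names; setNames(nm = ...) keeps them as the result's names
df_list <- lapply(setNames(nm = names(df_list)), function(nm) {
  x <- df_list[[nm]]
  x$year <- substring(nm, 7, 10)  # characters 7 to 10 of "crops_1990.tempor" are "1990"
  x
})
```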

harre
    Thanks! I used the first solution, because I have other objects with different name patterns in my environment, and it works perfectly. My actual data is indeed a named list, which I had set up that way from the beginning. – Diego Brizuela Jul 05 '22 at 15:19

Another potential way is:

library(stringr)

## Getting the names of all data frames stored in R's global environment
## (sorted, because eapply() does not return objects in a guaranteed order)
names_of_dataframes <- sort(names(which(unlist(eapply(.GlobalEnv, is.data.frame)))))

## Creating a list of those data frames, in the same order as their names
df_list <- mget(names_of_dataframes, envir = .GlobalEnv)

## Inserting the values in the year column (first run of digits in each name)
for (i in seq_along(df_list)) {
    df_list[[i]]$year <- as.numeric(str_extract(names_of_dataframes[i], "[0-9]+"))
}

## Unlisting all data frames from df_list back into the global environment
for (i in seq_along(df_list))
    assign(names_of_dataframes[i], df_list[[i]], envir = .GlobalEnv)

Output

> crops_1990.tempor
  study_unit cropp area year
1      unit1 crop1    1 1990
2      unit2 crop2    2 1990
3      unit3 crop3    3 1990
> crops_1991.tempor
  study_unit cropp area year
1      unit1 crop1    4 1991
2      unit2 crop2    5 1991
3      unit3 crop3    6 1991
> crops_1992.tempor
  study_unit cropp area year
1      unit1 crop1    7 1992
2      unit2 crop2    8 1992
3      unit3 crop3    9 1992
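Base R's list2env() could replace the final assign() loop. It copies each element of a named list into an environment under its own name. The sketch below uses a scratch environment; pass envir = .GlobalEnv to mirror the answer. mk() is a hypothetical helper for the example data.

```r
mk <- function(area, year) {  # hypothetical helper for the example data
  data.frame(study_unit = c("unit1", "unit2", "unit3"),
             cropp = c("crop1", "crop2", "crop3"),
             area = area, year = year)
}
df_list <- list(crops_1990.tempor = mk(c(1, 2, 3), 1990),
                crops_1991.tempor = mk(c(4, 5, 6), 1991),
                crops_1992.tempor = mk(c(7, 8, 9), 1992))

# Scratch environment; use envir = .GlobalEnv to overwrite the original objects
env <- new.env()
invisible(list2env(df_list, envir = env))

get("crops_1991.tempor", envir = env)
```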
Deepansh Arora
    I liked your solution, especially the fact that you don't have to specify all the data frames. Good job! – kav Jul 06 '22 at 04:25