I have a list of data frames and I want to extract the observations for the earliest year. My problem is that the first year changes between the data frames: it is either 1992 or 2005.

I want to create a list to store them. I tried with which(), but since several data frames share the same year, the observations get repeated, and I want to keep them apart:

new_df<- which(df[[i]]==1992 | df[[i]]==2005)

I've tried with ifelse(), but I have to run lm() afterwards, and it doesn't work. I also can't simply take the first rows, because the years are repeated.

My code looks like this:

df<- list(a<-data.frame(a_1<-(1992:2015),
                      a_2<-sample(1:24)),
        b<-data.frame(b_1<-(1992:2015),
                      b_2<-sample(1:24)),
        c<-data.frame(c_1<-(2005:2015),
                      c_2<-sample(1:11)),
        d<-data.frame(d_1<-(2005:2015),
                      d_2<-sample(1:11)))
Quinten
marine
    Did you create the dataframes or were they given to you like this? They look a little unusual to me. – bird Aug 26 '22 at 08:50

3 Answers

You can define a function that extracts the data from one data.frame, then loop over the list to apply it to each element.

Below I use map() from the purrr package, but you can also use lapply() or a for loop.

Please do not use <- to assign values inside a function call (here data.frame()), because it will mess up the column names. In a function call, = is what binds a value to an argument, so it's the right operator to use there. You can read this ;)
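A small check of the difference described above, on made-up data (the column names year and value are only illustrative). With =, data.frame() receives named arguments. With <-, each argument is an unnamed assignment, so the column names are built from the deparsed expressions, and the assignments also create year and value as ordinary variables in the calling environment.

```r
# `=` names the arguments: the columns are called "year" and "value"
good <- data.frame(year = 1992:1994, value = c(3L, 1L, 2L))
names(good)

# `<-` assigns year and value in the calling environment and passes the
# results as unnamed arguments, so data.frame() derives the column names
# from the deparsed expressions (the first becomes "year....1992.1994")
bad <- data.frame(year <- 1992:1994, value <- c(3L, 1L, 2L))
names(bad)

# Side effect of `<-`: these objects now also exist outside the data frame
exists("year")
exists("value")
```

The same mangling explains the odd names such as a_1.....1992.2015. that appear when the question's code is run.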

library(purrr)

# `=` inside list() too, so the elements are named a to d
df <- list(a = data.frame(a_1 = (1992:2015),
                          a_2 = sample(1:24)),
           b = data.frame(b_1 = (1992:2015),
                          b_2 = sample(1:24)),
           c = data.frame(c_1 = (2005:2015),
                          c_2 = sample(1:11)),
           d = data.frame(d_1 = (2005:2015),
                          d_2 = sample(1:11)))

# Return the values (2nd column) of the earliest year (1st column),
# named by that year
extract_miny <- function(df){
    miny <- min(df[,1])
    res <- df[df[,1] == miny, 2]
    names(res) <- miny
    return(res)
}

map(df, extract_miny)
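Since lapply() and a for loop are mentioned as alternatives to map(), here is a base-R sketch of both, reusing the same extract_miny() helper so the block runs without purrr. Both versions should give the same named list.

```r
# Same data as above, with named list elements
df <- list(a = data.frame(a_1 = 1992:2015, a_2 = sample(1:24)),
           b = data.frame(b_1 = 1992:2015, b_2 = sample(1:24)),
           c = data.frame(c_1 = 2005:2015, c_2 = sample(1:11)),
           d = data.frame(d_1 = 2005:2015, d_2 = sample(1:11)))

# Values (column 2) observed in the earliest year (column 1), named by that year
extract_miny <- function(df) {
  miny <- min(df[, 1])
  res <- df[df[, 1] == miny, 2]
  names(res) <- miny
  res
}

# lapply(): keeps the list names, like purrr::map()
res_lapply <- lapply(df, extract_miny)

# Explicit for loop filling a preallocated, named list
res_loop <- vector("list", length(df))
names(res_loop) <- names(df)
for (i in seq_along(df)) {
  res_loop[[i]] <- extract_miny(df[[i]])
}

identical(res_lapply, res_loop)
```

Each element stays a separate entry of the list, so results from data frames that share a first year are not pooled together.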
Gowachin
If the data is sorted as in the example, you can slice() the first row of each data frame. Notice the use of = rather than <- when creating the list of data frames.

library(tidyverse)

df <- list(
  a = data.frame(a_1 = (1992:2015),
                 a_2 = sample(1:24)),
  b = data.frame(b_1 = (1992:2015),
                 b_2 = sample(1:24)),
  c = data.frame(c_1 = (2005:2015),
                 c_2 = sample(1:11)),
  d = data.frame(d_1 = (2005:2015),
                 d_2 = sample(1:11))
)

df %>%
  imap_dfr( ~ slice(.x, 1) %>%
              set_names(c("year", "value")) %>%
              mutate(dataframe = .y) %>%
              as_tibble())

# A tibble: 4 x 3
   year value dataframe
  <int> <int> <chr>    
1  1992    19 a        
2  1992     2 b        
3  2005     1 c        
4  2005     5 d       
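The slice() approach relies on each data frame being sorted by year. Below is a base-R sketch that does not depend on row order, swapping in which.min() for slice(); the shuffled years are made-up data for illustration. which.min() returns only the first minimum, so if a year could appear more than once within one data frame, subsetting with x[[1]] == min(x[[1]]) (as in the next answer) is safer. dplyr's slice_min() is another option.

```r
set.seed(42)

# Made-up unsorted data: years are shuffled within each data frame
df <- list(
  a = data.frame(a_1 = sample(1992:2015), a_2 = sample(1:24)),
  b = data.frame(b_1 = sample(1992:2015), b_2 = sample(1:24)),
  c = data.frame(c_1 = sample(2005:2015), c_2 = sample(1:11)),
  d = data.frame(d_1 = sample(2005:2015), d_2 = sample(1:11))
)

rows <- lapply(names(df), function(nm) {
  x <- df[[nm]]
  row <- x[which.min(x[[1]]), ]      # row holding the smallest year, wherever it sits
  names(row) <- c("year", "value")   # common column names so rbind() can stack them
  row$dataframe <- nm                # record which data frame the row came from
  row
})

res <- do.call(rbind, rows)
rownames(res) <- NULL
res
```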
Chamkrai
You can subset each data frame with an anonymous function.

lapply(df, \(x) setNames(x[x[[1]] == min(x[[1]]), ], c('year', 'value'))) |> do.call(what=rbind)
#   year value
# 1 1992     6
# 2 1992     9
# 3 2005    11
# 4 2005    11
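The \(x) shorthand and the native pipe |> need R 4.1 or later. For older R versions, a sketch of the same subsetting written with function(x) and a plain do.call() call (same data layout as the question, with = used for the column names):

```r
df <- list(data.frame(a_1 = 1992:2015, a_2 = sample(1:24)),
           data.frame(b_1 = 1992:2015, b_2 = sample(1:24)),
           data.frame(c_1 = 2005:2015, c_2 = sample(1:11)),
           data.frame(d_1 = 2005:2015, d_2 = sample(1:11)))

# Keep every row whose year equals the minimum year of that data frame,
# then give all pieces the same column names so they can be stacked
earliest <- lapply(df, function(x) {
  setNames(x[x[[1]] == min(x[[1]]), ], c("year", "value"))
})

res <- do.call(rbind, earliest)
res
```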

Or, perhaps better, first add a variable recording which sample each value stems from.

Map(`[<-`, df, 'sample', value=letters[seq_along(df)]) |>
  lapply(\(x) setNames(x[x[[1]] == min(x[[1]]), ], c('year', 'value', 'sample'))) |> 
  do.call(what=rbind)
#   year value sample
# 1 1992     6      a
# 2 1992     9      b
# 3 2005    11      c
# 4 2005    11      d
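Map(`[<-`, df, 'sample', value = ...) calls `[<-`(df[[i]], 'sample', value = letters[i]) on each data frame, which adds a constant column named sample. A sketch of the same step with a named helper, which some readers may find easier to follow (add_sample() is a hypothetical name, not part of any package):

```r
df <- list(data.frame(a_1 = 1992:2015, a_2 = sample(1:24)),
           data.frame(b_1 = 1992:2015, b_2 = sample(1:24)),
           data.frame(c_1 = 2005:2015, c_2 = sample(1:11)),
           data.frame(d_1 = 2005:2015, d_2 = sample(1:11)))

# Hypothetical helper: add a constant column "sample" holding the label id
add_sample <- function(x, id) {
  x$sample <- id
  x
}

labelled <- Map(add_sample, df, letters[seq_along(df)])

earliest <- lapply(labelled, function(x) {
  setNames(x[x[[1]] == min(x[[1]]), ], c("year", "value", "sample"))
})

res <- do.call(rbind, earliest)
res
```

If the list is named, as in the tidyverse answer, Map(add_sample, df, names(df)) would use those names as labels instead of letters.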

Data:

df <- list(structure(list(a_1.....1992.2015. = 1992:2015, a_2....sample.1.24. = c(6L, 
18L, 23L, 5L, 7L, 14L, 4L, 10L, 19L, 17L, 15L, 1L, 11L, 22L, 
13L, 8L, 20L, 16L, 2L, 3L, 24L, 21L, 9L, 12L)), class = "data.frame", row.names = c(NA, 
-24L)), structure(list(b_1.....1992.2015. = 1992:2015, b_2....sample.1.24. = c(9L, 
24L, 18L, 8L, 16L, 11L, 13L, 23L, 15L, 20L, 19L, 21L, 12L, 22L, 
7L, 3L, 6L, 17L, 2L, 5L, 4L, 10L, 1L, 14L)), class = "data.frame", row.names = c(NA, 
-24L)), structure(list(c_1.....2005.2015. = 2005:2015, c_2....sample.1.11. = c(11L, 
2L, 5L, 10L, 9L, 6L, 1L, 7L, 3L, 8L, 4L)), class = "data.frame", row.names = c(NA, 
-11L)), structure(list(d_1.....2005.2015. = 2005:2015, d_2....sample.1.11. = c(11L, 
2L, 5L, 1L, 6L, 9L, 3L, 7L, 10L, 4L, 8L)), class = "data.frame", row.names = c(NA, 
-11L)))
jay.sf