0

I have 4 data frames of climate data. They are formatted such that column 1 is the year, and columns 2:13 are the months and its average of the climate variable of interest.

What I want (to input into another function that needs this format) is a df where column 1 is year, column 2 is month, column 3 is the corresponding climate data from df1, column 4 is the the corresponding climate data from df2 etc.

so I have is

Year Jan Feb
1991 33.3 30.1
1992 32.1. 29.7

x 4

and what I want is

Year. Month Tmax Tmin
1991 Jan 33.3 22.1
1991 Feb 30.1 20.1

This is different from this question because I wish to reshape and combine multiple data frames. It is probably valuable because this format of climate data is common, and we commonly want combine multiple climate variables into one table, as I am trying to do, so the specifics of combining is relevant.

I can do it using the following for loop and remaking the data frame as below, but I am trying to wean myself off for loops as I work with larger datasets. What is a more efficient way to reshape this data?

rainvec <- c()
 for (i in 42:nrow(rain)) {
   for (j in 2:13) {
     rainvec <- c(rainvec, rain[[i,j]])
   }
  }

tminvec <- c()
 for (i in 1:nrow(tmin)) {
   for (j in 2:13) {
     tminvec <- c(tminvec, tmin[[i,j]])
   }
  }              
                    
tmaxvec <- c()
 for (i in 1:nrow(tmax)) {
   for (j in 2:13) {
     tmaxvec <- c(tmaxvec, tmax[[i,j]])
   }
  }                     
      
tmeanvec <- c()
 for (i in 1:nrow(tmean)) {
   for (j in 2:13) {
     tmeanvec <- c(tmeanvec, tmean[[i,j]])
   }
  }  
     
years <- sort(rep(tmean[[1]], 12))

months <- rep(1:12, nrow(tmin))
clim <- data.frame(years, months, rainvec, tminvec, tmaxvec, tmeanvec)
The_Tams
  • 205
  • 1
  • 10
  • 1
    Try to use ```tidyr::pivot_longer``` to get the long format and merge all four dataframe by year and month – one Mar 22 '23 at 22:09

4 Answers4

2

This might be easiest if you reshape your data from "wide" to "long", and then extract stats per group.

library(tidyverse)
df %>%
  pivot_longer(-Year, names_to = "Month") %>%
  # this is to put the month abbreviations in chronological order
  mutate(Month = factor(Month, levels = month.abb)) %>%
  group_by(Year, Month) %>%
  summarize(Tmax = max(value), Tmin = min(value), .groups = "drop")

Result

# A tibble: 6 × 4
   Year Month  Tmax  Tmin
  <int> <fct> <dbl> <dbl>
1  1991 Jan    34    33.3
2  1991 Feb    30.1  28  
3  1991 Mar    31    30  
4  1992 Jan    32.1  32.1
5  1992 Feb    29.7  29.7
6  1992 Mar    35    35 

Some sample data with multiple Jan 1991 rows:

df <- data.frame(
  stringsAsFactors = FALSE,
              Year = c(1991L, 1991L, 1992L),
               Jan = c(33.3, 34, 32.1),
               Feb = c(30.1, 28, 29.7),
               Mar = c(31, 30, 35)
      )

> df
  Year  Jan  Feb Mar
1 1991 33.3 30.1  31
2 1991 34.0 28.0  30
3 1992 32.1 29.7  35
Jon Spring
  • 55,165
  • 4
  • 35
  • 53
2

You could use pivot_longer however I (as a R fundamentalist) would advise against it, just because there really is no need to use it, because you can do it very easy and clean in baseR .

# create dummy data
df<- cbind(
  Year=1991:2000,
  as.data.frame(sapply(month.name,\(x) rbind(rnorm(10,12,5))))
  )

# pivot table to long
df_long <- reshape(df,direction = "long",
        idvar="Year",
        timevar = "Month",
        times = names(df[-1]),
        varying = list(2:13),
        v.names = "Mean"
        )

# to remove row names
row.names(df_long)<- NULL
# View
View(df_long)

Pros:
you don't have to load a package and baseR ensures great backwards compatibility.

Cons:
tidyverse is heavily used nowadays- therefore colleagues may not know what you are doing when you share them your code.

user12256545
  • 2,755
  • 4
  • 14
  • 28
  • Just running your example works perfectly, but when I change the variables in the code (and I tried a few ways) I seem to have broken it, and I am not sure why. I received 36 warnings "Warning messages: 1: In value[[jj]][ri] <- if (is.factor(xij)) as.vector(xij) else xij : number of items to replace is not a multiple of replacement length" Are there common errors newbs should look out for with reshape? – The_Tams Mar 23 '23 at 10:24
1

are you looking for pivot_longer ?

library(tidyr)
df=data.frame(year=c(1991,1992),Jan=c(33.3,32.1),Feb=c(30.1,29.7))
df%>%
  pivot_longer(-year,names_to = "month",values_to = "average")

# A tibble: 4 × 3
   year month average
  <dbl> <chr>   <dbl>
1  1991 Jan      33.3
2  1991 Feb      30.1
3  1992 Jan      32.1
4  1992 Feb      29.7
Wael
  • 1,640
  • 1
  • 9
  • 20
1

First put your data frames in a list

df_list <- list(df, df, df, df)

names(df_list) <- LETTERS[1:4] # yours might already have names...

Then pivot_longer all data frames with lapply and left_join all with reduce from purrr. Lastly rename the columns.

library(dplyr)
library(purrr)
library(tidyr)

lapply(df_list, function(x) pivot_longer(x, -Year)) %>% 
  reduce(left_join, by = c("Year", "name")) %>% 
  rename_with(function(x) c(x[1], sub("name", "Month", x[2]), names(df_list)))
# A tibble: 4 × 6
   Year Month     A     B     C     D
  <int> <chr> <dbl> <dbl> <dbl> <dbl>
1  1991 Jan    33.3  33.3  33.3  33.3
2  1991 Feb    30.1  30.1  30.1  30.1
3  1992 Jan    32.1  32.1  32.1  32.1
4  1992 Feb    29.7  29.7  29.7  29.7

Data

df <- structure(list(Year = 1991:1992, Jan = c(33.3, 32.1), Feb = c(30.1, 
29.7)), class = "data.frame", row.names = c(NA, -2L))

df
  Year  Jan  Feb
1 1991 33.3 30.1
2 1992 32.1 29.7
Andre Wildberg
  • 12,344
  • 3
  • 12
  • 29
  • 1
    I particularly like this answer because a) it worked first time on my data, and b) it let me bring all of the data frames together very simply. For this reason, this answer is even better than that given in the associated question, which this is now considered a duplicate of. – – The_Tams Mar 23 '23 at 17:22
  • I also think it's just related, not a clear duplicate because you wanted a pivot plus a join... – Andre Wildberg Mar 23 '23 at 17:37