R Reformat (without for loop) and combine four dfs col1 year, col2:13 months into df col1 year, col2 month, col 3 climate var

Question

I have 4 data frames of climate data. They are formatted such that column 1 is the year, and columns 2:13 are the months and its average of the climate variable of interest.

What I want (to input into another function that needs this format) is a df where column 1 is year, column 2 is month, column 3 is the corresponding climate data from df1, column 4 is the the corresponding climate data from df2 etc.

so I have is

Year	Jan	Feb
1991	33.3	30.1
1992	32.1.	29.7

x 4

and what I want is

Year.	Month	Tmax	Tmin
1991	Jan	33.3	22.1
1991	Feb	30.1	20.1

This is different from this question because I wish to reshape and combine multiple data frames. It is probably valuable because this format of climate data is common, and we commonly want combine multiple climate variables into one table, as I am trying to do, so the specifics of combining is relevant.

I can do it using the following for loop and remaking the data frame as below, but I am trying to wean myself off for loops as I work with larger datasets. What is a more efficient way to reshape this data?

rainvec <- c()
 for (i in 42:nrow(rain)) {
   for (j in 2:13) {
     rainvec <- c(rainvec, rain[[i,j]])
   }
  }

tminvec <- c()
 for (i in 1:nrow(tmin)) {
   for (j in 2:13) {
     tminvec <- c(tminvec, tmin[[i,j]])
   }
  }              
                    
tmaxvec <- c()
 for (i in 1:nrow(tmax)) {
   for (j in 2:13) {
     tmaxvec <- c(tmaxvec, tmax[[i,j]])
   }
  }                     
      
tmeanvec <- c()
 for (i in 1:nrow(tmean)) {
   for (j in 2:13) {
     tmeanvec <- c(tmeanvec, tmean[[i,j]])
   }
  }  
     
years <- sort(rep(tmean[[1]], 12))

months <- rep(1:12, nrow(tmin))
clim <- data.frame(years, months, rainvec, tminvec, tmaxvec, tmeanvec)

Try to use ```tidyr::pivot_longer``` to get the long format and merge all four dataframe by year and month — one, Mar 22 '23 at 22:09

score 2 · Answer 1 · answered Mar 22 '23 at 22:08

This might be easiest if you reshape your data from "wide" to "long", and then extract stats per group.

library(tidyverse)
df %>%
  pivot_longer(-Year, names_to = "Month") %>%
  # this is to put the month abbreviations in chronological order
  mutate(Month = factor(Month, levels = month.abb)) %>%
  group_by(Year, Month) %>%
  summarize(Tmax = max(value), Tmin = min(value), .groups = "drop")

Result

# A tibble: 6 × 4
   Year Month  Tmax  Tmin
  <int> <fct> <dbl> <dbl>
1  1991 Jan    34    33.3
2  1991 Feb    30.1  28  
3  1991 Mar    31    30  
4  1992 Jan    32.1  32.1
5  1992 Feb    29.7  29.7
6  1992 Mar    35    35

Some sample data with multiple Jan 1991 rows:

df <- data.frame(
  stringsAsFactors = FALSE,
              Year = c(1991L, 1991L, 1992L),
               Jan = c(33.3, 34, 32.1),
               Feb = c(30.1, 28, 29.7),
               Mar = c(31, 30, 35)
      )

> df
  Year  Jan  Feb Mar
1 1991 33.3 30.1  31
2 1991 34.0 28.0  30
3 1992 32.1 29.7  35

score 2 · Answer 2 · answered Mar 22 '23 at 22:47

You could use pivot_longer however I (as a R fundamentalist) would advise against it, just because there really is no need to use it, because you can do it very easy and clean in baseR .

# create dummy data
df<- cbind(
  Year=1991:2000,
  as.data.frame(sapply(month.name,\(x) rbind(rnorm(10,12,5))))
  )

# pivot table to long
df_long <- reshape(df,direction = "long",
        idvar="Year",
        timevar = "Month",
        times = names(df[-1]),
        varying = list(2:13),
        v.names = "Mean"
        )

# to remove row names
row.names(df_long)<- NULL
# View
View(df_long)

Pros:
you don't have to load a package and baseR ensures great backwards compatibility.

Cons:
tidyverse is heavily used nowadays- therefore colleagues may not know what you are doing when you share them your code.

Just running your example works perfectly, but when I change the variables in the code (and I tried a few ways) I seem to have broken it, and I am not sure why. I received 36 warnings "Warning messages: 1: In value[[jj]][ri] <- if (is.factor(xij)) as.vector(xij) else xij : number of items to replace is not a multiple of replacement length" Are there common errors newbs should look out for with reshape? — The_Tams, Mar 23 '23 at 10:24

Wael · Answer 3 · 2023-03-22T22:15:16.497

1

are you looking for pivot_longer ?

library(tidyr)
df=data.frame(year=c(1991,1992),Jan=c(33.3,32.1),Feb=c(30.1,29.7))
df%>%
  pivot_longer(-year,names_to = "month",values_to = "average")

# A tibble: 4 × 3
   year month average
  <dbl> <chr>   <dbl>
1  1991 Jan      33.3
2  1991 Feb      30.1
3  1992 Jan      32.1
4  1992 Feb      29.7

edited Mar 22 '23 at 22:15

answered Mar 22 '23 at 22:08

Wael

1,640
1
9
20

Andre Wildberg · Accepted Answer · 2023-03-22T23:28:59.260

First put your data frames in a list

df_list <- list(df, df, df, df)

names(df_list) <- LETTERS[1:4] # yours might already have names...

Then pivot_longer all data frames with lapply and left_join all with reduce from purrr. Lastly rename the columns.

library(dplyr)
library(purrr)
library(tidyr)

lapply(df_list, function(x) pivot_longer(x, -Year)) %>% 
  reduce(left_join, by = c("Year", "name")) %>% 
  rename_with(function(x) c(x[1], sub("name", "Month", x[2]), names(df_list)))
# A tibble: 4 × 6
   Year Month     A     B     C     D
  <int> <chr> <dbl> <dbl> <dbl> <dbl>
1  1991 Jan    33.3  33.3  33.3  33.3
2  1991 Feb    30.1  30.1  30.1  30.1
3  1992 Jan    32.1  32.1  32.1  32.1
4  1992 Feb    29.7  29.7  29.7  29.7

Data

df <- structure(list(Year = 1991:1992, Jan = c(33.3, 32.1), Feb = c(30.1, 
29.7)), class = "data.frame", row.names = c(NA, -2L))

df
  Year  Jan  Feb
1 1991 33.3 30.1
2 1992 32.1 29.7

I particularly like this answer because a) it worked first time on my data, and b) it let me bring all of the data frames together very simply. For this reason, this answer is even better than that given in the associated question, which this is now considered a duplicate of. – — The_Tams, Mar 23 '23 at 17:22
I also think it's just related, not a clear duplicate because you wanted a pivot plus a join... — Andre Wildberg, Mar 23 '23 at 17:37

R Reformat (without for loop) and combine four dfs col1 year, col2:13 months into df col1 year, col2 month, col 3 climate var

4 Answers4

Data