2

I am trying to use tidyr to unpivot a time series dataset that has many variables running across the rows at the top of the file, one variable running down the first column, and observations beginning at some point in the dataset. I can't work out how to achieve this using pivot_longer or pivot_wider. Is this possible?

Here is a miniature example of my raw data, with the actual observations beginning at row 4 column 2.

         A          B          C
 varname_1 category_1 category_1
 varname_2 category_2 category_2
 varname_3 category_3 category_4
      1990        100        200
      1991        101        201
      1992        102        202

I want it in a long format so my result would be:

  varname_1  varname_2  varname_3 Year Obs
 category_1 category_2 category_3 1990 100
 category_1 category_2 category_3 1991 101
 category_1 category_2 category_3 1992 102
 category_1 category_2 category_4 1990 200
 category_1 category_2 category_4 1991 201
 category_1 category_2 category_4 1992 202

Script to generate my raw data:

my_table = data.frame(
A=c("varname_1","varname_2","varname_3",1990,1991,1992),
B=c("category_1","category_2","category_3",100,101,102),
C=c("category_1","category_2","category_4",200,201,202))

and my desired result:

my_result = data.frame(
varname_1=c("category_1","category_1","category_1","category_1","category_1","category_1"),
varname_2=c("category_2","category_2","category_2","category_2","category_2","category_2"),
varname_3=c("category_3","category_4","category_3","category_4","category_3","category_4"),
Year=c(1990,1991,1992,1990,1991,1992),Obs=c(100,101,102,200,201,202))
thelatemail
  • 91,185
  • 12
  • 128
  • 188
buzz
  • 35
  • 4
  • It'd be helpful if you could format your data in such a way that others could copy/paste easily. See [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on how to make a great reproducible example. – Desmond Jul 28 '21 at 02:01
  • gah! The markdown formatting won't carry across from the question editor for some reason. Have added r-script to generate my example tables. – buzz Jul 28 '21 at 02:20

1 Answers1

3

The data is in a different format than the expected input format in pivot_longer - i.e. column types are mixed etc..

An option is to subset the data rows (4:6) that corresponds to the 'Year', 'Obs', reshape it to 'long' format with pivot_longer, remove the 'name' column (select), and reorder the rows of the data by arrangeing on the sequence of similar values in 'A' (rowid), then bind the data to the transposed first 3 rows and replicated rows by 3 (uncount). Update the type of the columns with type.convert

library(dplyr)
library(tidyr)
library(data.table)
df1 %>% 
    slice_tail(n = 3) %>%
    pivot_longer(cols = -A, values_to = 'Obs') %>%
    select(-name) %>%
    arrange(rowid(A)) %>%
    bind_cols(data.table::transpose(df1 %>% 
                     slice_head(n = 3), make.names = 'A') %>% 
    uncount(3), .) %>%
    rename(Year = A) %>%
    type.convert(as.is = TRUE)

-output

   varname_1  varname_2  varname_3 Year Obs
1 category_1 category_2 category_3 1990 100
2 category_1 category_2 category_3 1991 101
3 category_1 category_2 category_3 1992 102
4 category_1 category_2 category_4 1990 200
5 category_1 category_2 category_4 1991 201
6 category_1 category_2 category_4 1992 202

data

df1 <- structure(list(A = c("varname_1", "varname_2", "varname_3", "1990", 
"1991", "1992"), B = c("category_1", "category_2", "category_3", 
"100", "101", "102"), C = c("category_1", "category_2", "category_4", 
"200", "201", "202")), class = "data.frame", row.names = c(NA, 
-6L))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • @buzz I added some description – akrun Jul 28 '21 at 03:35
  • 1
    note `tail(...)` and `head(...)` might be correct syntax for the `slice` functions for some users – buzz Jul 28 '21 at 04:34
  • 1
    @buzz. `slice` is more general as `tail` or `head` `base R` functions cannot work with `group` attributes i.e. head or tail will just extract those subsets without obeying the group attributes, whereas `slice` will get the head/tail of group attributes – akrun Jul 28 '21 at 04:36
  • 1
    In this case, without grouping, `tail` or `head` can be used. But my suggestion was a general case scenario – akrun Jul 28 '21 at 04:37