18

Happy Weekends.

I've been trying to replicate the results from this blog post in R. I am looking for a method of transposing the data without using t, preferably using tidyr or reshape. In example below, metadata is obtained by transposing data.

metadata <- data.frame(colnames(data), t(data[1:4, ]) )
colnames(metadata) <- t(metadata[1,])
metadata <- metadata[-1,]
metadata$Multiplier <- as.numeric(metadata$Multiplier)

Though it achieves what I want, I find it little unskillful. Is there any efficient workflow to transpose the data frame?

data

data <- data.frame(
  Series.Description = c("Unit:", "Multiplier:", "Currency:", "Unique Identifier: "),
  Nominal.Broad.Dollar.Index. = c("Index:_1997_Jan_100", "1", NA, "H10/H10/JRXWTFB_N.M"),
  Nominal.Major.Currencies.Dollar.Index. = c("Index:_1973_Mar_100", "1", NA, "H10/H10/JRXWTFN_N.M"),
  Nominal.Other.Important.Trading.Partners.Dollar.Index. = c("Index:_1997_Jan_100", "1", NA, "H10/H10/JRXWTFO_N.M"),
  AUSTRALIA....SPOT.EXCHANGE.RATE..US..AUSTRALIAN...RECIPROCAL.OF.RXI_N.M.AL. = c("Currency:_Per_AUD", "1", "USD", "H10/H10/RXI$US_N.M.AL"),
  SPOT.EXCHANGE.RATE...EURO.AREA. = c("Currency:_Per_EUR", "1", "USD", "H10/H10/RXI$US_N.M.EU"),
  NEW.ZEALAND....SPOT.EXCHANGE.RATE..US..NZ...RECIPROCAL.OF.RXI_N.M.NZ.. = c("Currency:_Per_NZD", "1", "USD", "H10/H10/RXI$US_N.M.NZ"),
  United.Kingdom....Spot.Exchange.Rate..US..Pound.Sterling.Reciprocal.of.rxi_n.m.uk = c("Currency:_Per_GBP", "0.01", "USD", "H10/H10/RXI$US_N.M.UK"),
  BRAZIL....SPOT.EXCHANGE.RATE..REAIS.US.. = c("Currency:_Per_USD", "1", "BRL", "H10/H10/RXI_N.M.BZ"),
  CANADA....SPOT.EXCHANGE.RATE..CANADIAN...US.. = c("Currency:_Per_USD", "1", "CAD", "H10/H10/RXI_N.M.CA"),
  CHINA....SPOT.EXCHANGE.RATE..YUAN.US.. = c("Currency:_Per_USD", "1", "CNY", "H10/H10/RXI_N.M.CH"),
  DENMARK....SPOT.EXCHANGE.RATE..KRONER.US.. = c("Currency:_Per_USD", "1", "DKK", "H10/H10/RXI_N.M.DN"),
  HONG.KONG....SPOT.EXCHANGE.RATE..HK..US.. = c("Currency:_Per_USD", "1", "HKD", "H10/H10/RXI_N.M.HK"),
  INDIA....SPOT.EXCHANGE.RATE..RUPEES.US. = c("Currency:_Per_USD", "1", "INR", "H10/H10/RXI_N.M.IN"),
  JAPAN....SPOT.EXCHANGE.RATE..YEA.US.. = c("Currency:_Per_USD", "1", "JPY", "H10/H10/RXI_N.M.JA"),
  KOREA....SPOT.EXCHANGE.RATE..WON.US.. = c("Currency:_Per_USD", "1", "KRW", "H10/H10/RXI_N.M.KO"),
  Malaysia...Spot.Exchange.Rate..Ringgit.US.. = c("Currency:_Per_USD", "1", "MYR", "H10/H10/RXI_N.M.MA"),
  MEXICO....SPOT.EXCHANGE.RATE..PESOS.US.. = c("Currency:_Per_USD", "1", "MXN", "H10/H10/RXI_N.M.MX"),
  NORWAY....SPOT.EXCHANGE.RATE..KRONER.US.. = c("Currency:_Per_USD", "1", "NOK", "H10/H10/RXI_N.M.NO"),
  SWEDEN....SPOT.EXCHANGE.RATE..KRONOR.US.. = c("Currency:_Per_USD", "1", "SEK", "H10/H10/RXI_N.M.SD"),
  SOUTH.AFRICA....SPOT.EXCHANGE.RATE..RAND.US.. = c("Currency:_Per_USD", "1", "ZAR", "H10/H10/RXI_N.M.SF"),
  Singapore...SPOT.EXCHANGE.RATE..SINGAPORE...US.. = c("Currency:_Per_USD", "1", "SGD", "H10/H10/RXI_N.M.SI"),
  SRI.LANKA....SPOT.EXCHANGE.RATE..RUPEES.US.. = c("Currency:_Per_USD", "1", "LKR", "H10/H10/RXI_N.M.SL"),
  SWITZERLAND....SPOT.EXCHANGE.RATE..FRANCS.US.. = c("Currency:_Per_USD", "1", "CHF", "H10/H10/RXI_N.M.SZ"),
  TAIWAN....SPOT.EXCHANGE.RATE..NT..US.. = c("Currency:_Per_USD", "1", "TWD", "H10/H10/RXI_N.M.TA"),
  THAILAND....SPOT.EXCHANGE.RATE....THAILAND. = c("Currency:_Per_USD", "1", "THB", "H10/H10/RXI_N.M.TH"),
  VENEZUELA....SPOT.EXCHANGE.RATE..BOLIVARES.US.. = c("Currency:_Per_USD", "1", "VEB", "H10/H10/RXI_N.M.VE")
)
moodymudskipper
  • 46,417
  • 11
  • 121
  • 167
ExperimenteR
  • 4,453
  • 1
  • 15
  • 19

2 Answers2

40

Using tidyr, you gather all the columns except the first, and then you spread the gathered columns.

Try:

library(dplyr)
library(tidyr)
data %>%
  gather(var, val, 2:ncol(data)) %>%
  spread(Series.Description, val)
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
  • 1
    Absolutely beautiful, @AnandaMahto. Thank you so much. I never get my head around inner workings of `tidyr`. This is my study material for the weekend. – ExperimenteR Mar 07 '15 at 16:43
  • 3
    I love this solution. A slightly more generic way is replace `spread(Series.Description, val)` by `spread_(names(data)[1], "val")` – jbkunst Jan 11 '17 at 15:54
  • Very elegant. I'm about 3 months into an intensive R learning experience, having started from pretty much zero knowledge. (And I'm not a developer by training or profession.) Would someone explain to me why gather/spread is a better solution than melt/dcast in this situation? It'd be really helpful in my learning. I know this is a better Q for blog.rstudio, but this question is here, not there! – Steve Oct 16 '17 at 17:11
  • 2
    @Steve `gather`/`spread` is not a *better* solution than `melt`/`dcast`; it's just a *different* solution that fits in with a different grammar of data manipulation. You would note, for instance, that `dcast` also allows for aggregation while reshaping, while `spread` would require already aggregated data. If you're just reshaping the data, then it's down to a matter of preference, I'd say. – A5C1D2H2I1M1N2O1R2T1 Oct 17 '17 at 01:55
  • @A5C1D2H2I1M1N2O1R2T1 Great solution! Any way to preserve the row order of the first column when it's spread as column names? I have a column of dates in my first column and when that gets spread to the new columns it gets converted to character and ordered alphabetically. – pentandrous Oct 17 '17 at 17:00
  • @pentandrous, why don't you just convert to factor, with the levels in the correct order, and then `spread`? For example, compare the `spread` results for "df1" and "df2" here: `df1 <- df2 <- data.frame(key = c(1, 2, 2), var = letters[c(1, 3, 2)], val = c(10, 20, 30), stringsAsFactors = FALSE); df2$var <- factor(df2$var, c("c", "b", "a")); spread(df1, var, val); spread(df2, var, val)`. – A5C1D2H2I1M1N2O1R2T1 Oct 18 '17 at 03:51
  • @A5C1D2H2I1M1N2O1R2T1 Thanks, that works. I didn't want to type the levels so used `pull(df, first_col)` to set the order. I also got it to work by keeping the dates in the first column in `%Y-%m-%d` format. – pentandrous Oct 18 '17 at 15:05
  • 1
    `gather` and `spread` is deprecated. Instead us `pivot_longer` and `pivot_wider`. – Werner Nov 19 '20 at 00:08
9
library(dplyr)
# Omitted data <- structure part ...

Here is something that replicates what's in the main answer, but more generically (e.g., works where Series.Description is not the first column of the result) and using the newer pivot_wider/pivot_longer verbs.

df_transpose <- function(df) {
  
  df %>% 
    tidyr::pivot_longer(-1) %>%
    tidyr::pivot_wider(names_from = 1, values_from = value)

}

df_transpose(data)
#> # A tibble: 26 x 5
#>    name                   `Unit:`    `Multiplier:` `Currency:` `Unique Identifi…
#>    <chr>                  <chr>      <chr>         <chr>       <chr>            
#>  1 Nominal.Broad.Dollar.… Index:_19… 1             <NA>        H10/H10/JRXWTFB_…
#>  2 Nominal.Major.Currenc… Index:_19… 1             <NA>        H10/H10/JRXWTFN_…
#>  3 Nominal.Other.Importa… Index:_19… 1             <NA>        H10/H10/JRXWTFO_…
#>  4 AUSTRALIA....SPOT.EXC… Currency:… 1             USD         H10/H10/RXI$US_N…
#>  5 SPOT.EXCHANGE.RATE...… Currency:… 1             USD         H10/H10/RXI$US_N…
#>  6 NEW.ZEALAND....SPOT.E… Currency:… 1             USD         H10/H10/RXI$US_N…
#>  7 United.Kingdom....Spo… Currency:… 0.01          USD         H10/H10/RXI$US_N…
#>  8 BRAZIL....SPOT.EXCHAN… Currency:… 1             BRL         H10/H10/RXI_N.M.…
#>  9 CANADA....SPOT.EXCHAN… Currency:… 1             CAD         H10/H10/RXI_N.M.…
#> 10 CHINA....SPOT.EXCHANG… Currency:… 1             CNY         H10/H10/RXI_N.M.…
#> # … with 16 more rows

But note that (like the answer above) the name of the first column is lost. The following retains this (as, I guess does the spread_(names(data)[1], "val") approach proposed by @jbkunst above).

df_transpose <- function(df) {
  
  first_name <- colnames(df)[1]
  
  temp <-
    df %>% 
    tidyr::pivot_longer(-1) %>%
    tidyr::pivot_wider(names_from = 1, values_from = value)
  
  colnames(temp)[1] <- first_name
  temp
}

df_transpose(data)
#> # A tibble: 26 x 5
#>    Series.Description       `Unit:`   `Multiplier:` `Currency:` `Unique Identif…
#>    <chr>                    <chr>     <chr>         <chr>       <chr>           
#>  1 Nominal.Broad.Dollar.In… Index:_1… 1             <NA>        H10/H10/JRXWTFB…
#>  2 Nominal.Major.Currencie… Index:_1… 1             <NA>        H10/H10/JRXWTFN…
#>  3 Nominal.Other.Important… Index:_1… 1             <NA>        H10/H10/JRXWTFO…
#>  4 AUSTRALIA....SPOT.EXCHA… Currency… 1             USD         H10/H10/RXI$US_…
#>  5 SPOT.EXCHANGE.RATE...EU… Currency… 1             USD         H10/H10/RXI$US_…
#>  6 NEW.ZEALAND....SPOT.EXC… Currency… 1             USD         H10/H10/RXI$US_…
#>  7 United.Kingdom....Spot.… Currency… 0.01          USD         H10/H10/RXI$US_…
#>  8 BRAZIL....SPOT.EXCHANGE… Currency… 1             BRL         H10/H10/RXI_N.M…
#>  9 CANADA....SPOT.EXCHANGE… Currency… 1             CAD         H10/H10/RXI_N.M…
#> 10 CHINA....SPOT.EXCHANGE.… Currency… 1             CNY         H10/H10/RXI_N.M…
#> # … with 16 more rows

Created on 2021-05-30 by the reprex package (v2.0.0)

Ian Gow
  • 3,098
  • 1
  • 25
  • 31