2

I'm working on some World Bank panel data (with NA values) that have the following structure:

    df <- read.table(text="
    Indicator Country 1996 1997 1998
    X         A       v1   NA   v3
    X         B       v4   v5   v6
    X         C       NA   v8   v9
    Y         A       z1   NA   z3
    Y         B       NA   NA   z6
    Y         C       z7   z8   z9", header = TRUE)

And I look for obtaining this structure:

    Country  Year   X    Y
    A        1996   v1   z1
    A        1997   NA   NA
    A        1998   v3   z3
    B        1996   v4   NA
    B        1997   v5   NA
    B        1998   v6   z6
    C        1996   NA   z7
    C        1997   v8   z8
    C        1998   v9   z9

I tried the answers given in Reshaping data.frame from wide to long format with the following code:

    df.reshaped=reshape(df, direction="long", varying=list(names(df)[3:5]), 
    v.names=c("X", "Y"), idvar= "Country", times=1996:1998)

but don't get what I want. The real flat file contain almost 20 indicators * 214 Countries * 35 Years, so looking for your help.

Community
  • 1
  • 1
Houssem
  • 143
  • 9

2 Answers2

2

We can use melt/dcast

library(data.table)
dcast(melt(setDT(df), id.var=c("Indicator", "Country"), 
    variable.name="year"), 
        Country+year~Indicator, value.var='value')
#   Country year  X  Y
#1:       A 1996 v1 z1
#2:       A 1997 NA NA
#3:       A 1998 v3 z3
#4:       B 1996 v4 NA
#5:       B 1997 v5 NA
#6:       B 1998 v6 z6
#7:       C 1996 NA z7
#8:       C 1997 v8 z8
#9:       C 1998 v9 z9
akrun
  • 874,273
  • 37
  • 540
  • 662
  • 1
    @Houssem, this is likely to be a faster/better/more-flexible answer. You may want to reconsider the check-mark and consider a vote on my answer instead. – A5C1D2H2I1M1N2O1R2T1 Jan 19 '16 at 10:52
  • @AnandaMahto, Did'nt really understand you, I'am new in this forum. Thanks for you and for 'akrun'. – Houssem Jan 19 '16 at 10:58
2

For reference, you can do something similar in base R with the a combination of reshape + stack:

reshape(cbind(df[c(1, 2)], 
              stack(lapply(df[-c(1, 2)], as.character))), 
        direction = "wide", 
        idvar = c("Country", "ind"), 
        timevar = "Indicator")
#    Country  ind values.X values.Y
# 1        A 1996       v1       z1
# 2        B 1996       v4     <NA>
# 3        C 1996     <NA>       z7
# 7        A 1997     <NA>     <NA>
# 8        B 1997       v5     <NA>
# 9        C 1997       v8       z8
# 13       A 1998       v3       z3
# 14       B 1998       v6       z6
# 15       C 1998       v9       z9

And, in the Hadleyverse®, with gather and spread:

library(dplyr)
library(tidyr)
df %>%
  gather(Year, value, -Country, -Indicator) %>%
  spread(Indicator, value)
#   Country Year    X    Y
# 1       A 1996   v1   z1
# 2       A 1997 <NA> <NA>
# 3       A 1998   v3   z3
# 4       B 1996   v4 <NA>
# 5       B 1997   v5 <NA>
# 6       B 1998   v6   z6
# 7       C 1996 <NA>   z7
# 8       C 1997   v8   z8
# 9       C 1998   v9   z9
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485