0

I have a problem with merging columns in my dataframe.

This is what my dataframe looks like:

     HDI.Rank              Country X1990 X1995 X2000 X2005 X2010 X2011 X2012 X2013
    1      171          Afghanistan 121.3 103.0  94.5  84.0  75.3  73.6  72.0  70.2
    2       85              Albania  35.1  29.1  23.2  18.2  14.8  14.2  13.8  13.3
    3       83              Algeria  39.9  36.4  33.9  28.8  23.5  22.8  22.2  21.6
    4       34              Andorra   7.5   5.2   3.9   3.1   2.4   2.4   2.3   2.2
    5      149               Angola 133.4 132.7 128.3 121.5 109.6 107.0 104.3 101.6
    6       58  Antigua and Barbuda  23.4  17.9  13.8  10.6   8.6   8.2   8.0   7.7

I want to merge all the year columns into one column called Year, but i don't know how to do it, because i still need it to match the country and HDI for each year..

So far i have tried using dplyr, but it didn't work.

    infant_data %>% unite(X1990, X1995, X2000, X2005, X2010, X2011, X2012, X2013, sep = "", remove = FALSE)

If anyone could help me with that then it would be much appreciated.

jo.magi
  • 29
  • 1
  • 7

2 Answers2

2
idcns <- c('HDI.Rank','Country');
`rownames<-`(value=NULL,reshape(
    df, ## input data.frame
    dir='l', ## specify that we want to transform from wide to long format
    idvar=idcns, ## all non-data columns must be identified as id columns
    timevar='year', ## specify the desired time variable column name in the long format
    varying=setdiff(names(df),idcns), ## unfortunately reshape() doesn't know the POE
    split=list(regexp='X',include=T,fixed=T) ## spec how to parse data col name and times
));
##    HDI.Rank     Country year     X
## 1       171 Afghanistan 1990 121.3
## 2        85     Albania 1990  35.1
## 3        83     Algeria 1990  39.9
## 4        34     Andorra 1990   7.5
## 5       149      Angola 1990 133.4
## 6        58 and Barbuda 1990  23.4
## 7       171 Afghanistan 1995 103.0
## 8        85     Albania 1995  29.1
## 9        83     Algeria 1995  36.4
## 10       34     Andorra 1995   5.2
## 11      149      Angola 1995 132.7
## 12       58 and Barbuda 1995  17.9
## 13      171 Afghanistan 2000  94.5
## 14       85     Albania 2000  23.2
## 15       83     Algeria 2000  33.9
## 16       34     Andorra 2000   3.9
## 17      149      Angola 2000 128.3
## 18       58 and Barbuda 2000  13.8
## 19      171 Afghanistan 2005  84.0
## 20       85     Albania 2005  18.2
## 21       83     Algeria 2005  28.8
## 22       34     Andorra 2005   3.1
## 23      149      Angola 2005 121.5
## 24       58 and Barbuda 2005  10.6
## 25      171 Afghanistan 2010  75.3
## 26       85     Albania 2010  14.8
## 27       83     Algeria 2010  23.5
## 28       34     Andorra 2010   2.4
## 29      149      Angola 2010 109.6
## 30       58 and Barbuda 2010   8.6
## 31      171 Afghanistan 2011  73.6
## 32       85     Albania 2011  14.2
## 33       83     Algeria 2011  22.8
## 34       34     Andorra 2011   2.4
## 35      149      Angola 2011 107.0
## 36       58 and Barbuda 2011   8.2
## 37      171 Afghanistan 2012  72.0
## 38       85     Albania 2012  13.8
## 39       83     Algeria 2012  22.2
## 40       34     Andorra 2012   2.3
## 41      149      Angola 2012 104.3
## 42       58 and Barbuda 2012   8.0
## 43      171 Afghanistan 2013  70.2
## 44       85     Albania 2013  13.3
## 45       83     Algeria 2013  21.6
## 46       34     Andorra 2013   2.2
## 47      149      Angola 2013 101.6
## 48       58 and Barbuda 2013   7.7

Data

df <- data.frame(HDI.Rank=c(171L,85L,83L,34L,149L,58L),Country=c('Afghanistan','Albania',
'Algeria','Andorra','Angola','and Barbuda'),X1990=c(121.3,35.1,39.9,7.5,133.4,23.4),X1995=c(
103,29.1,36.4,5.2,132.7,17.9),X2000=c(94.5,23.2,33.9,3.9,128.3,13.8),X2005=c(84,18.2,28.8,3.1
,121.5,10.6),X2010=c(75.3,14.8,23.5,2.4,109.6,8.6),X2011=c(73.6,14.2,22.8,2.4,107,8.2),X2012=
c(72,13.8,22.2,2.3,104.3,8),X2013=c(70.2,13.3,21.6,2.2,101.6,7.7),stringsAsFactors=F);
bgoldst
  • 34,190
  • 6
  • 38
  • 64
  • I get an error when i use this as my final code `rownames<-(value=NULL,reshape(infant_data, dir='l', idvar=idcns, timevar='year', varying=setdiff(names(infant_data),idcns), split=list(regexp='HDI',include=T,fixed=T)))` – jo.magi Jul 12 '16 at 11:49
  • Error in `[<-.data.frame`(`*tmp*`, , v.names[i], value = c(121.3, 35.1, : column name "" cannot match any column – jo.magi Jul 12 '16 at 11:50
  • To be fair i probably went wrong somewhere – jo.magi Jul 12 '16 at 11:50
  • Your code looks good, except for the `regexp='HDI'`. I'm not sure why you changed the `'X'` to `'HDI'`. Perhaps try changing it back. – bgoldst Jul 12 '16 at 11:51
  • Yes that worked, thanks! – jo.magi Jul 12 '16 at 12:16
0

You could try the melt function:

result=melt(infant_data,id.vars = c("HDI.Rank","Country"),variable.name = "year")[,year:=substring(year,2)]