0

Sorry "this matrix format" is very vague in my question (suggestions to improve my question?). I have a matrix that's like this

x <- data.frame(ID = c('A','B','C','D'), SCORE_YR1 = c(2,2,1,0), 
        SCORE_YR2 = c(2,3,3,1), SCORE_YR3 = c(0,2,2,5))

x
  ID SCORE_YR1 SCORE_YR2 SCORE_YR3
1  A         2         2         0
2  B         2         3         2
3  C         1         3         2
4  D         0         1         5

I would like to transform the matrix format to look like this

y <- data.frame(ID = rep(c('A','B','C','D'),3), YEAR = rep(1:3,each=4), 
        SCORE = c(x$SCORE_YR1,x$SCORE_YR2,x$SCORE_YR3))

y
   ID YEAR SCORE
1   A    1     2
2   B    1     2
3   C    1     1
4   D    1     0
5   A    2     2
6   B    2     3
7   C    2     3
8   D    2     1
9   A    3     0
10  B    3     2
11  C    3     2
12  D    3     5

Is there a function that can easily transform the dataframe like this?

Thanks

LC-datascientist
  • 1,960
  • 1
  • 18
  • 32
  • `library("data.table"); df1 <- melt(setDT(df), id.vars = c("ID"), measure = patterns("SCORE_YR"), value.name = "SCORE", variable.name = "YEAR" ); df1$YEAR <- as.integer(df1$YEAR)` – Sathish Aug 09 '16 at 16:44

3 Answers3

3

You can use melt from the reshape2 package:

library(reshape2)

x <- melt(x, id.vars = "ID")

Change column names to what you have above:

names(x)[2:3] <- c("YEAR","SCORE")

At this point the data frame it looks like this:

> x
   ID      YEAR SCORE
1   A SCORE_YR1     2
2   B SCORE_YR1     2
3   C SCORE_YR1     1
4   D SCORE_YR1     0
5   A SCORE_YR2     2
6   B SCORE_YR2     3
7   C SCORE_YR2     3
8   D SCORE_YR2     1
9   A SCORE_YR3     0
10  B SCORE_YR3     2
11  C SCORE_YR3     2
12  D SCORE_YR3     5

Doing as.numeric on your YEAR column converts it to a number:

x$YEAR <- as.numeric(x$YEAR)

> x
   ID YEAR SCORE
1   A    1     2
2   B    1     2
3   C    1     1
4   D    1     0
5   A    2     2
6   B    2     3
7   C    2     3
8   D    2     1
9   A    3     0
10  B    3     2
11  C    3     2
12  D    3     5

The problem is that you have data in a "wide" format and you want to convert it to "long". melt is usually great for these situations.

Warner
  • 1,353
  • 9
  • 23
  • `as.numeric` works in this case but I would caution you that it doesn't work in all cases where you want to convert character to numeric. Other cases may require you to remove characters from your string before it can properly be turned into numeric. – Warner Aug 09 '16 at 16:45
  • 1
    `as.integer` could be a bit more faster – akrun Aug 09 '16 at 17:57
1

With dplyr and tidyr, you can do:

library(dplyr); library(tidyr)
x %>% 
      gather(YEAR, SCORE, -ID) %>% 
      mutate(YEAR = extract_numeric(YEAR))

#   ID YEAR SCORE
#1   A    1     2
#2   B    1     2
#3   C    1     1
#4   D    1     0
#5   A    2     2
#6   B    2     3
#7   C    2     3
#8   D    2     1
#9   A    3     0
#10  B    3     2
#11  C    3     2
#12  D    3     5

Or use reshape function from base R:

reshape(x, varying = 2:4, sep = "_YR", dir = "long", timevar = "YEAR")[1:3]

#    ID YEAR SCORE
#1.1  A    1     2
#2.1  B    1     2
#3.1  C    1     1
#4.1  D    1     0
#1.2  A    2     2
#2.2  B    2     3
#3.2  C    2     3
#4.2  D    2     1
#1.3  A    3     0
#2.3  B    3     2
#3.3  C    3     2
#4.3  D    3     5
Psidom
  • 209,562
  • 33
  • 339
  • 356
1

A base solution that would give you something that could easily be reworked to what you need would involve using stack. The data.frame function will do the "rep()-ing for you via R's recyclng rules:

y <- data.frame(x$ID, stack(x[-1]))
y
#-------------
   x.ID values       ind
1     A      2 SCORE_YR1
2     B      2 SCORE_YR1
3     C      1 SCORE_YR1
4     D      0 SCORE_YR1
5     A      2 SCORE_YR2
6     B      3 SCORE_YR2
7     C      3 SCORE_YR2
8     D      1 SCORE_YR2
9     A      0 SCORE_YR3
10    B      2 SCORE_YR3
11    C      2 SCORE_YR3
12    D      5 SCORE_YR3

This would convert the factor ind column to a numeric vector:

> y$ind <- seq_along(unique(y$ind))[y$ind]
> y
   x.ID values ind
1     A      2   1
2     B      2   1
3     C      1   1
4     D      0   1
5     A      2   2
6     B      3   2
7     C      3   2
8     D      1   2
9     A      0   3
10    B      2   3
11    C      2   3
12    D      5   3
IRTFM
  • 258,963
  • 21
  • 364
  • 487