0

If I have a table

ID   C1   C2   C3   C4   C5
 1                     
 2    HD   NS   VM          
 3                   DH   FV
 4    HD             DH     

Now I have to convert it to the following table

ID  C
2   HD
2   NS
2   VM
3   DH
3   FV
4   HD
4   DH

What is the best way to make this conversion in R? This has to be done in R as part of a larger code?

ZzZ
  • 3
  • 2

3 Answers3

1

We can use dplyr and tidyr.

library(dplyr)
library(tidyr)

dt2 <- dt %>%
  gather(C, Value, -ID, na.rm = TRUE) %>%
  select(-C) %>%
  arrange(ID)
dt2
  ID Value
1  2    HD
2  2    NS
3  2    VM
4  3    DH
5  3    FV
6  4    HD
7  4    DH

DATA

dt <- read.table(text = "ID   C1   C2   C3   C4   C5
 1 NA NA NA NA NA                    
                 2    HD   NS   VM NA NA           
                 3    NA  NA  NA  DH FV
                 4    HD  NA NA   DH NA     ",
                 header = TRUE, stringsAsFactors = FALSE)
www
  • 38,575
  • 12
  • 48
  • 84
  • Thank you, it worked very well. I should add that the format it was in was cvs so I had to upload the table in the method below `dt <- read.csv2("~/Desktop/R_class/Test.data.csv", sep=",", header=TRUE, stringsAsFactors = FALSE, na.strings=c("","NA"))` adding the `na.strings=c("","NA")` – ZzZ Oct 26 '17 at 14:42
1

Using melt

library(reshape)
na.omit(melt(dt,id.var='ID'))
   ID variable value
2   2       C1    HD
4   4       C1    HD
6   2       C2    NS
10  2       C3    VM
15  3       C4    DH
16  4       C4    DH
19  3       C5    FV
BENY
  • 317,841
  • 20
  • 164
  • 234
1
data <- as.data.frame(list(ID = c(1,2,3,4),
              C1 = c(NA,"HD",NA,"HD"),C2 = c(NA,"NS",NA,NA),C3 = c(NA,"VM",NA,NA),C4 = c(NA,NA,"DH","DH"), C5 = c(NA,NA,"FV",NA)))

You can use unlist, but to have the ID, you previously create a matrix of same size than your data, just with the IDS

plouf = matrix(data$ID,dim(data)[1],dim(data)[2]-1)
data$ID <- NULL

and then you use unlist and select the non NA data:

   result = as.data.frame(list(
  C = unlist(data)[!is.na(unlist(data,use.names = FALSE))], 
  ID = unlist(plouf)[!is.na(unlist(data))] ))

which leads :

 C ID
C12 HD  2
C14 HD  4
C22 NS  2
C32 VM  2
C43 DH  3
C44 DH  4
C53 FV  3
denis
  • 5,580
  • 1
  • 13
  • 40