Grouping columns with the same name in R

Question

I'm trying to format my data in a 'readable' way where I have multiple columns with the same name. I tried using the melt() function, but I failed to solve the problem, which seems to be related to the fact that there are different values on the variables.

A small example of the data:

obs     m   ti      td        date        class  code   dis       group  status     grade   freq    date              dis     group   status    grade   freq    date             dis    group   status  grade   freq    date
obs_1   A   grad        05/01/2016 00:00         55060  DDE0300  2016101    A        5.7     97   05/01/2016 15:20  MS0230  2016101      A      8.19    100 05/01/2016 15:20    A0301   2016101  A        5.8   100  27/01/2016 13:12
obs_2   A   grad        05/01/2016 00:00         55070  SSE332         0    D                     03/06/2016 14:08   A0804    0          D                  03/06/2016 14:18    SE089   0        D                   26/08/2016 19:31

And now I want to split this data frame by observation:

    melt(df[1,],id.vars=c("obs","m","ti","td","date","class","code"), 
            measure.vars=c("dis","group","status","grade","freq","date"))

I get:

    obs  m   ti td             date class  code variable            value
1 obs_1 A  grad NA 05/01/2016 15:20    NA 55060      dis          DDE0300
2 obs_1 A  grad NA 05/01/2016 15:20    NA 55060    group          2016101
3 obs_1 A  grad NA 05/01/2016 15:20    NA 55060   status               A 
4 obs_1 A  grad NA 05/01/2016 15:20    NA 55060    grade              5.7
5 obs_1 A  grad NA 05/01/2016 15:20    NA 55060     freq               97
6 obs_1 A  grad NA 05/01/2016 15:20    NA 55060     date 05/01/2016 15:20
Warning message:
attributes are not identical across measure variables; they will be dropped

Now, I'm 'missing' two columns, which would be MS0230 and A0301 and their group, status, and so on. How can I fix this?

Keep in mind, it doesn't have to be using the melt() function.

Code to reproduce the data:

df<-structure(list(obs = structure(1:2, .Label = c("obs_1", "obs_2"
), class = "factor"), m = structure(c(1L, 1L), .Label = "A ", class = "factor"), 
    ti = structure(c(1L, 1L), .Label = "grad", class = "factor"), 
    td = c(NA, NA), datei = structure(c(1L, 1L), .Label = "05/01/2016 00:00", class = "factor"), 
    class = c(NA, NA), code = c(55060L, 55070L), dis = structure(1:2, .Label = c("DDE0300", 
    "SSE332"), class = "factor"), group = c(2016101L, 0L), status = structure(1:2, .Label = c("A ", 
    "D "), class = "factor"), grade = c(5.7, NA), freq = c(97L, 
    NA), date = structure(c(2L, 1L), .Label = c("03/06/2016 14:08", 
    "05/01/2016 15:20"), class = "factor"), dis = structure(c(2L, 
    1L), .Label = c("A0804", "MS0230"), class = "factor"), group = c(2016101L, 
    0L), status = structure(1:2, .Label = c("A ", "D "), class = "factor"), 
    grade = c(8.19, NA), freq = c(100L, NA), date = structure(c(2L, 
    1L), .Label = c("03/06/2016 14:18", "05/01/2016 15:20"), class = "factor"), 
    dis = structure(1:2, .Label = c("A0301", "SE089"), class = "factor"), 
    group = c(2016101L, 0L), status = structure(1:2, .Label = c("A ", 
    "D "), class = "factor"), grade = c(5.8, NA), freq = c(100L, 
    NA), date = structure(c(2L, 1L), .Label = c("26/08/2016 19:31", 
    "27/01/2016 13:12"), class = "factor")), .Names = c("obs", 
"m", "ti", "td", "datei", "class", "code", "dis", "group", "status", 
"grade", "freq", "date", "dis", "group", "status", "grade", "freq", 
"date", "dis", "group", "status", "grade", "freq", "date"), class = "data.frame", row.names = c(NA, 
-2L))

Seems like a duplicate of [Reshaping multiple sets of measurement columns (wide format) into single columns (long format)](https://stackoverflow.com/questions/12466493/reshaping-multiple-sets-of-measurement-columns-wide-format-into-single-columns). Try e.g. `reshape(df, idvar = "obs", direction = "long", varying = list(dis = c(8, 14, 20), group = c(9, 15, 21), status = c(10, 16, 22), grade = c(11, 17, 23), freq = c(12, 18, 24), date = c(13, 19, 25)))` — Henrik, Mar 30 '18 at 20:56
Please show desired result. You make it seem obvious that *MS0230* and *A0301* should be columns after `melt`. — Parfait, Mar 30 '18 at 21:14

score 0 · Answer 1 · answered Mar 30 '18 at 21:49

0

Thanks to Henrik's link, I managed to figure it out. Not sure if it's the best solution tho.

But here's what I did:

melt(setDT(df[1,]), id=1L, id.vars=c("obs","m","ti","td","date","class","code"),
      measure=patterns("dis","group","status","grade","freq","date"),
      value.name=c("Dis","Group","Status","Grade","Freq","Date"))

Which gave me:

     obs  m   ti td             date class  code variable     Dis   Group Status Grade Freq             Date
1: obs_1 A  grad NA 05/01/2016 15:20    NA 55060        1 DDE0300 2016101     A   5.70   97 05/01/2016 00:00
2: obs_1 A  grad NA 05/01/2016 15:20    NA 55060        2  MS0230 2016101     A   8.19  100 05/01/2016 15:20
3: obs_1 A  grad NA 05/01/2016 15:20    NA 55060        3   A0301 2016101     A   5.80  100 05/01/2016 15:20
4: obs_1 A  grad NA 05/01/2016 15:20    NA 55060        4      NA      NA     NA    NA   NA 27/01/2016 13:12

answered Mar 30 '18 at 21:49

Holyzin

11
2

Not exactly because for this particular case, I only used melt() for the first line. So I only got the DDE0300, MS0230 and A0301. I'm not sure why there's a 4th line there tho. – Holyzin Mar 31 '18 at 01:39
Yes, it seems that the first 'Date' is picking up 'date', and the last 'Date' (27/01/2016 13:12) is the actual last 'Date', any ideas to fix it? – Holyzin Mar 31 '18 at 02:30
I think I fixed it. So, in patterns(), if you type the string as "date", the function will pickup every column that has the string "date" and melt them. Now, using "date$" it seems it only gets the columns that match the entire name. That being said, in this case, there are 2 columns that have the name "date", but it seems to have fixed it. melt(setDT(df[1,]), id=1L, id.vars=c("obs","m","ti","td","date","class","code"), measure=patterns("dis","group","status","grade","freq","date$"), value.name=c("Dis","Group","Status","Grade","Freq","Date")) – Holyzin Mar 31 '18 at 15:02

Grouping columns with the same name in R

1 Answers1