1

I have a df looking like this:

EXP   G   NA   NA     NA      NA 
EXP   D   1   NAME    2     NUMBER
EXP   D   1   NAME    2     NUMBER
EXP   G   NA   NA     NA      NA 
EXP   D   1   NAME    2     NUMBER
EXP   D   1   NAME    2     NUMBER

df = data.frame(Exp = rep('EXP', 6),
            Code = rep(c('G', 'D', 'D')),
            Name.num = rep(c(NA, 1, 1)),
            Name = rep(c(NA, 'NAME', 'NAME')),
            Number.num = rep(c(NA, 2, 2)),
            Number = rep(c(NA, 'NUMBER', 'NUMBER')))

And I would like to get an output like this:

EXP  G   NA   NA  
EXP  D   1    NAME
EXP  D   2    NUMBER
EXP  G   NA   NA  
EXP  D   1    NAME
EXP  D   2    NUMBER

Where in only one column I can have both (Numbers and names) in the same column.

Any suggestions? So far I tried with reshape based on the answer to this question since I do not want to use packages but I did not succeed.

df = reshape(df, direction = 'wide', varying = list(df)[c(4,6)],
         idvar = c('Name.num', 'Number.num', 'Name', 'Number'))
Biostatician
  • 111
  • 5
  • "tried with reshape" please add the code, even if it didn't work. Also, is the expected output complete output or just part of the expected output? – zx8754 Mar 25 '19 at 10:13
  • 1
    @zx8754 that is the complete output... editing question to add `reshape` code. – Biostatician Mar 25 '19 at 10:15

2 Answers2

2

If your pattern is always the same, this is a very easy way to do it:

df = data.frame(Exp = rep('EXP', 6),
            Code = rep(c('G', 'D', 'D')),
            Name.num = rep(c(NA, 1, 1)),
            Name = rep(c(NA, 'NAME', 'NAME')),
            Number.num = rep(c(NA, 2, 2)),
            Number = rep(c(NA, 'NUMBER', 'NUMBER')),stringsAsFactors = FALSE)
ss=seq(1,nrow(df),by=3)
df$out=NA

sapply(ss,function(x){
  df$out[x]<<-NA
  df$out[x+1]<<-df$Name[x+1]
  df$out[x+2]<<-df$Number[x+2]
  df$Name.num[x+2]<<- 2
})
df_out=df[,c(1,2,3,7)]

  Exp Code Name.num    out
1 EXP    G       NA   <NA>
2 EXP    D        1   NAME
3 EXP    D        2 NUMBER
4 EXP    G       NA   <NA>
5 EXP    D        1   NAME
6 EXP    D        2 NUMBER
boski
  • 2,437
  • 1
  • 14
  • 30
  • Exact same example and I could not replicate your result... somehow I get lots of `1` on `out` in `df_out`. – Biostatician Mar 25 '19 at 12:01
  • 1
    when creating the dataframe, use `stringsAsFactors = FALSE` and try again. I have edited my answer. – boski Mar 25 '19 at 12:52
1

using data.table

melt( setDT(df), 
    id.vars = c("Exp", "Code"), 
    measure.vars = patterns( num = "^.*\\.num", 
                             val = "^Name$|^Number$" )
    )[, variable :=NULL ][]

produces:

#     Exp Code num    val
#  1: EXP    G  NA   <NA>
#  2: EXP    D   1   NAME
#  3: EXP    D   1   NAME
#  4: EXP    G  NA   <NA>
#  5: EXP    D   1   NAME
#  6: EXP    D   1   NAME
#  7: EXP    G  NA   <NA>
#  8: EXP    D   2 NUMBER
#  9: EXP    D   2 NUMBER
# 10: EXP    G  NA   <NA>
# 11: EXP    D   2 NUMBER
# 12: EXP    D   2 NUMBER
Wimpel
  • 26,031
  • 1
  • 20
  • 37