0

I have a big data set of 72 columns and I want to gather each 3 of columns into a new column and thus getting 24 columns in the end.

I tried using gather() function but it works good for one time only t=i.e., it gather only 3 columns at a time.

Can I use this function in a for loop?

I tried this:

j=0
k=1
l=2
for (i in 2:24){
  neww <- gather(columns, "KEy", "Proteins H/L", c((i+j), (i+k), (i+l)), na.rm = TRUE)
  j=j+2;
  k=k+2;
  l=l+2;
}

I need to gather first 3 columns in a single column and then next 3 in another column and so on.

alistaire
  • 42,459
  • 4
  • 77
  • 117
Anam
  • 21
  • 1
  • 3
  • Please produce a reproducible example. the `gather()` function is not part of base R. What package does it come from? Are you appending columns to make a data.frame 3 times as long, or are you pasting the columns together? Without more information, it will be difficult to help you. – lmo Apr 08 '16 at 18:03
  • If you use the SE version `gather_`, you can iterate over the column names, which you can pass it as strings. – alistaire Apr 08 '16 at 18:26
  • Gather multiple sets of columns with tidyr http://stackoverflow.com/q/25925556/5249203 Did you check this ? Is it not addressing your similar problem ? Possible duplicate ? – user5249203 Apr 08 '16 at 18:34
  • gather() come from package tidyr . soryy for not mentioning that before – Anam Apr 08 '16 at 18:48
  • Yes I have checked it: stackoverflow.com/q/25925556/5249203 but in my case column do not have similar names that all starting from same string so I have to use column number – Anam Apr 08 '16 at 18:49
  • I just saw this comment, that you need column numbers. You can either get column names for the `to_long` function with `colnames(data)[i+j]` or so, or you download the latest [GitHub-build of sjmisc](https://github.com/sjPlot/sjmisc), which supports numeric column indices as well (see `?to_long` for example). – Daniel Apr 09 '16 at 07:04

1 Answers1

0

You can use the to_long function from the sjmisc-package for this purpose. This function is a convenient for-loop, which calls multiple gather() calls.

# create sample
mydat <- data.frame(age = c(20, 30, 40),
                    sex = c("Female", "Male", "Male"),
                    score_t1 = c(30, 35, 32),
                    score_t2 = c(33, 34, 37),
                    score_t3 = c(36, 35, 38),
                    speed_t1 = c(2, 3, 1),
                    speed_t2 = c(3, 4, 5),
                    speed_t3 = c(1, 8, 6))

# check tidyr. score is gathered, however, speed is not
tidyr::gather(mydat, "time", "score", score_t1, score_t2, score_t3)

>   age    sex speed_t1 speed_t2 speed_t3     time score
> 1  20 Female        2        3        1 score_t1    30
> 2  30   Male        3        4        8 score_t1    35
> 3  40   Male        1        5        6 score_t1    32
> 4  20 Female        2        3        1 score_t2    33
> 5  30   Male        3        4        8 score_t2    34
> 6  40   Male        1        5        6 score_t2    37
> 7  20 Female        2        3        1 score_t3    36
> 8  30   Male        3        4        8 score_t3    35
> 9  40   Male        1        5        6 score_t3    38

# gather multiple columns. both time and speed are gathered.
to_long(mydat, "time", c("score", "speed"),
        c("score_t1", "score_t2", "score_t3"),
        c("speed_t1", "speed_t2", "speed_t3"))

>     age    sex     time score speed
>   (dbl) (fctr)    (chr) (dbl) (dbl)
> 1    20 Female score_t1    30     2
> 2    30   Male score_t1    35     3
> 3    40   Male score_t1    32     1
> 4    20 Female score_t2    33     3
> 5    30   Male score_t2    34     4
> 6    40   Male score_t2    37     5
> 7    20 Female score_t3    36     1
> 8    30   Male score_t3    35     8
> 9    40   Male score_t3    38     6

In this case, the time vector (indicating the gathered groups) just takes one of the multiple gathered column name. If this is too confusing, you can also just number the ID variable:

to_long(mydat, "time", c("score", "speed"),
        c("score_t1", "score_t2", "score_t3"),
        c("speed_t1", "speed_t2", "speed_t3"),
        recode.key = TRUE)

>     age    sex  time score speed
>   (dbl) (fctr) (dbl) (dbl) (dbl)
> 1    20 Female     1    30     2
> 2    30   Male     1    35     3
> 3    40   Male     1    32     1
> 4    20 Female     2    33     3
> 5    30   Male     2    34     4
> 6    40   Male     2    37     5
> 7    20 Female     3    36     1
> 8    30   Male     3    35     8
> 9    40   Male     3    38     6

See ?to_long for more examples.

I'm not sure, but I think I read something on GitHub that "multiple column gathering" is also planned for tidyr somewhen...

Daniel
  • 7,252
  • 6
  • 26
  • 38