Rearrange table by column

Question

I have this table:

       nb_5 nb_10 nb_15 nb_20 nb_25 nb_30 nb_35
 [1,]    0     0     1     0     0     0     0
 [2,]    0     0     1     1     0     0     1
 [3,]    0     0     1     0     2     0     1
 [4,]    0     0     0     0     0     1     0
 [5,]    0     1     0     0     0     1     1
 [6,]    0     1     0     1     3     0     1
 [7,]    0     0     0     1     0     2     1
 [8,]    0     1     0     1     0     0     0
 [9,]    0     1     1     1     1     1     2
[10,]    0     1     0     1     1     0     0

The numbers in this table are my values. Each column represents a condition, so the first column holds the data under the condition "nb_5".

Is it possible to transform this table so that it has 2 columns: one column "nb_of" with repetitions of 5, 10, 15, etc., and one column "data" with the corresponding values? For example:

nb_of <- c(5,5,5,5,5,5,5,5,5,5,10,10,10,10,10,10,10,10,10,10,15,15,15,15)
data <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,1,1,1,1,1,1,0) 
newdata <- cbind(nb_of, data)

Search SO for "wide to long", and then you'll need `sub` to remove the leading `"nb_"` (and then `as.integer`). — r2evans, Jul 22 '19 at 16:48
BTW: your first data is a `matrix` but your solution is a frame, so you'd need to convert the matrix to a frame first. One example of wide-to-long is here: https://stackoverflow.com/q/2185252, and gives many base R and (if you scroll down enough) `tidyr` methods. There's also `data.table::melt` if you are using that package. — r2evans, Jul 22 '19 at 16:50
Hi, thanks for your help. In my function I use `colnames(data) <- c(paste0("nb_", condition))`, so I can just remove "nb_" to keep only the number. After that I can transform my matrix into a frame. Could gather or spread help in this case? Thanks — paul latouche, Jul 22 '19 at 17:25

score 1 · Answer 1 · answered Jul 22 '19 at 17:34

Using dplyr and tidyr is an option (gather comes from tidyr), assuming m is the matrix you showed above:

# m <- read.table("clipboard") # after copying your data above
library(dplyr)
library(tidyr)
gather(as.data.frame(m), nb_of, data) %>%
  mutate(nb_of = as.integer(sub("nb_", "", nb_of))) %>%
  head()
#   nb_of data
# 1     5    0
# 2     5    0
# 3     5    0
# 4     5    0
# 5     5    0
# 6     5    0

If you never prepend "nb_" to your data, you'll still need to use as.integer or as.numeric, since the column names will be returned as character. This can be seen with

colnames(m) <- seq(5, 35, by=5)
m
#       5 10 15 20 25 30 35
# [1,]  0  0  1  0  0  0  0
# [2,]  0  0  1  1  0  0  1
# ...

gather(as.data.frame(m), nb_of, data) %>% str()
# 'data.frame': 70 obs. of  2 variables:
#  $ nb_of: chr  "5" "5" "5" "5" ...
#  $ data : int  0 0 0 0 0 0 0 0 0 0 ...
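
For comparison, the same reshaping can be sketched in base R without any packages. This is not part of the answer above; it assumes `m` is a numeric matrix whose column names all start with "nb_", and it rebuilds the question's table by hand so that the snippet runs on its own.

```r
# Rebuild the question's table as a matrix (values typed in row by row).
m <- matrix(
  c(0, 0, 1, 0, 0, 0, 0,
    0, 0, 1, 1, 0, 0, 1,
    0, 0, 1, 0, 2, 0, 1,
    0, 0, 0, 0, 0, 1, 0,
    0, 1, 0, 0, 0, 1, 1,
    0, 1, 0, 1, 3, 0, 1,
    0, 0, 0, 1, 0, 2, 1,
    0, 1, 0, 1, 0, 0, 0,
    0, 1, 1, 1, 1, 1, 2,
    0, 1, 0, 1, 1, 0, 0),
  nrow = 10, byrow = TRUE,
  dimnames = list(NULL, paste0("nb_", seq(5, 35, by = 5)))
)

# as.vector() unrolls a matrix column by column, so repeating each
# column label nrow(m) times keeps labels and values aligned.
newdata <- data.frame(
  nb_of = rep(as.integer(sub("nb_", "", colnames(m))), each = nrow(m)),
  data  = as.vector(m)
)
head(newdata)
```

The result has nrow(m) * ncol(m) rows and follows the same column-by-column order as the gather output.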

score 0 · Answer 2 · answered Jul 22 '19 at 19:08

This is basically a wide-to-long reshaping problem. We can use data.table for this purpose:

library(data.table)

# melt to long format, then strip everything up to the underscore from the labels
melt(setDT(df1), 
      variable.name = "nb_of", value.name="data", 
      measure.vars = colnames(df1))[, 
                                     nb_of := as.numeric(sub('.*_', '', nb_of))][]

#>     nb_of data
#>  1:     5    0
#>  2:     5    0
#>  3:     5    0
#>  4:     5    0
#>  5:     5    0
#>  6:     5    0
#>  7:     5    0
#>  8:     5    0
#>  9:     5    0
#> 10:     5    0
##... just showing part of the output#

You can omit measure.vars since we are using all the columns, but you'll get this warning:

#> Warning in melt.data.table(setDT(df1), variable.name = "nb_of", value.name
#> = "data"): To be consistent with reshape2's melt, id.vars and measure.vars
#> are internally guessed when both are 'NULL'. All non-numeric/integer/
#> logical type columns are considered id.vars, which in this case are columns
#> []. Consider providing at least one of 'id' or 'measure' vars in future.

Data:

read.table(text = "     nb_5  nb_10 nb_15 nb_20 nb_25 nb_30 nb_35
                  [1,]    0     0     1     0     0     0     0
                  [2,]    0     0     1     1     0     0     1
                  [3,]    0     0     1     0     2     0     1
                  [4,]    0     0     0     0     0     1     0
                  [5,]    0     1     0     0     0     1     1
                  [6,]    0     1     0     1     3     0     1
                  [7,]    0     0     0     1     0     2     1
                  [8,]    0     1     0     1     0     0     0
                  [9,]    0     1     1     1     1     1     2
                  [10,]   0     1     0     1     1     0     0") -> df1
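
If an actual matrix is wanted, as the printout in the question suggests, one possible sketch is to convert the data frame and drop the row names that read.table picked up from the "[1,]" labels. The name `m` is only illustrative here, and the data is read in again so that the snippet runs on its own.

```r
# The header has one field fewer than the data rows, so read.table
# uses the first field ("[1,]", "[2,]", ...) as row names.
df1 <- read.table(text = "     nb_5  nb_10 nb_15 nb_20 nb_25 nb_30 nb_35
                  [1,]    0     0     1     0     0     0     0
                  [2,]    0     0     1     1     0     0     1
                  [3,]    0     0     1     0     2     0     1
                  [4,]    0     0     0     0     0     1     0
                  [5,]    0     1     0     0     0     1     1
                  [6,]    0     1     0     1     3     0     1
                  [7,]    0     0     0     1     0     2     1
                  [8,]    0     1     0     1     0     0     0
                  [9,]    0     1     1     1     1     1     2
                  [10,]   0     1     0     1     1     0     0")

# Convert to a matrix and drop the character row names.
m <- as.matrix(df1)
rownames(m) <- NULL
is.matrix(m)
```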

BTW: even read-in this way, it *looks* like a matrix because `print.data.frame` is including the row names, which based on what was provided, are the strings `"[1,]"`, `"[2,]"`, etc. (In answer to your previous comment :-) — r2evans, Jul 22 '19 at 19:53
@r2evans yeah, you're right. But I specified the data at the bottom and that does not need `as.data.frame` because it is one. However, as I pointed out when replying to your comment, I figured you have that because of OP's actual dataset (which they did not provide in a reproducible form). — M--, Jul 22 '19 at 19:58
Yup, I typically include full data as you did here but skimped out this time, thanks for picking up my slack ... I'm not certain if we should produce an actual matrix as we infer from the question ... \*shrug\* — r2evans, Jul 22 '19 at 19:59
