0

I need a long-format data frame in order to plot it with ggplot2. In the plot I need x = Conditions, y = the count of 0s and 1s, and fill = the value (0 or 1). From what I have found, this requires a long-format data frame, and that is my problem: I cannot produce one myself.

Here is my current dataframe format :

      C1  C2  C3    
1      0   1   0       
2      1   1   0       
3      1   0   0 

I would like to transform it into a new shape like :

id             Conditions Values
1               C1          0
2               C1          1   
3               C1          1 
1               C2          1
2               C2          1
3               C2          0
1               C3          0
2               C3          0
3               C3          0

I tried unstack, melt, and mainly the reshape function, but did not succeed, so I am no longer sure whether this is the right approach for what I am trying to achieve. Many thanks for your help.

PavoDive
C.Brn
  • I saw this question before posting. I admit it is very similar, but I could not make it work in my case, even after reading several tutorials on the reshape function. – C.Brn Jun 16 '19 at 18:01
  • It's definitely a duplicate, but I've provided an answer anyway for your specific data. I think the answer with the most votes in the other post is more interesting than the accepted answer. Jaap gives a very thorough explanation using multiple strategies. –  Jun 16 '19 at 18:12
  • Maybe I was too focused on Aniko's solution, and maybe reshape is not a solution in my case? – C.Brn Jun 16 '19 at 18:20

3 Answers

3

tidyr

tidyr's gather is one of the easiest, most commonly used options. You'll first need to turn your row names into a new variable id. I like tibble's rownames_to_column because I tend to prefer very descriptive function names, but you can use whatever method you like:

library(tidyr)
library(tibble)

df %>% 
    rownames_to_column("id") %>%
    gather(conditions, values, -id)

#### OUTPUT ####

  id conditions values
1  1         C1      0
2  2         C1      1
3  3         C1      1
4  1         C2      1
5  2         C2      1
6  3         C2      0
7  1         C3      0
8  2         C3      0
9  3         C3      0

The first argument after the data (conditions) tells R where to store the variable names, and the second (values) tells R where to store the values of each former variable. The -id simply tells R to gather everything but id.
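gather still works, but tidyr 1.0.0 and later mark it as superseded in favour of pivot_longer. Here is a minimal sketch of the equivalent call, assuming tidyr >= 1.0.0 is installed; df is rebuilt from the question's data so the block runs on its own:

```r
library(tidyr)
library(tibble)

# Example data from the question (rebuilt here; an assumption about how df was created)
df <- data.frame(C1 = c(0, 1, 1), C2 = c(1, 1, 0), C3 = c(0, 0, 0))

long <- df %>%
    rownames_to_column("id") %>%
    pivot_longer(cols = -id,               # stack everything except id
                 names_to = "conditions",  # former column names go here
                 values_to = "values")     # former cell values go here

long
```

Note that pivot_longer returns the rows grouped by id rather than by condition, so the row order differs from the gather output above. The content is the same, and ggplot does not depend on the row order.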

base R

Following your request, and building on Onyambu's excellent suggestion, here's how you might go about using base R's reshape. You can find a good, detailed explanation of how to use reshape here.

reshape can be a little unintuitive and cumbersome to use, and this was the least painful method I could come up with. It requires that you prepend the name you want your column of values to have in the long format dataframe, in this case value. You should put a . in there too, i.e. value.C1. You can also do it without this step, but if you read the article I linked above you'll see that using this particular naming convention can save you some heartache later, when you deal with more complex cases:

names(df) <- paste0("value.", names(df))

reshape(df,                    # data
        direction = "long",    # long or wide
        varying = 1:3,         # the columns that should be stacked
        timevar = "condition"  # name of "time" variable, basically groups
        )

#### OUTPUT ####

     condition value id
1.C1        C1     0  1
2.C1        C1     1  2
3.C1        C1     1  3
1.C2        C2     1  1
2.C2        C2     1  2
3.C2        C2     0  3
1.C3        C3     0  1
2.C3        C3     0  2
3.C3        C3     0  3

Apparently reshape creates an id variable automatically based on rows. It will also recognize id if you have it in your dataframe already:

names(df) <- paste0("value.", names(df))
df$id <- letters[1:3] # add an `id` variable

reshape(df,
        direction = "long",
        varying = 1:3,
        timevar = "condition"
        )

#### OUTPUT ####

     id condition value
a.C1  a        C1     0
b.C1  b        C1     1
c.C1  c        C1     1
a.C2  a        C2     1
b.C2  b        C2     1
c.C2  c        C2     0
a.C3  a        C3     0
b.C3  b        C3     0
c.C3  c        C3     0
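If you would rather not rename the columns first, reshape can also be given every piece explicitly. This is only a sketch, assuming df holds just the three condition columns from the question; the names Conditions, Values and id are simply the ones the question asked for:

```r
# Example data from the question (rebuilt here so the block is self-contained)
df <- data.frame(C1 = c(0, 1, 1), C2 = c(1, 1, 0), C3 = c(0, 0, 0))

long <- reshape(df,
                direction = "long",
                varying   = list(names(df)),  # one set of columns -> one long variable
                v.names   = "Values",         # name of the stacked value column
                timevar   = "Conditions",     # name of the column holding the labels
                times     = names(df),        # labels to use: "C1", "C2", "C3"
                idvar     = "id")             # created from the row numbers if absent

long
```

It is more typing than the prefix trick, but nothing has to be guessed from the column names, which can be easier to reason about when the names do not follow a pattern.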

Another base R option (credit to Onyambu) is using cbind and stack. It's not as easily generalizable to more complex cases, but it's definitely possible with some tweaking. This should work with your example data without any issues (you will need to change some column names):

cbind(id = 1:nrow(df), stack(df))

#### OUTPUT ####

  id values ind
1  1      0  C1
2  2      1  C1
3  3      1  C1
4  1      1  C2
5  2      1  C2
6  3      0  C2
7  1      0  C3
8  2      0  C3
9  3      0  C3
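The renaming mentioned above could look roughly like this. It is a sketch in base R only, and the target names Conditions and Values are taken from the question:

```r
# Example data from the question (rebuilt here so the block is self-contained)
df <- data.frame(C1 = c(0, 1, 1), C2 = c(1, 1, 0), C3 = c(0, 0, 0))

long <- cbind(id = seq_len(nrow(df)), stack(df))

# stack() always names its columns "values" and "ind"; rename and reorder them
names(long) <- c("id", "Values", "Conditions")
long <- long[, c("id", "Conditions", "Values")]

long
```

Keep in mind that stack returns the former column names as a factor, which ggplot treats as a discrete variable without further conversion.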

reshape2

Yet another option would be melt from the reshape2 package. melt is pretty simple to use, but it has been superseded by gather (which has itself been superseded by pivot_longer as of tidyr 1.0.0):

library(reshape2)

df$id <- 1:nrow(df) # add id variable
melt(df, id.vars = "id")

#### OUTPUT #### 

  id variable value
1  1       C1     0
2  2       C1     1
3  3       C1     1
4  1       C2     1
5  2       C2     1
6  3       C2     0
7  1       C3     0
8  2       C3     0
9  3       C3     0
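Once the data is in long format, the plot described in the question (x = conditions, bar height = count, fill = 0 or 1) could be sketched roughly as follows. This assumes ggplot2 is installed; the long data frame is typed in by hand here, with the column names the question asked for, so that the block runs on its own:

```r
library(ggplot2)

# Long-format data as requested in the question (typed in for illustration)
long <- data.frame(
  id         = rep(1:3, times = 3),
  Conditions = rep(c("C1", "C2", "C3"), each = 3),
  Values     = c(0, 1, 1, 1, 1, 0, 0, 0, 0)
)

# geom_bar() counts rows per x value; factor() makes the fill discrete
p <- ggplot(long, aes(x = Conditions, fill = factor(Values))) +
  geom_bar() +
  labs(fill = "Values", y = "count")

print(p)
```

By default the bars are stacked; adding position = "dodge" to geom_bar() would place the 0 and 1 counts side by side instead.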
  • It works perfectly, many thanks! As I was struggling with the reshape function, do you know how to solve my case using reshape properly? I would like to improve my skills, so it would be good to know how to use reshape(). Thanks for your help. – C.Brn Jun 16 '19 at 18:15
  • You could also present base R solutions, e.g. `cbind(id=1:3,stack(df))`, or even use the base R reshape, e.g. `reshape(df,1:3,dir="long",sep="")`, etc. – Onyambu Jun 16 '19 at 19:10
  • @C.Brn sorry, but I confused the reshape package with the `reshape` function. I've edited my answer again with an example using the function. Definitely take a look at the article I link to for a more thorough explanation. –  Jun 16 '19 at 21:47
  • @Onyambu thanks for the tips. I've included a bit about the `reshape` function and the `cbind` option and named you as the inspiration. –  Jun 16 '19 at 21:56
  • All perfect. I did not want to disturb you again by asking about the reshape function, but now everything is clear with this example! Many thanks @gersht, and thanks also to Onyambu for the tips. – C.Brn Jun 17 '19 at 18:23
2

If you want to use only the reshape package, you can try:

df <- read.table(text = "      C1  C2  C3    
1      0   1   0       
2      1   1   0       
3      1   0   0 ")
df$id <- 1:3

library(reshape)

df2 <- melt(df, id = "id")
df2
  id variable value
1  1       C1     0
2  2       C1     1
3  3       C1     1
4  1       C2     1
5  2       C2     1
6  3       C2     0
7  1       C3     0
8  2       C3     0
9  3       C3     0

You can try data.table and reshape as well

df <- read.table(text = "      C1  C2  C3    
1      0   1   0       
2      1   1   0       
3      1   0   0 ")
df$id <- 1:3
library(reshape)
library(data.table)

setDT(df)
df2 <- melt(df, id = "id")
df2[, .(Conditions = paste0(id, ",", variable), Values = value)]
   Conditions Values
1:       1,C1      0
2:       2,C1      1
3:       3,C1      1
4:       1,C2      1
5:       2,C2      1
6:       3,C2      0
7:       1,C3      0
8:       2,C3      0
9:       3,C3      0
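If the goal is simply separate id, Conditions and Values columns, as in the question's desired output, the pasting step can be skipped and the names set directly in melt. This is a sketch, assuming data.table is installed; variable.name and value.name are arguments of data.table's melt method:

```r
library(data.table)

# Example data from the question (rebuilt here so the block is self-contained)
df <- data.frame(C1 = c(0, 1, 1), C2 = c(1, 1, 0), C3 = c(0, 0, 0))

dt <- as.data.table(df)
dt[, id := .I]  # .I is the row number, used here as the id

long <- melt(dt,
             id.vars       = "id",
             variable.name = "Conditions",  # column for the former column names
             value.name    = "Values")      # column for the former cell values

long
```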
Chriss Paul
  • I think the last line to paste `id` and `variable` isn't really needed, but more of a typo in the question (I'll edit the question, please consider editing the answer, and include how to give `condition` as name to the `variable` column after the melt). – PavoDive Jun 16 '19 at 18:22
  • @PavoDive sure, please proceed, then I will edit my answer. – Chriss Paul Jun 16 '19 at 18:24
1

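Here is one way to accomplish it using dplyr together with tidyr's gather:

library(dplyr)
library(tidyr)

df <- read.table(text =
                   "C1  C2  C3 
0   1   0       
1   1   0       
1   0   0",
                 header = TRUE, stringsAsFactors = FALSE)

df %>%
  mutate(row = rownames(.)) %>%   # keep the row names as an explicit column
  gather(column, value, -row)     # stack every column except row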
Bryan Adams