How can I categorize based on several columns?

Question

I have data like this:

df<- structure(list(V1 = structure(c(10L, 4L, 7L, 5L, 3L, 1L, 8L, 
11L, 12L, 9L, 2L, 6L), .Label = c("BRA_AC_A6IX", "BRA_BH_A18F", 
"BRA_BH_A18V", "BRA_BH_A1ES", "BRA_BH_A1FE", "BRA_BH_A6R8", "BRA_E2_A15A", 
"BRA_E2_A15K", "BRA_E2_A1B4", "BRA_EM_A15E", "BRA_LQ_A4E4", "BRA_OK_A5Q2"
), class = "factor"), V2 = structure(c(2L, 3L, 5L, 3L, 3L, 5L, 
3L, 4L, 1L, 4L, 2L, 2L), .Label = c("Level ii", "Level iia", 
"Level iib", "Level iiia", "Level iiic"), class = "factor"), 
    V3 = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 
    3L, 4L), .Label = c("amira", "boro", "car", "dim"), class = "factor")), class = "data.frame", row.names = c(NA, 
-12L))

I am trying to categorize the rows based on two columns.

I can do the following:

library(dplyr)
df %>%
  group_by(V2) %>%
  summarise(no_rows = n())
# A tibble: 5 x 2
  V2         no_rows
  <fct>        <int>
1 Level ii         1
2 Level iia        3
3 Level iib        4
4 Level iiia       2
5 Level iiic       2

But I want output like this:

            Amira  Boro  Car  dim
Level ii                 1
Level iia   1            1    1
Level iib   1      1     1
Level iiia               1
Level iiic  1      1

Group by both columns and then reshape your dataset. – AntoniosK, May 23 '18 at 19:04

score 0 · Accepted Answer · answered May 23 '18 at 19:07

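How about:

library(reshape2)
df1 <- df[, -1]  # drop the ID column V1
# melt to long form, drop the "variable" column, then cross-tabulate V2 against the V3 values
table(melt(df1, id.vars = "V2")[-2])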

CER
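
Since the approach above ends in a cross-tabulation of V2 against V3, the same counts can probably be obtained in base R without reshaping first. This is only a sketch: the data frame below is a hand-rebuilt copy of the question's V2 and V3 columns (V1 is left out), and the name `counts` is made up for illustration.

```r
# Hand-rebuilt copy of the question's V2 and V3 columns (assumption: V1 is not needed)
df <- data.frame(
  V2 = c("Level iia", "Level iib", "Level iiic", "Level iib", "Level iib",
         "Level iiic", "Level iib", "Level iiia", "Level ii", "Level iiia",
         "Level iia", "Level iia"),
  V3 = c("amira", "amira", "amira", "boro", "boro", "boro",
         "car", "car", "car", "car", "car", "dim")
)

# Cross-tabulate the two columns directly; rows are V2, columns are V3
counts <- table(df$V2, df$V3)
print(counts)

# Formula interface; should give the same counts with named dimensions
print(xtabs(~ V2 + V3, data = df))
```

Both calls return a contingency table rather than a data frame; `as.data.frame.matrix(counts)` is one way to convert it if a data frame is needed.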

score 0 · Answer 2 · answered May 23 '18 at 19:29

Here is a tidyverse method. I am assuming that you actually want the counts; if you want just presence/absence, that is easy to add.

df <- structure(list(V1 = structure(c(10L, 4L, 7L, 5L, 3L, 1L, 8L, 11L, 12L, 9L, 2L, 6L), .Label = c("BRA_AC_A6IX", "BRA_BH_A18F", "BRA_BH_A18V", "BRA_BH_A1ES", "BRA_BH_A1FE", "BRA_BH_A6R8", "BRA_E2_A15A", "BRA_E2_A15K", "BRA_E2_A1B4", "BRA_EM_A15E", "BRA_LQ_A4E4", "BRA_OK_A5Q2"), class = "factor"), V2 = structure(c(2L, 3L, 5L, 3L, 3L, 5L, 3L, 4L, 1L, 4L, 2L, 2L), .Label = c("Level ii", "Level iia", "Level iib", "Level iiia", "Level iiic"), class = "factor"), V3 = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 4L), .Label = c("amira", "boro", "car", "dim"), class = "factor")), class = "data.frame", row.names = c(NA, -12L))

library(tidyverse)
df %>%
  select(-V1) %>%
  count(V2, V3) %>%
  spread(V3, n, fill = 0L)
#> # A tibble: 5 x 5
#>   V2         amira  boro   car   dim
#>   <fct>      <int> <int> <int> <int>
#> 1 Level ii       0     0     1     0
#> 2 Level iia      1     0     1     1
#> 3 Level iib      1     2     1     0
#> 4 Level iiia     0     0     2     0
#> 5 Level iiic     1     1     0     0

Created on 2018-05-23 by the reprex package (v0.2.0).
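
The desired output in the question looks like presence/absence rather than counts (for example, it shows 1 for Level iib / boro, where the count is 2). A minimal sketch of that variant follows, assuming a tidyr version that provides `pivot_wider()` (1.0.0 or later). The data frame is a hand-rebuilt copy of the question's V2 and V3 columns, and the names `present` and `presence` are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Hand-rebuilt copy of the question's V2 and V3 columns (assumption: V1 is not needed)
df <- data.frame(
  V2 = c("Level iia", "Level iib", "Level iiic", "Level iib", "Level iib",
         "Level iiic", "Level iib", "Level iiia", "Level ii", "Level iiia",
         "Level iia", "Level iia"),
  V3 = c("amira", "amira", "amira", "boro", "boro", "boro",
         "car", "car", "car", "car", "car", "dim")
)

presence <- df %>%
  distinct(V2, V3) %>%        # keep one row per observed (V2, V3) pair
  mutate(present = 1L) %>%    # mark every observed pair with 1
  pivot_wider(
    names_from  = V3,
    values_from = present,
    values_fill = list(present = 0L)  # unobserved pairs become 0
  ) %>%
  arrange(V2)

print(presence)
```

Using `distinct()` before widening collapses repeated pairs, so every cell ends up as 0 or 1 regardless of how often the pair occurs.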
