0

I have a dataset like the following:

Age      Monday Tuesday Wednesday 
6-9        a     b        
6-9        b     a        c
6-9              c        a
9-10       c     c        b
9-10       c     a        b

Using R, I want to get the following data set/ results (where 1 represents the presence of the element and 0 represents the absence):

Age        a     b        c
6-9        1     1        0
6-9        1     1        1
6-9        1     0        1
9-10       0     1        1
9-10       1     1        1
BENY
  • 317,841
  • 20
  • 164
  • 234
fitrat
  • 65
  • 5

3 Answers3

3

This can be done by melt then create the freq count by table

library(reshape2)
df['New']=row.names(df)
s=melt(df,c('Age','New'))
s=as.data.frame.matrix(table(s$New,s$value))
s$Age=df$Age
s
  a b c  Age
1 1 1 0  6-9
2 1 1 1  6-9
3 1 0 1  6-9
4 0 1 2 9-10
5 1 1 1 9-10
BENY
  • 317,841
  • 20
  • 164
  • 234
  • @denis glad I can help :-) have a nice day – BENY Nov 19 '19 at 20:32
  • 2
    You could also achieve it with just `base R` `data.frame(df[1], as.data.frame.matrix(table(data.frame(1:nrow(df), stack(df[-1])[1]))))` :) – tmfmnk Nov 19 '19 at 20:45
1

One option with pivot_longer and pivot_wider

library(dplyr)
library(tidyr)
df1 %>%
   mutate(rn = row_number()) %>% 
   pivot_longer(cols = -c(Age, rn)) %>%
   filter(value != '')  %>% 
   select(-name) %>%
   distinct %>%
   mutate(val = 1) %>%
   pivot_wider(names_from = value, values_from = val, 
            values_fill = list(val = 0)) %>%
   select(-rn)
# A tibble: 5 x 4
#  Age       a     b     c
#  <chr> <dbl> <dbl> <dbl>
#1 6-9       1     1     0
#2 6-9       1     1     1
#3 6-9       1     0     1
#4 9-10      0     1     1
#5 9-10      1     1     1

data

df1 <- structure(list(Age = c("6-9", "6-9", "6-9", "9-10", "9-10"), 
    Monday = c("a", "b", "", "c", "c"), Tuesday = c("b", "a", 
    "c", "c", "a"), Wednesday = c("", "c", "a", "b", "b")),
    class = "data.frame", row.names = c(NA, 
-5L))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • I am getting an error for the select(-name). Error in select(., -name) : unused argument (-name) – fitrat Nov 19 '19 at 20:27
  • @fitrat try by changing `select(-name)` to `dplyr::select(-name)` could be a masking issue by `select` from some other package – akrun Nov 19 '19 at 20:29
1

A data.table solution, using an ID variable;

library(data.table)
library(magrittr)
df <- setDT(df)

ag = function(x){if(length(x>1)){1}else{length(x)}}

df[,idx:=.I][]%>%
  melt(id.vars = c("Age","idx")) %>%
  .[,.(Age,value,idx)]%>%
  dcast(Age+idx~value,fun.aggregate = ag)%>%
  .[,-c("idx","NA")]


    Age a b c
1:  6-9 1 1 0
2:  6-9 1 1 1
3:  6-9 1 0 1
4: 9-10 0 1 1
5: 9-10 1 1 1

The data:

df <- read.table(text = "Age      Monday Tuesday Wednesday 
6-9        a     b        NA
6-9        b     a        c
6-9       NA     c        a
9-10       c     c        b
9-10       c     a        b",header = T)
denis
  • 5,580
  • 1
  • 13
  • 40
  • Thank you very much for your help. However, instead of just giving me the presence or absence of each unique element per row, it is giving me the total number of unique elements per row. It would be appreciated if you suggest me how can I just get the presence or absence of each unique element per row? – fitrat Nov 19 '19 at 20:35
  • Ho, right I didn't realized, sorry. I edited my answer, but WeNYoBen anser's is way better and simpler I think. – denis Nov 19 '19 at 20:55