
I have a "wide" dataset in which each observation records one value for each of several categorical variables. It looks like this:

V1 V2 V3
a z f
a z f
b y g
b y g
a y g
b y f

This means that V1 has two categories, "a" and "b", V2 has two categories, "z" and "y", and so on. Now suppose that I have 30 variables (a much larger dataset).

I want to obtain a dataset in this form, with one row per observed combination and its frequency:

V1 V2 V3 Freq
a z f 2
b y g 2
a y g 1
b y f 1

How can I get this in R? With smaller datasets I use transform(table(data.frame(data))), but that does not work with larger datasets because it has to build a giant table with one cell for every possible combination of categories. Can somebody help, please?

I would like "general" code that does not depend on the variable names, since I will be using it inside a function. Moreover, since the datasets will be big, I would prefer to do it without the table function.

Thanks

Andre Wildberg
Bibi

2 Answers


In base R, with interaction:

as.data.frame(table(interaction(df, sep = "", drop = TRUE)))

Or, with table:

subset(data.frame(table(df)), Freq > 0)

#  V1 V2 V3 Freq
#2  b  y  f    1
#3  a  z  f    2
#5  a  y  g    1
#6  b  y  g    2

With dplyr:

library(dplyr)
df %>% 
  count(V1, V2, V3, name = "Freq")

#  V1    V2    V3     Freq
#1 a     y     g         1
#2 a     z     f         2
#3 b     y     f         1
#4 b     y     g         2
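
The question asks for code that does not name the columns and that avoids table. A minimal base R sketch of one way this could be done is shown below. It assumes df contains only the categorical columns and has no column already called Freq; count_combos is a hypothetical helper name, and the example data is the one from the question.

```r
# Example data from the question
df <- data.frame(
  V1 = c("a", "a", "b", "b", "a", "b"),
  V2 = c("z", "z", "y", "y", "y", "y"),
  V3 = c("f", "f", "g", "g", "g", "f")
)

# Hypothetical helper: add a column of ones and sum it within each observed
# combination. The formula `Freq ~ .` groups by every other column, so no
# column names are hard-coded and only combinations that occur are returned.
# Note: the formula method of aggregate() drops rows containing NA by default.
count_combos <- function(df) {
  aggregate(Freq ~ ., data = cbind(df, Freq = 1L), FUN = sum)
}

res <- count_combos(df)
print(res)
```

With dplyr, count(df, across(everything()), name = "Freq") should give the same counts without naming the columns.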
Maël
  • I edited my question, please have a look and see if you could help me! Thanks in advance! – Bibi Feb 13 '23 at 11:44

I assume your dataset dt contains only categorical variables and Freq represents the number of observations for each unique combination of the categorical variables.

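Since you want code "without using dplyr," here is an alternative using data.table.

library(data.table)
setDT(dt)                               # convert to a data.table in place if dt is a data.frame
dt[, Freq := .N, by = c(colnames(dt))]  # add the count of each combination to every row
unique(dt)                              # keep one row per combination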
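
A one-step variant, sketched here on the question's example data, counts the rows in each combination directly and returns one row per combination without modifying dt. It assumes dt holds only the categorical columns; the object name res is illustrative.

```r
library(data.table)

# Example data from the question
dt <- data.table(
  V1 = c("a", "a", "b", "b", "a", "b"),
  V2 = c("z", "z", "y", "y", "y", "y"),
  V3 = c("f", "f", "g", "g", "g", "f")
)

# Group by every column of dt and count the rows in each group.
# names(dt) supplies the grouping columns, so nothing is hard-coded.
res <- dt[, .(Freq = .N), by = names(dt)]
print(res)
```

Because only the combinations that actually occur are grouped, this does not build the full cross-table of all 30 variables.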
caboonge