I was looking at the question "Find how many times duplicated rows repeat in R data frame", which provides the following code:

library(plyr)
ddply(df,.(a,b),nrow)

However, I have a dataset with many variables, so I can't type them out like a,b in this case. I've tried using names(data) with the paste function, but it doesn't seem to work. I tried this:

var_names=paste(names(data),collapse=",")
ddply(data,.(paste(a)),nrow)

It instead gives this output:

[screenshot of the incorrect output]

However, if I manually type them out, I get the proper output:

[screenshot of the expected output]

What do I need to do differently here?

2 Answers

Instead of pasting the names together and evaluating the result, use count() from dplyr, which accepts multiple columns through across() combined with the select helper everything():

library(dplyr)
df %>% 
    count(across(everything()))

A reproducible example with the mtcars dataset:

data(mtcars)
df <- mtcars %>% 
   select(vs:carb)

count(df, across(everything()))
   vs am gear carb n
1   0  0    3    2 4
2   0  0    3    3 3
3   0  0    3    4 5
4   0  1    4    4 2
5   0  1    5    2 1
6   0  1    5    4 1
7   0  1    5    6 1
8   0  1    5    8 1
9   1  0    3    1 3
10  1  0    4    2 2
11  1  0    4    4 2
12  1  1    4    1 4
13  1  1    4    2 2
14  1  1    5    2 1
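
Since the question is about duplicated rows, it may help to keep only the combinations that occur more than once. The following is a minimal sketch rather than part of the original answer; the column name times and the object name dups are illustrative choices, and dplyr::count is written out in full on the assumption that plyr, which also exports a count(), might be attached in the same session.

```r
library(dplyr)

# Same subset of mtcars as in the example above
df <- mtcars %>%
  select(vs:carb)

# Count every distinct row, then keep only rows that appear more than once.
# `times` is an illustrative name for the count column.
dups <- df %>%
  dplyr::count(across(everything()), name = "times") %>%
  filter(times > 1)

dups
```

Rows with a count of 1 are unique and are dropped by the filter() step.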

Also, ddply accepts a character vector of column names directly, so there is no need to collapse the names into a single string:

library(plyr)
ddply(df, names(df), nrow)
   vs am gear carb V1
1   0  0    3    2  4
2   0  0    3    3  3
3   0  0    3    4  5
4   0  1    4    4  2
5   0  1    5    2  1
6   0  1    5    4  1
7   0  1    5    6  1
8   0  1    5    8  1
9   1  0    3    1  3
10  1  0    4    2  2
11  1  0    4    4  2
12  1  1    4    1  4
13  1  1    4    2  2
14  1  1    5    2  1
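
A likely reason the attempt in the question misbehaves is that .() quotes its arguments instead of evaluating them, so a paste() call inside it is treated as one grouping expression rather than a list of columns. The sketch below illustrates this on a small made-up data frame (toy is hypothetical, not the asker's data); as.quoted() is the plyr function that turns a character vector into the same kind of object that .() produces.

```r
library(plyr)

# Small illustrative data frame (hypothetical, not from the question)
toy <- data.frame(a = c(1, 1, 2), b = c("x", "x", "y"))

# .() quotes its arguments: two arguments give two grouping expressions
n_two <- length(.(a, b))

# A single call inside .() is one expression, however many names it mentions
n_one <- length(.(paste(a, b)))

# A character vector of names becomes one grouping expression per name
n_names <- length(as.quoted(names(toy)))

# Passing names() directly therefore groups by every column
res <- ddply(toy, names(toy), nrow)
res
```

Here n_two and n_names are both 2 while n_one is 1, which is why the character vector works and the pasted string does not.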

Or, if we do build a single string from the names, paste the whole expression together and then evaluate it (not recommended, as there are standard ways of dealing with this):

eval(parse(text = paste('ddply(df, .(', toString(names(df)), '), nrow)')))
   vs am gear carb V1
1   0  0    3    2  4
2   0  0    3    3  3
3   0  0    3    4  5
4   0  1    4    4  2
5   0  1    5    2  1
6   0  1    5    4  1
7   0  1    5    6  1
8   0  1    5    8  1
9   1  0    3    1  3
10  1  0    4    2  2
11  1  0    4    4  2
12  1  1    4    1  4
13  1  1    4    2  2
14  1  1    5    2  1
akrun

You can use aggregate, grouping by all the columns and counting the length of each group.

aggregate(1:nrow(df)~., df, length)
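
A variant of the same idea, offered as a sketch and not as the answerer's code, gives the count column a readable name by adding a helper column of ones and summing it within each group. The names n and counts are illustrative, and the mtcars subset is assumed to match the one used in the other answer. The row order of the result may differ from the dplyr and plyr outputs.

```r
# Same four columns of mtcars as in the other answer (assumed for illustration)
df <- mtcars[, c("vs", "am", "gear", "carb")]

# Add a helper column of ones, then sum it within each distinct row.
# In the formula, `.` stands for every column other than `n`.
counts <- aggregate(n ~ ., data = cbind(df, n = 1L), FUN = sum)

counts
```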
Ronak Shah