I have a dataset of regional patents. I want to count how many Appln_id values have more than one Person_id and how many have only one Person_id.

Appln_id  Person_id
3         23
3         22
3         24
10        49
10        50
10        55
10        51
2         101
4         122
4         104

Here Appln_id 3 has three different Person_id values (23, 22, 24), while Appln_id 2 has only one Person_id (101). So I want to count how many Appln_id values have more than one Person_id and how many have only one.

Ronak Shah
  • Thanks for your help. Can you tell me how to calculate the occurrence rate, I mean the percentage of occurrences (of n)? Then I want to show them in a barplot. – MD ABDUS SATTAR MOIN Feb 24 '21 at 04:17

2 Answers

Count the number of unique Person_id values for each Appln_id.

library(dplyr)
result <- df %>% group_by(Appln_id) %>% summarise(n = n_distinct(Person_id))
result

#  Appln_id     n
#*    <int> <int>
#1        2     1
#2        3     3
#3        4     2
#4       10     4

Now you can count how many Appln_id values have only one Person_id and how many have more than one.

sum(result$n == 1)
#[1] 1

sum(result$n > 1)
#[1] 3
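
For the percentage and barplot asked about in the comment, one option is to turn the same per-Appln_id counts into shares. This is a minimal sketch in base R, assuming the sample data from the question; `n_persons`, `group`, `counts` and `pct` are illustrative names, and the grouping into "one" versus "more than one" is just one reading of "occurrence rate".

```r
# Sample data copied from the question
df <- data.frame(
  Appln_id  = c(3L, 3L, 3L, 10L, 10L, 10L, 10L, 2L, 4L, 4L),
  Person_id = c(23L, 22L, 24L, 49L, 50L, 55L, 51L, 101L, 122L, 104L)
)

# Distinct Person_id values per Appln_id (base R counterpart of n_distinct)
n_persons <- tapply(df$Person_id, df$Appln_id, function(x) length(unique(x)))

# Label each Appln_id by whether it has exactly one Person_id or several
group <- factor(
  ifelse(n_persons == 1, "one Person_id", "more than one"),
  levels = c("one Person_id", "more than one")
)

# Absolute counts and percentages of Appln_id values in each group
counts <- table(group)
pct <- 100 * prop.table(counts)

# Bar plot of the percentages
barplot(pct, ylab = "Share of Appln_id (%)", ylim = c(0, 100))
```

If the share of each individual value of n is wanted instead, `100 * prop.table(table(n_persons))` gives that distribution and can be passed to `barplot()` in the same way.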

data

df <- structure(list(Appln_id = c(3L, 3L, 3L, 10L, 10L, 10L, 10L, 2L, 
4L, 4L), Person_id = c(23L, 22L, 24L, 49L, 50L, 55L, 51L, 101L, 
122L, 104L)), class = "data.frame", row.names = c(NA, -10L))
Ronak Shah

We can use data.table to get the number of unique Person_id values per Appln_id:

library(data.table)
setDT(df)[, .(n = uniqueN(Person_id)), by = Appln_id]
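
To get the two totals asked for in the question, one option is to sum over that per-group result. A minimal sketch, assuming the same sample data; `dt`, `per_appln` and `tally` are illustrative names:

```r
library(data.table)

# Sample data copied from the question
dt <- data.table(
  Appln_id  = c(3L, 3L, 3L, 10L, 10L, 10L, 10L, 2L, 4L, 4L),
  Person_id = c(23L, 22L, 24L, 49L, 50L, 55L, 51L, 101L, 122L, 104L)
)

# Distinct Person_id values per Appln_id
per_appln <- dt[, .(n = uniqueN(Person_id)), by = Appln_id]

# Number of Appln_id values with exactly one vs. more than one Person_id
tally <- per_appln[, .(single = sum(n == 1), multiple = sum(n > 1))]
tally
```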
akrun