Use R to count values in a number of different columns

Question

I have a dataset of patents, where I have recorded 1) the month and year associated with a patent renewal and 2) whether the patent holder chose to pay the patent fee or let the patent lapse. So

patentid        fee1date         fee1paid    fee2date    fee2paid
1               May 2010       True        May 2013    False 
2               May 2010       True        April 2014  True

What I want to do is develop a count of the number of renewals by month and by year, as well as the number of abandoned patents, as follows:

date      renewed              lapsed
May 2010  2                   0

How might I count the data that I have now? Thank you!

EDIT: The key point is to aggregate these across different columns. The issue that I am running into now is that when I try using the count library, it treats 2 renewals in May 2010 as two separate values.

Minor advice - try to avoid capitals and spaces in your column names — tjebo, Jul 16 '18 at 13:12
And - how do you get to `No Patents Renewed 11` from your example data frame? This looks like magic to me. Please explain! — tjebo, Jul 16 '18 at 13:13
how u realize that patent abandoned in 2010 ? you just know that abandon time is between 2010 and 2013 ! — Saman, Jul 16 '18 at 13:18
Sorry SaMi GiMiX, I don't quite understand your comment. Most patent offices release information about filing fees online, so it is possible to work backwards to see whether a patent has been renewed or not. — Krabo, Jul 16 '18 at 13:20
It seems that `lapsed` should be 1 because the `fee2paid` for line 1 is FALSE. Am I missing something? — akash87, Jul 16 '18 at 13:44
No, because I am looking at the number of renewal statistics by month. Because there is no lapse in May 2010, the count for that month is 0. — Krabo, Jul 16 '18 at 13:50

score 4 · Answer 1 · edited Jul 16 '18 at 15:58

4

Using dplyr

require(tidyr)
require(dplyr)      

data %>% gather(year,value, -Patent.ID) %>% 
         separate('year',c('Fee','N','Act')) %>% 
          spread(Act,value) %>% 
          unite(Fee, Fee,N, sep = '.') %>% 
          group_by(Date) %>% 
          summarise(R=sum(Paid=='True'), NotR=sum(Paid=='False'))

     # A tibble: 3 x 3
  Date           R  NotR
  <chr>      <int> <int>
1 April 2014     1     0
2 May 2010       2     0
3 May 2013       0     1

Data

data <- read.table(text="
               'Patent ID'      'Fee 1 Date'   'Fee 1 Paid'    'Fee 2 Date'   'Fee 2 Paid'
               1               'May 2010'       True        'May 2013'    False 
               2               'May 2010'       True        'April 2014'  True

               ",header=T, stringsAsFactors = F)

edited Jul 16 '18 at 15:58

tjebo

21,977
7
58
94

answered Jul 16 '18 at 13:50

A. Suliman

12,923
5
24
37

Hi A. Suliman, thanks for the help! I've tried your approach but unfortunately I get the following error message: `In addition: Warning message: attributes are not identical across measure variables; they will be dropped` – Krabo Jul 16 '18 at 14:32
Check https://stackoverflow.com/questions/28972386/retain-attributes-when-using-gather-from-tidyr-attributes-are-not-identical – A. Suliman Jul 16 '18 at 14:48
Eventually I got there, I used the plyr count and aggregate function to add each of the columns together. Thanks for your help though! – Krabo Jul 16 '18 at 22:55

Use R to count values in a number of different columns

1 Answers1

Data