
I have data which look something like this.

company  date  auditor  change  count
A        2016  ZXY      0       1
A        2015  ZXY      0       2
A        2014  ZXY      0       3
A        2013  FPQ      1       4
A        2012  ZXY      1       5
B        2017  ERW      0       1
B        2016  ERW      0       2
B        2015  ERW      0       3
B        2014  ERW      0       4
B        2013  ERW      0       5
.
.
.
.

This data shows whether a company's auditor has switched in the last five years. If there was a switch in a given year, the change value is 1. I want to calculate:

1) The percentage of companies that switched in the last year (change=1 for count=1).

2) The percentage of companies that had no switch in the last five years (change=0 for count=1,2,3,4,5).

3) The percentage of companies that changed auditor more than once in the five years (change=1 for more than one value of count).

I just want the logic of how to do it.

1 Answer


I'd probably use dplyr to sum the change column:

library(dplyr)
# One row per company, with the total number of auditor changes
changeSummary <- yourData %>%
  group_by(company) %>%
  summarise(sumChanges = sum(change))

That will give a data frame with each company listed once and the number of changes for each company. You can then pull percentages for your criteria easily enough. E.g. the share of companies with exactly one change in the five years (note that your first scenario, a switch in the last year, needs the rows where count == 1 rather than the per-company sum):

answer1 <- nrow(filter(changeSummary, sumChanges == 1)) / nrow(changeSummary)
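As a rough sketch of how all three percentages from the question could be derived with the same group-and-summarise idea: the data frame below is just the ten sample rows from the question, and the names `perCompany`, `switchedLastYear`, `pctSwitchedLastYear`, `pctNoSwitch` and `pctMultipleSwitches` are made up for illustration. It assumes `change` is numeric 0/1 and that count == 1 marks the most recent year.

```r
library(dplyr)

# Sample rows copied from the question; `yourData` is a placeholder name.
yourData <- data.frame(
  company = rep(c("A", "B"), each = 5),
  date    = c(2016:2012, 2017:2013),
  auditor = c("ZXY", "ZXY", "ZXY", "FPQ", "ZXY", rep("ERW", 5)),
  change  = c(0, 0, 0, 1, 1, rep(0, 5)),
  count   = rep(1:5, times = 2)
)

# One row per company: did it switch in the most recent year,
# and how many switches did it have over the five years?
perCompany <- yourData %>%
  group_by(company) %>%
  summarise(
    switchedLastYear = any(change == 1 & count == 1),
    sumChanges       = sum(change)
  )

# The mean of a logical vector is the share of TRUE values.
pctSwitchedLastYear <- 100 * mean(perCompany$switchedLastYear)  # scenario 1
pctNoSwitch         <- 100 * mean(perCompany$sumChanges == 0)   # scenario 2
pctMultipleSwitches <- 100 * mean(perCompany$sumChanges > 1)    # scenario 3
```

With only the two sample companies, A has two changes and B has none, so scenario 1 comes out at 0% and scenarios 2 and 3 at 50% each.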
olorcain