(Rewrite. Very new to R, sorry if my jargon is off below!)
Goal: Summarize a filtered subset of a dataset by year, add a column with the count for each year, then express that count as a fraction of the total for the same year in the original, unfiltered dataset.
Desired Output: nonStandardActivationsSummary
year subCount institutions percentOfAllInYear
2017 2 1 .33
2018 1 1 .33
Starting data: fullData
pid startDate subLength
4484 2017-01-30 365
4487 2017-01-01 25
4487 2017-07-01 360
6246 2018-04-29 345
4485 2018-02-01 30
4486 2018-07-01 730
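To make this reproducible, here is a minimal sketch that rebuilds the table above as a data frame. The column types (numeric pid, Date startDate, numeric subLength) are my assumption; the real data may store them differently.

```r
# Rebuild the sample rows shown above.
# Assumption: startDate is a Date column, pid and subLength are numeric.
fullData <- data.frame(
  pid = c(4484, 4487, 4487, 6246, 4485, 4486),
  startDate = as.Date(c("2017-01-30", "2017-01-01", "2017-07-01",
                        "2018-04-29", "2018-02-01", "2018-07-01")),
  subLength = c(365, 25, 360, 345, 30, 730)
)

str(fullData)  # inspect the column types
```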
What I'm trying
1. Create a data frame that keeps only the rows with non-standard subscription lengths (works)
nonStandardActivations <- filter(fullData, !is.na(subLength) & subLength != 30 & (subLength%%365) != 0)
Result: nonStandardActivations (good so far)
pid startDate subLength
4487 2017-01-01 25
4487 2017-07-01 360
6246 2018-04-29 345
2. Create a summary of the non-standard subscriptions, with an added column giving the non-standard count for a year as a percentage of the total in the original dataset for that year only (doesn't work)
nonStandardActivationsSummary <- summarize(
  group_by(nonStandardActivations, year = format(startDate, '%Y')),
  subCount = n(),
  institutions = length(unique(pid)),
  percentOfAllInYear = (length(unique(pid)) /
    length(unique(filter(fullData,
                         format(startDate, '%Y'))$pid))))
The above gives me: "Error: Argument 2 filter condition does not evaluate to a logical vector". If I remove the percentOfAllInYear clause, it works fine but I don't get that last column.
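My best reading of the error, as a small base-R sketch: the inner filter() call is handed format(startDate, '%Y') as its condition, which is a character vector of year strings rather than TRUE/FALSE values. The two-element startDate vector and the names yearStrings and isYear2017 are made up for illustration.

```r
# Hypothetical two-row stand-in for the startDate column
startDate <- as.Date(c("2017-01-30", "2018-04-29"))

yearStrings <- format(startDate, "%Y")  # character vector: "2017" "2018"
is.logical(yearStrings)                 # FALSE, so it cannot serve as a filter condition

isYear2017 <- yearStrings == "2017"     # a comparison yields one TRUE/FALSE per row
is.logical(isYear2017)                  # TRUE
```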
I suspect my approach is totally off, or I'm lost in how vectors get used in a function chain. Help?
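For reference, here is one possible direction, as a sketch and not a confirmed solution: compute the per-year total from the original data first, then join it onto the non-standard summary. It assumes dplyr is loaded, that startDate is a Date, and that the denominator should be the number of distinct pid values per year. The names totalsByYear and allInstitutions are hypothetical helpers. With the sample rows this gives 0.5 for 2017 (1 of 2 institutions) and about 0.33 for 2018, so the 2017 figure differs from the .33 in my desired output above.

```r
library(dplyr)

# Sample data from the question (column types are assumed)
fullData <- data.frame(
  pid = c(4484, 4487, 4487, 6246, 4485, 4486),
  startDate = as.Date(c("2017-01-30", "2017-01-01", "2017-07-01",
                        "2018-04-29", "2018-02-01", "2018-07-01")),
  subLength = c(365, 25, 360, 345, 30, 730)
)

# Denominator: distinct institutions per year in the unfiltered data
totalsByYear <- fullData %>%
  group_by(year = format(startDate, "%Y")) %>%
  summarize(allInstitutions = n_distinct(pid))

# Numerator: summarize the non-standard rows, then join the yearly totals
nonStandardActivationsSummary <- fullData %>%
  filter(!is.na(subLength), subLength != 30, subLength %% 365 != 0) %>%
  group_by(year = format(startDate, "%Y")) %>%
  summarize(subCount = n(), institutions = n_distinct(pid)) %>%
  left_join(totalsByYear, by = "year") %>%
  mutate(percentOfAllInYear = institutions / allInstitutions) %>%
  select(-allInstitutions)

print(nonStandardActivationsSummary)
```

Joining on year keeps each row's denominator tied to its own year, which is what the nested filter() inside summarize() was trying to do.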