Create density plots out of aggregated data

Question

I have a data frame with 3 columns of aggregated data: CreditScore, Count, Month.

So a row with 550, 3, 3 would mean there were 3 people with a credit score of 550 in March.

I'm trying to create density plots that overlay to compare the differences in credit distributions between two months.

I feel like this should be really simple, but I can't find anything on Google.

I'm trying to do this in R.

Any suggestions are appreciated.

Data example:

structure(list(CrScore = c(0L, 2L, 3L, 530L, 535L, 544L, 549L, 
551L, 554L, 558L, 560L, 561L, 563L, 565L, 567L, 568L, 569L, 577L, 
579L, 580L), Count.of.MFSAccount = c(2L, 9L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 1L, 1L, 3L, 1L, 1L, 2L, 1L, 2L, 1L, 1L), EnterDate.Month = structure(c(17136, 
17136, 17136, 17136, 17136, 17136, 17136, 17136, 17136, 17136, 
17136, 17136, 17136, 17136, 17136, 17136, 17136, 17136, 17136, 
17136), class = "Date")), .Names = c("CrScore", "Count.of.MFSAccount", 
"EnterDate.Month"), row.names = c(10L, 28L, 42L, 80L, 113L, 174L, 
212L, 231L, 259L, 299L, 320L, 331L, 359L, 382L, 409L, 421L, 432L, 
540L, 573L, 593L), class = "data.frame")

I'm sure you'll get some great help if you [make a reproducible example](http://stackoverflow.com/q/5963269/903061) and outline anything you've tried. — Gregor Thomas, Jan 10 '17 at 22:34
Are you willing to disaggregate your data, i.e. replicate each value the requisite number of times? If you don't have a huge data set or need super-efficiency, that's probably the easiest way ... — Ben Bolker, Jan 10 '17 at 22:37
Disaggregating the data was going to be my last resort. I figured there must be a way to do this with aggregated data? The idea seems so simple. — Justin Leonard, Jan 10 '17 at 22:41
There's not really an easy way with base functions. Densities are usually done for continuous random variables where it usually isn't possible to aggregate without a loss of information. — MrFlick, Jan 10 '17 at 22:42
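
For reference, the disaggregation suggested in the comments can be sketched in base R as below. This is only an illustration: the data frame `agg` and its values are invented for the example (the sample in the question covers a single month), and the column names are taken from the question.

```r
# Hypothetical aggregated data for two months (values invented for illustration)
agg <- data.frame(
  CrScore = c(530L, 550L, 580L, 540L, 560L, 600L),
  Count.of.MFSAccount = c(1L, 3L, 2L, 2L, 1L, 4L),
  EnterDate.Month = as.Date(c("2016-12-01", "2016-12-01", "2016-12-01",
                              "2017-01-01", "2017-01-01", "2017-01-01"))
)

# Repeat each row Count.of.MFSAccount times, giving one row per account
long <- agg[rep(seq_len(nrow(agg)), times = agg$Count.of.MFSAccount), ]

# One ordinary (unweighted) density per month
by_month <- split(long$CrScore, long$EnterDate.Month)
dens <- lapply(by_month, density)

# Overlay the two curves on shared axes
plot(dens[[1]],
     xlim = range(long$CrScore) + c(-50, 50),
     ylim = c(0, max(sapply(dens, function(d) max(d$y)))),
     main = "Credit score density by month")
lines(dens[[2]], col = "red")
```

The expanded data frame has as many rows as there are accounts, so this is probably only practical when the total count is moderate.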

Thales · Accepted Answer · 2017-01-10T23:08:14.363

4

With ggplot2, you can pass a normalized version of Count.of.MFSAccount to geom_density() as the weight aesthetic:

library(ggplot2)
library(dplyr)

# df is the aggregated data frame from the question.
# Create weights that are normalized within each date, so each month's weights sum to 1
df <- df %>%
  group_by(EnterDate.Month) %>%
  mutate(w = Count.of.MFSAccount / sum(Count.of.MFSAccount)) %>%
  ungroup()

# Plot with the constructed weights, one colored curve per month
ggplot(df, aes(CrScore, weight = w, color = factor(EnterDate.Month))) +
  geom_density()
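
If you would rather avoid ggplot2, a similar overlay can probably be built with the `weights` argument of base R's `density()`. The sketch below is an illustration under stated assumptions, not part of the original answer: the data frame `agg` and its values are invented, `weighted_density()` is a hypothetical helper, and the bandwidth is taken from the unweighted score values with `bw.nrd0()`.

```r
# Hypothetical aggregated data for two months (values invented for illustration)
agg <- data.frame(
  CrScore = c(530L, 550L, 580L, 540L, 560L, 600L),
  Count.of.MFSAccount = c(1L, 3L, 2L, 2L, 1L, 4L),
  EnterDate.Month = as.Date(c("2016-12-01", "2016-12-01", "2016-12-01",
                              "2017-01-01", "2017-01-01", "2017-01-01"))
)

# Hypothetical helper: weighted kernel density for one month's rows
weighted_density <- function(d) {
  # density() expects weights that sum to 1
  w <- d$Count.of.MFSAccount / sum(d$Count.of.MFSAccount)
  # Assumption: bandwidth comes from the unweighted scores, ignoring the counts
  density(d$CrScore, bw = bw.nrd0(d$CrScore), weights = w)
}

# One weighted density per month
dens <- lapply(split(agg, agg$EnterDate.Month), weighted_density)

# Overlay the two curves on shared axes
plot(dens[[1]],
     xlim = range(agg$CrScore) + c(-50, 50),
     ylim = c(0, max(sapply(dens, function(d) max(d$y)))),
     main = "Weighted credit score density by month")
lines(dens[[2]], col = "red")
```

Because the bandwidth here ignores the counts, the curves may be smoother or rougher than those from fully disaggregated data; passing a hand-picked numeric `bw` is one way to control that.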

edited Jan 10 '17 at 23:08

answered Jan 10 '17 at 22:59
