After manipulating the raw data, I have obtained the following data.frame:
  ItemID GroupID mentions
1    601       3        1
2    601       4        1
3    611       3        1
4    661       3        1
5    801       3        1
6    821       3        1
6    841       1        3
6    841       2        3
6    841       3        3
6    841       4        3
I have 10000 records like this, and my first goal is to figure out which items appear in all 4 GroupIDs. First I tried to do this visually by plotting:
ggplot(item.stats, aes(x = ItemID, y = mentions, fill = GroupID)) +
  geom_bar(stat = 'identity', position = 'dodge')
With a dataset this large, the plot was not readable. What is the best way to get a good idea of how many items appear in all groups, together with their mentions?
For the example above, the filtered result should contain only:
  ItemID GroupID mentions
6    841       1        3
6    841       2        3
6    841       3        3
6    841       4        3
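One possible way to express this filter, as a rough sketch in base R. The sample rows are retyped from the table above, and the names `n.groups`, `groups.per.row` and `all.groups` are made up for illustration. It assumes "all groups" means every distinct GroupID that occurs in the data.

```r
# Sample data retyped from the table above (row labels omitted).
item.stats <- data.frame(
  ItemID   = c(601, 601, 611, 661, 801, 821, 841, 841, 841, 841),
  GroupID  = c(3, 4, 3, 3, 3, 3, 1, 2, 3, 4),
  mentions = c(1, 1, 1, 1, 1, 1, 3, 3, 3, 3)
)

# Number of distinct groups present in the whole data set.
n.groups <- length(unique(item.stats$GroupID))

# For every row, the number of distinct GroupIDs seen for that row's ItemID.
groups.per.row <- ave(item.stats$GroupID, item.stats$ItemID,
                      FUN = function(g) length(unique(g)))

# Keep only the rows whose item appears in every group.
all.groups <- item.stats[groups.per.row == n.groups, ]
print(all.groups)
```

`length(unique(all.groups$ItemID))` then gives the number of items that cover every group.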
This is my attempt at a more meaningful visualization:
test.with.id <- transform(test, id = as.numeric(factor(ItemID)))
ggplot(test.with.id, aes(x = id, y = mentions, fill = GroupID)) +
  geom_histogram(stat = 'identity', position = 'stack', binwidth = 2)
This may be similar to the question "How to plot multiple stacked histograms together in R?"
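With 10000 records, a count of how many items cover 1, 2, 3 or 4 groups may be easier to read than one bar per item. The following is only a sketch in base R under that assumption. The sample rows are retyped from the table in the question, and `groups.per.item` and `coverage` are illustrative names.

```r
# Sample data retyped from the table in the question (row labels omitted).
item.stats <- data.frame(
  ItemID   = c(601, 601, 611, 661, 801, 821, 841, 841, 841, 841),
  GroupID  = c(3, 4, 3, 3, 3, 3, 1, 2, 3, 4),
  mentions = c(1, 1, 1, 1, 1, 1, 3, 3, 3, 3)
)

# Number of distinct GroupIDs each item appears in.
groups.per.item <- tapply(item.stats$GroupID, item.stats$ItemID,
                          function(g) length(unique(g)))

# How many items cover exactly 1, 2, ... groups.
coverage <- table(groups.per.item)
print(coverage)
```

Passing `coverage` to `barplot()` gives a single small bar chart whose size does not depend on the number of items.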