I have a data frame, m_tweet, with 117206 rows and 4 columns: userId, itemId, rating and time. Its structure is given below.
'data.frame': 117206 obs. of 4 variables:
$ userId: Factor w/ 19043 levels "1","2","3","4",..: 1 1 2 3 3 3 4 5 5 5 ...
$ itemId: Factor w/ 11451 levels "2844","4936",..: 7402 9729 3404 2976 7932 10035 11093 6718 8297 8537 ...
$ rating: int 7 8 10 8 8 7 10 2 7 5 ...
$ time : Date, format: "2013-04-03" "2013-04-21" "2013-09-18" ...
The head of the data is
userId itemId rating time
1 1 1074638 7 2013-04-03
2 1 1853728 8 2013-04-21
3 2 113277 10 2013-09-18
4 3 104257 8 2013-03-31
5 3 1259521 8 2013-03-24
6 3 1991245 7 2013-03-24
The tail of the data is
userId itemId rating time
117201 19041 2171867 3 2013-09-16
117202 19041 2357129 5 2013-09-21
117203 19041 2381931 4 2013-09-08
117204 19042 816711 8 2013-06-23
117205 19043 1559547 2 2013-07-08
117206 19043 2415464 2 2013-07-14
I am trying to make a histogram of the number of ratings per movie with ggplot, but it does not seem to be working. There are two problems:
- The counts on the y-axis are not correct
- The x-axis labels are not displayed at all
I am using the code below to draw the histogram. The same code produced a correct plot for a similar data set with about 100K rows.
First I created the x-axis labels:
labels_mtweet <- seq(1, length(unique(m_tweet$itemId)), by = 600)
This gives 20 labels spaced 600 apart, from 1 up to 11401 (there are 11451 unique items).
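As I understand the ggplot2 documentation, `breaks` in `scale_x_discrete()` takes values of the scale (here, factor levels such as "1074638"), not positions along the axis. A quick way to check how many of these numeric breaks coincide with real levels is sketched below on a small hypothetical factor `toy_items` (IDs borrowed from the head and tail above); `toy_breaks` and `n_matching` are illustrative names.

```r
# Hypothetical toy factor: item IDs taken from the head/tail of the data
toy_items <- factor(c("1074638", "1853728", "113277", "104257",
                      "1259521", "1991245", "2171867", "2357129"))

# Same construction as labels_mtweet, applied to the toy factor
toy_breaks <- seq(1, length(unique(toy_items)), by = 3)

# The breaks are positions (1, 4, 7), while the levels are ID strings
print(toy_breaks)
print(levels(toy_items))

# How many breaks coincide with an actual level string?
n_matching <- sum(as.character(toy_breaks) %in% levels(toy_items))
print(n_matching)
```

On the real data the equivalent check would be `sum(as.character(labels_mtweet) %in% levels(m_tweet$itemId))`; a result of 0 would mean none of the breaks names an existing level.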
library(ggplot2)
ggplot(m_tweet) +
  geom_histogram(aes(x = itemId)) +
  scale_x_discrete(breaks = labels_mtweet, labels = as.character(labels_mtweet)) +
  labs(x = "Movie Id", y = "Number of ratings per movie",
       title = "Distribution of ratings per movie - MovieTweetings")
When I make a simple base-R plot with table() instead, the counts are displayed correctly:
plot(table(m_tweet$itemId), xlab = "Movie Id", ylab = "Frequency of Movie Rating",
     main = "Distribution of Ratings per movie - MovieLens", type = "l")
With ggplot, however, the bars do not have the correct heights and the x-axis labels are not displayed at all.
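To compare the ggplot bar heights with what table() reports, the per-movie counts can be put into a data frame and inspected directly. This is a minimal sketch on a small hypothetical data frame `toy` with the same column names as m_tweet; the item IDs come from the head above, but the repeated ratings are invented for illustration.

```r
# Hypothetical toy data with the same column names as m_tweet;
# the repeated item IDs are invented so that some movies have several ratings
toy <- data.frame(
  userId = factor(c(1, 1, 2, 3, 3, 3)),
  itemId = factor(c("1074638", "1853728", "1074638",
                    "104257", "1074638", "104257")),
  rating = c(7L, 8L, 10L, 8L, 8L, 7L)
)

# Number of ratings per movie, counted exactly as table() does
counts <- as.data.frame(table(toy$itemId),
                        responseName = "n_ratings",
                        stringsAsFactors = FALSE)
names(counts)[1] <- "itemId"
print(counts)

# The tallest bar in a correct plot should have this height
print(max(counts$n_ratings))
```

Running the same two steps on m_tweet gives the reference heights that the ggplot bars should match.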
I would like to post the ggplot output here, but for policy reasons I can't. Can anyone spot where things are going wrong? I suspect I am missing something that causes the problem.
Any help will be greatly appreciated. I have not included the output of dput() because it is very long.
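If a shorter reproducible example would help, something along these lines should keep the dput() output small. This sketch uses the six rows from the head shown above as a stand-in for m_tweet; `toy` and `small` are hypothetical names.

```r
# The six rows from the head of the data, standing in for m_tweet
toy <- data.frame(
  userId = factor(c(1, 1, 2, 3, 3, 3)),
  itemId = factor(c("1074638", "1853728", "113277",
                    "104257", "1259521", "1991245")),
  rating = c(7L, 8L, 10L, 8L, 8L, 7L),
  time   = as.Date(c("2013-04-03", "2013-04-21", "2013-09-18",
                     "2013-03-31", "2013-03-24", "2013-03-24"))
)

# Take a few rows and drop unused factor levels, so that dput() does not
# print every level of the full factor (11451 for itemId in the real data)
small <- droplevels(head(toy, 3))
print(nlevels(toy$itemId))   # 6 levels before
print(nlevels(small$itemId)) # 3 levels after
dput(small)
```

On the real data, `dput(droplevels(head(m_tweet, 20)))` should give a similarly short output that others can paste straight into R.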
Thanks.