
I'm working in RStudio.

With ggplot2, I'm trying to make a plot showing the frequencies of a discrete variable (number of shares purchased) per category (there are six categories). For example, members of category A might buy 1 share more often than members of category D.

I now have a count plot. However, because one category is much bigger than the others, the distribution of the number of shares in the other categories is hard to see.

The code for the count plot is:

library(ggplot2)
# Absolute distribution of shares per category
ggplot(dat, aes(x = Number_share, fill = category)) +
  geom_histogram(binwidth = .5, alpha = .5, position = "dodge")

This results in this graph: https://i.stack.imgur.com/QRyx6.jpg

Therefore, I am planning to make a plot where each bar shows a frequency relative to the size of its category instead of an absolute count.

I calculated the relative frequency of each category (its share of all rows):

# Relative frequency of each category
categories <- dat$category
categories.freq <- table(categories)
categories.relfreq <- categories.freq / nrow(dat)
cbind(categories.relfreq)

           categories.relfreq
Beauvent 1        0.002708692
Beauvent 2        0.015020931
E&B               0.037182960
Ecopower 1        0.042107855
Ecopower 2        0.029549372
Ecopower 3        0.873183945

I don't know how to make a plot where the frequency of each number of shares purchased is relative to its category instead of absolute. Can anybody help me with this?
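One way to get there is to compute the within-category proportions yourself and plot them with `geom_col()`. This is a minimal sketch, assuming `dat` has a numeric `Number_share` column and a `category` column; the toy data frame below is made up for illustration and stands in for the real `dat`.

```r
library(ggplot2)

# Hypothetical toy data standing in for `dat` (made up for illustration)
dat <- data.frame(
  Number_share = c(1, 1, 2, 1, 2, 2, 3, 1),
  category     = c("A", "A", "A", "B", "B", "B", "B", "C")
)

# Count rows for every (category, Number_share) combination, zeros included
counts <- as.data.frame(table(category = dat$category,
                              Number_share = dat$Number_share))

# Divide each count by its category total, so each category sums to 1
counts$rel <- counts$Freq / ave(counts$Freq, counts$category, FUN = sum)

# Dodged bars: one bar per category at each number of shares
ggplot(counts, aes(x = Number_share, y = rel, fill = category)) +
  geom_col(position = "dodge") +
  labs(y = "Proportion within category")
```

Because `table()` converts `Number_share` to a factor, the x-axis here is discrete, which suits whole-number share counts.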

Machavity
maria118code

2 Answers


I think what you are looking for is this:

ggplot(dat, aes(x=Number_share, fill=category)) +
  geom_bar(position="fill")

This stacks the categories on top of each other, and the position="fill" argument scales every bar to a height of 1, so the y-axis shows relative counts.

see24
Thanks for your help, but this is not what I'm looking for. I don't want each n_share category to add up to 100%. Instead, I'd like the total of shares acquired by each member category to add up to 100%. Your answer leads to: https://ibb.co/dcgxTS Instead, I'm looking for a relative version of this absolute graph: https://imgur.com/a/e4k94 – maria118code Mar 07 '18 at 15:29
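The difference between the two normalizations can be seen directly with base R's `prop.table()`. This sketch uses made-up toy data (not the real `dat`): `margin = 1` makes each number-of-shares row sum to 1, which matches what `position = "fill"` draws, while `margin = 2` makes each category column sum to 1, which is what the comment asks for.

```r
# Hypothetical toy data (made up for illustration)
dat <- data.frame(
  Number_share = c(1, 1, 2, 1, 2, 2, 3, 1),
  category     = c("A", "A", "A", "B", "B", "B", "B", "C")
)

# Cross-tabulate number of shares (rows) against category (columns)
tab <- table(Number_share = dat$Number_share, category = dat$category)

# margin = 1: each Number_share row sums to 1 (like position = "fill")
by_share <- prop.table(tab, margin = 1)

# margin = 2: each category column sums to 1 (relative to the category)
by_category <- prop.table(tab, margin = 2)
by_category
```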

I found that this problem is very similar to "Histogram with weights in R". Basically, a histogram uses counts on the y-axis by default, whereas I want densities: hist(freq = FALSE) in base R, or geom_histogram(aes(y = ..density..)) in ggplot2.
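Since `Number_share` takes whole-number values, a bar chart using the `prop` variable that `stat_count()` computes may be simpler than a density histogram. With `geom_histogram(aes(y = ..density..), binwidth = .5)`, each category's bars integrate to 1 by area, so bar heights come out as proportion divided by the bin width (here, twice the proportion). The sketch below assumes ggplot2 3.3.0 or later for `after_stat()` (older versions spell it `..prop..`) and uses made-up toy data in place of `dat`.

```r
library(ggplot2)

# Hypothetical toy data (made up for illustration)
dat <- data.frame(
  Number_share = c(1, 1, 2, 1, 2, 2, 3, 1),
  category     = c("A", "A", "A", "B", "B", "B", "B", "C")
)

# stat_count() computes `prop`, the proportion within each group;
# grouping explicitly by category makes each category's bars sum to 1
p <- ggplot(dat, aes(x = Number_share, fill = category)) +
  geom_bar(aes(y = after_stat(prop), group = category),
           position = "dodge") +
  labs(y = "Proportion within category")
p
```

Setting `group = category` explicitly matters: if `Number_share` were a factor, the default grouping would also split by x, and every bar would have a proportion of 1.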

maria118code