I have a data frame like this:

CWSR = c(0.2, 0.1, 0.5, 0.1, 0.4, 0.1, 0.9, 0.1, 0.1, 0.2)
BPA = c(1, 1, 9, 2, 4, 2, 2, 3, 2, 3)
df = data.frame(CWSR, BPA)

   CWSR BPA
1   0.2   1
2   0.1   1
3   0.5   9
4   0.1   2
5   0.4   4
6   0.1   2
7   0.9   2
8   0.1   3
9   0.1   2
10  0.2   3

I have generated the histogram below, but instead of this layout I want to show a dot for each value in my graph, rather than an entire bar. Currently I am using this:

p <- ggplot(HData, aes(BPA, CWSR))
p + geom_bar(stat="identity")

[image: bar chart produced by the code above]

In addition, for each value of BPA I want to count the rows where CWSR is equal to 0.1, and then display that count as a percentage of all rows with that BPA value.

So in the example above, for BPA value of 1, this occurs once for the value of CWSR 0.1 (line 2) so this would show 100%. For the BPA value of 2, this occurs 3 times for the value of CWSR 0.1 (lines 4, 6, 9) BUT there is also a row with a BPA value of 2 and a CWSR value not equal to 0.1 (line 7, 0.9), so the total percentage would show as 75% (3 out of 4).

I have tried something like this:

df %>%
  group_by(BPA) %>% 
  summarise(num = n(),
            totalCWSR = sum(CWSR==1), totalP = (totalCWSR / num * 100)) 

But I am not sure whether it is correct, or how to display the result in ggplot.

I hope this is clearer, apologies for not providing as much detail previously.

Matt
  • Let's see the data, please copy/paste a minimal, reproducible example with `dput`. – tyluRp Dec 31 '17 at 00:56
  • I am not sure how to do that. Can you please explain the steps involved? – Matt Dec 31 '17 at 01:02
  • You were provided instructions for how to include code and data both on question open and in the R FAQ. Add `stat="count"` to your (non-existent in question) `geom_point()` call. – hrbrmstr Dec 31 '17 at 01:07
  • See [how to make a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – MrFlick Dec 31 '17 at 01:10
  • I have added an example of a data frame in the original post. – Matt Dec 31 '17 at 01:33
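
The `stat="count"` suggestion in the comments above could look roughly like the sketch below. It assumes the ggplot2 package is installed and uses the ten rows printed in the question as stand-in data; the object name `p_count` is only illustrative.

```r
library(ggplot2)

# Example rows as printed in the question (a stand-in for the real data)
df <- data.frame(
  CWSR = c(0.2, 0.1, 0.5, 0.1, 0.4, 0.1, 0.9, 0.1, 0.1, 0.2),
  BPA  = c(1, 1, 9, 2, 4, 2, 2, 3, 2, 3)
)

# One dot per distinct BPA value; the dot's height is the number of rows
# that have that BPA value (computed by the "count" stat)
p_count <- ggplot(df, aes(BPA)) +
  geom_point(stat = "count", size = 4)

print(p_count)
```

Note that this counts every row per BPA value regardless of CWSR; restricting the count to particular CWSR values would need a filter or a summary step first.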

2 Answers

I'm not sure this is what you want, and there is probably a much more elegant way of doing it, but here is one possible solution: it sums the CWSR values for each BPA value and then draws a simple geom_point plot:

library(dplyr)
library(ggplot2)

CWSR = c(0.2, 0.1, 0.5, 0.6, 0.4, 0.8, 0.9, 0.7, 0.1, 0.2) 
BPA = c(1,5,9,8,4,3,2,1,4,3) 
df = data.frame(CWSR, BPA)

df %>% group_by(BPA) %>%
  summarise(CWSRsum = sum(CWSR)) %>%
  arrange(BPA) %>%
  ggplot(aes(BPA, CWSRsum)) + geom_point(size=5)

[image: point plot produced by the code above]

Oriol Mirosa
  • many thanks. That is almost what I need. – Matt Dec 31 '17 at 02:24
  • How can I count BPA if CWSR = 1? I know in the data frame above I only have values of <1, but if my values were say 1,2,4,5,1,2,1,5,3,1, etc. how could I count BPA if these values of CWSR = 1? – Matt Dec 31 '17 at 02:25
  • I’m confused. I thought you were adding CWSR for each value of BPA, not the other way around. That’s why the `group_by` uses BPA as an argument. What are you trying to do? – Oriol Mirosa Dec 31 '17 at 02:36
  • it works for those larger than 1, but I only need to count those that are equal to 1. So if the values for CWSR are 1,1,4,5,1,1,3 and the values for BPA are 1,3,5,2,2,3,3 then BPA 1 had 1 value of 1 so count this, BPA 2 had 1 value of 1 so count this, BPA 3 had a value of 1 2 times, so total should show 2, etc. Hope that makes sense? – Matt Dec 31 '17 at 02:38
  • Oh, I see. Then I think it would suffice to just add a `filter` before the `group_by`, like this: `df %>% filter(CWSR == 0.1) %>% group_by(BPA) %>%` and the rest. Does it work like this? – Oriol Mirosa Dec 31 '17 at 02:43
  • ok thanks so much. I added but got this error : Error in df(.) : argument "df1" is missing, with no default – Matt Dec 31 '17 at 02:47
  • Are you using the correct names for the data frame and columns? I was using `df` because that’s what you used in your example data frame, but I think your original dataset was called `HData`. Or is the problem somewhere else? – Oriol Mirosa Dec 31 '17 at 02:52
  • I was using too much of the code, so working fine now, thank you so much for your help. There is just one more issue that I need to resolve. How can I show a % of BPA based on the count of CWSR? E.g. in the above, BPA 1 had a value (CWSR) of 1, and only occurred once in BPA, so would show 100%, BPA 2 had a value (CWSR) of 1, but occurred twice in BPA list, so would show 1/2 = 50%, and so on. Hope that makes sense? – Matt Dec 31 '17 at 03:07
  • I'm not quite following what you mean here. When you say BPA 1 or BPA 2, do you mean the index or the value? In either case, I don't see what you describe in the data (I'm assuming that you're using your example with BPA as `1,3,5,2,2,3,3` and CWSR as `1,1,4,5,1,1,3`). And when you say 'only occurred once in BPA', what value are you referring to? If you can clarify this, I can try to help. – Oriol Mirosa Dec 31 '17 at 18:05
  • @Oriol - ok, apologies for not explaining thoroughly. I have edited my original post to explain further. I am hoping this is clearer, but if not, please let me know. – Matt Dec 31 '17 at 22:15
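
The percentage described in the edited question (the share of rows per BPA value whose CWSR equals 0.1) is not worked out anywhere in this thread. A minimal sketch of one way to do it follows, assuming dplyr and ggplot2 are installed and using the ten rows printed in the question; the names `pct`, `hits`, and `p_pct` are illustrative, not from the original posts.

```r
library(dplyr)
library(ggplot2)

# Example rows as printed in the question (a stand-in for the real data)
df <- data.frame(
  CWSR = c(0.2, 0.1, 0.5, 0.1, 0.4, 0.1, 0.9, 0.1, 0.1, 0.2),
  BPA  = c(1, 1, 9, 2, 4, 2, 2, 3, 2, 3)
)

pct <- df %>%
  group_by(BPA) %>%
  summarise(
    num    = n(),               # all rows with this BPA value
    hits   = sum(CWSR == 0.1),  # of those, rows where CWSR is 0.1
    totalP = hits / num * 100   # percentage of matching rows
  )

print(pct)

# One dot per BPA value, treated as a category on the x axis
p_pct <- ggplot(pct, aes(factor(BPA), totalP)) +
  geom_point(size = 4) +
  labs(x = "BPA", y = "% of rows with CWSR == 0.1")

print(p_pct)
```

With these ten rows the sketch gives 75% for BPA 2, matching the question's worked example; BPA 1 comes out at 50% rather than 100%, because row 1 also has BPA 1 with a CWSR of 0.2. The exact comparison `CWSR == 0.1` is fine for values typed in as literals; if CWSR is the result of a calculation, `dplyr::near(CWSR, 0.1)` may be a safer test.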

If for your plot you are looking for a 'lollipop plot' where the dot is placed on the end of a simple line instead of the whole bar, there are many ways to achieve it. Below is a minor modification to @Oriol's code example with a line segment added, though to be effective such plots are normally ordered from the largest to the smallest value:

library(dplyr)
library(ggplot2)

# df is the data frame defined in the question
df %>%
  group_by(BPA) %>%
  summarise(CWSRsum = sum(CWSR)) %>%
  arrange(BPA) %>%
  ggplot(aes(BPA, CWSRsum)) +
  # dotted stem from the axis up to each point
  geom_segment(aes(xend = BPA, y = 0, yend = CWSRsum), colour = "gray40", linetype = 3) +
  geom_point(size = 4, colour = "gray60") +
  theme_bw() +
  theme(panel.grid = element_blank())

[image: lollipop plot]
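
The ordering from largest to smallest mentioned above is not applied in that code. One possible sketch, assuming dplyr and ggplot2 are installed, treats BPA as a category and reorders its levels by the summed CWSR; the names `sums` and `p_lolli` are illustrative, and the data are the vectors from the first answer.

```r
library(dplyr)
library(ggplot2)

# Same example vectors as in the first answer
df <- data.frame(
  CWSR = c(0.2, 0.1, 0.5, 0.6, 0.4, 0.8, 0.9, 0.7, 0.1, 0.2),
  BPA  = c(1, 5, 9, 8, 4, 3, 2, 1, 4, 3)
)

sums <- df %>%
  group_by(BPA) %>%
  summarise(CWSRsum = sum(CWSR)) %>%
  # Turn BPA into a factor whose levels run from largest to smallest sum
  mutate(BPA = reorder(factor(BPA), -CWSRsum))

p_lolli <- ggplot(sums, aes(BPA, CWSRsum)) +
  geom_segment(aes(xend = BPA, y = 0, yend = CWSRsum),
               colour = "gray40", linetype = 3) +
  geom_point(size = 4, colour = "gray60") +
  theme_bw() +
  theme(panel.grid = element_blank())

print(p_lolli)
```

Because BPA becomes a discrete axis here, the gaps between values such as 5 and 8 disappear; whether that is acceptable depends on whether BPA is meant as a label or as a quantity.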

Stewart Ross