
I am not able to make the plot described in the question for a binary outcome.

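Take this example data:

library(ggplot2)
set.seed(1)  # make the random Outcome reproducible
data <- data.frame(feature1 = c(rep(1, 10), rep(2, 10), rep(3, 10)),
                   feature2 = rep(letters[1:2], 15),
                   Outcome  = sample(0:1, 30, replace = TRUE))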

ggplot(data, aes(feature1, Outcome)) +
  geom_point() +                                                        # raw 0/1 outcomes
  geom_smooth(method = 'glm', method.args = list(family = 'binomial')) + # logistic fit per panel
  facet_wrap(~feature2)

Here I only get points at 0 and 1, but I want the points to show probabilities from a non-parametric model, i.e. (number of Outcome == 1) / (number of all Outcomes, 0s and 1s) for each value of 'feature1', computed separately for each level of 'feature2'.

I know I can create a column with the required probabilities from the non-parametric model, but it would be very tedious to do this for every combination of the 'facet_wrap' and 'aes' variables.
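A minimal base-R sketch of that column, assuming the example data above (rebuilt here so the snippet runs on its own): `aggregate()` computes the proportion once for every feature1/feature2 combination, so nothing has to be built by hand per facet. The helper name `prop_one` is hypothetical, used only for illustration.

```r
# Rebuild the example data from the question (seed added for reproducibility)
set.seed(1)
data <- data.frame(feature1 = c(rep(1, 10), rep(2, 10), rep(3, 10)),
                   feature2 = rep(letters[1:2], 15),
                   Outcome  = sample(0:1, 30, replace = TRUE))

# Hypothetical helper: share of 1s among all 0/1 outcomes in a group
prop_one <- function(y) sum(y == 1) / length(y)

# One row per feature1/feature2 combination, i.e. one row per point per facet
props <- aggregate(Outcome ~ feature1 + feature2, data = data, FUN = prop_one)
props

# For a 0/1 variable this proportion is exactly the group mean
means <- aggregate(Outcome ~ feature1 + feature2, data = data, FUN = mean)
all.equal(props$Outcome, means$Outcome)
```

Because the proportion of 1s equals the mean of a 0/1 variable, `FUN = mean` gives the same column without a custom helper.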

Roman
Hemant Rupani
  • I think my question is about data visualisation, so why was it put on hold as off-topic? – Hemant Rupani Jul 04 '17 at 16:29
  • For exactly the reason given: it is solely about programming. – whuber Jul 04 '17 at 18:37
  • @whuber can you please move this to Stack Overflow? – Hemant Rupani Jul 04 '17 at 18:39
  • what would your expected output column look like? – Roman Jul 05 '17 at 14:57
  • I'd recommend fitting a model outside of `ggplot2`. It's primarily a graphics package - use it for the graphing, not for the model fitting. There's no reason it needs to be tedious to assemble all combinations of variables, just use `expand.grid`. [See here for an example](https://stackoverflow.com/a/11388371/903061) – Gregor Thomas Jul 05 '17 at 15:01
  • Also, just as a note, a binomial GLM is *not* a non-parametric model. – Gregor Thomas Jul 05 '17 at 15:04
  • @Jimbou my expected column is mean(Outcome) per 'feature1' (numeric), split by 'feature2' (factor) – Hemant Rupani Jul 05 '17 at 19:30
  • @Gregor I took the GLM as parametric and "(Outcome==1)/(all Outcomes (0s and 1s))" as non-parametric. expand.grid doesn't do what I need, or maybe I can't use it correctly. – Hemant Rupani Jul 05 '17 at 19:30
  • So you want to use the mean of `Outcome` at each facet/x-value? Just calculate the means: with `dplyr`: `group_by(data, feature1, feature2) %>% summarize(mean = mean(Outcome))`. Lots of methods and examples at the [R-FAQ average data by group](https://stackoverflow.com/q/11562656/903061). You could probably also use `stat_summary(geom = "point", fun.y = mean)`. Is this what you're asking? If so, why the `glm` stuff in your question? – Gregor Thomas Jul 05 '17 at 19:34
  • @Gregor `stat_summary` does what I need. I used the GLM to see how well it fits the proportion of the binary outcome: just as we fit a model to a continuous outcome in regression, I wanted to visualise the fit against the proportion of a binary outcome. Thank you very much! Please write an answer, because I could not find this information online. – Hemant Rupani Jul 06 '17 at 07:33
  • Okay. In the future, please make your questions **minimal** - the GLM is great for you to do on your plot, but in this question it just confuses the issue. – Gregor Thomas Jul 06 '17 at 15:37
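A minimal sketch of the approach suggested in the comments: fit the model outside `ggplot2` and predict on a grid built with `expand.grid()`. It assumes one logistic regression per `feature2` panel, which the interaction formula `feature1 * feature2` should reproduce (a separate intercept and slope per level). The grid of 50 `feature1` values is an arbitrary choice.

```r
# Example data from the question (seed added for reproducibility)
set.seed(1)
data <- data.frame(feature1 = c(rep(1, 10), rep(2, 10), rep(3, 10)),
                   feature2 = rep(letters[1:2], 15),
                   Outcome  = sample(0:1, 30, replace = TRUE))

# Logistic regression with its own intercept and slope for each feature2 level,
# i.e. the same fits as one binomial GLM per facet
fit <- glm(Outcome ~ feature1 * feature2, family = binomial, data = data)

# Every combination of a fine feature1 grid and each feature2 level
grid <- expand.grid(feature1 = seq(1, 3, length.out = 50),
                    feature2 = unique(data$feature2))

# Predicted probabilities (type = "response"), not log-odds
grid$prob <- predict(fit, newdata = grid, type = "response")
head(grid)
```

The resulting `grid` can be drawn with `geom_line(aes(y = prob), data = grid)` on top of the faceted plot, so adding more grouping variables only means adding columns to `expand.grid()`.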

1 Answer


Rather than a "non-parametric model", I would call (Outcome == 1)/(all Outcomes, 0s and 1s) the mean: for a 0/1 variable, the proportion of 1s is exactly mean(Outcome). We can use stat_summary to summarize the data at each x value with an arbitrary summary function, like mean(). In this case, I think you want

stat_summary(geom = "point", fun = mean)   # ggplot2 >= 3.3.0; older versions use fun.y = mean
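A minimal sketch of how that layer slots into the plot from the question, assuming ggplot2 3.3.0 or later (where `fun` replaced `fun.y`). The raw 0/1 points are kept, faded, next to the per-x proportions; the red colour and the explicit `formula = y ~ x` are choices made here, not part of the original answer.

```r
library(ggplot2)

# Example data from the question (seed added for reproducibility)
set.seed(1)
data <- data.frame(feature1 = c(rep(1, 10), rep(2, 10), rep(3, 10)),
                   feature2 = rep(letters[1:2], 15),
                   Outcome  = sample(0:1, 30, replace = TRUE))

p <- ggplot(data, aes(feature1, Outcome)) +
  geom_point(alpha = 0.3) +                              # raw 0/1 outcomes
  stat_summary(geom = "point", fun = mean,               # proportion of 1s at each x, per panel
               colour = "red", size = 3) +
  geom_smooth(method = "glm", formula = y ~ x,
              method.args = list(family = "binomial")) + # logistic fit per panel
  facet_wrap(~feature2)
p
```

Since `stat_summary` runs within each panel, the red points are the same per-group means you would get from computing them by hand, and they sit directly against the GLM curve.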

Several geoms work the same way, drawing a per-group summary rather than the raw data; geom_boxplot, for example, is drawn from its own summary stat, stat_boxplot.

Gregor Thomas