
I have a data frame in the following format:

    Person Answer Value
    John   Yes    3
    Pete   No     6
    Joan   Yes    5
    Joan   Yes    4
    Pete   No     7

I want to conduct an analysis (and create a stacked bar plot) in which I group by the Person (which repeats) and Answer variables and then summarize Value.

I've tried using dplyr to do this, but I'm running into issues: once I add a group_by() step to my pipe, the calculation only sees the values inside each group, not the values I need.

e.g.,

library(dplyr)

df2 <- df %>%
  select(Person, Answer, Value) %>%
  group_by(Person, Answer) %>%
  # nrow(df) = row count of the original, ungrouped data frame
  summarise(value = sum(Value == 3) / nrow(df) + sum(Value == 6) / nrow(df))

The problem I'm running into is performing this calculation properly. The calculation doesn't make sense after the data has been grouped, because grouping leaves me with a very limited data frame.
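A minimal sketch of the usual workaround, assuming the goal is the literal calculation from the attempt above (rows with Value 3 or 6, divided by the number of rows in the whole data frame): inside a grouped `summarise()`, `n()` and `length()` only see the current group, so the ungrouped row count is captured before grouping. The column names `total_rows`, `group_rows` and `share_3_or_6` are hypothetical, chosen for illustration. Note that `==` (not `=`) builds the logical vector that `sum()` counts.

```r
library(dplyr)

# Data from the question
df <- data.frame(
  Person = c("John", "Pete", "Joan", "Joan", "Pete"),
  Answer = c("Yes", "No", "Yes", "Yes", "No"),
  Value  = c(3, 6, 5, 4, 7)
)

# Size of the full, ungrouped data, captured before grouping
total_rows <- nrow(df)

df2 <- df %>%
  group_by(Person, Answer) %>%
  summarise(
    group_rows = n(),  # rows in this Person/Answer group only
    # Value == 3 gives TRUE/FALSE per row; sum() counts the TRUEs
    share_3_or_6 = sum(Value == 3) / total_rows + sum(Value == 6) / total_rows,
    .groups = "drop"
  )
df2
```

The contrast between `group_rows` and `total_rows` is the point: the denominator comes from outside the grouped pipe, so grouping no longer changes it.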

Expected output:

    Person Answer Value
    Joan   Yes    <calculated value (summary stat)>
    Joan   No     <calculated value>
    John   Yes    <calculated value>
    John   No     <calculated value>
    Pete   Yes    <calculated value>
    Pete   No     <calculated value>
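The expected output lists every Person/Answer pair, including pairs that never occur in the data (for example, Joan never answered "No"). A grouped `summarise()` only returns pairs that exist, which is one reason the grouped result looks "limited". A sketch using `tidyr::complete()` to add the missing pairs, assuming a sum is the wanted summary statistic and that 0 is an acceptable fill value:

```r
library(dplyr)
library(tidyr)

# Data from the question
df <- data.frame(
  Person = c("John", "Pete", "Joan", "Joan", "Pete"),
  Answer = c("Yes", "No", "Yes", "Yes", "No"),
  Value  = c(3, 6, 5, 4, 7)
)

totals <- df %>%
  group_by(Person, Answer) %>%
  summarise(Value = sum(Value), .groups = "drop") %>%  # assumed summary: sum
  # add Person/Answer pairs absent from the data (e.g. Joan/No), filled with 0
  complete(Person, Answer, fill = list(Value = 0))
totals
```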

Ultimately, I'd like to make a stacked bar chart with one bar per person, where each bar is divided into percentages by "Yes" and "No" answers. For example, there would be three bars (one each for John, Pete, and Joan), and each bar would be split into two segments whose sizes are based on the Yes/No responses.

Thanks!

  • Please show your expected output as well. What are 'var1', 'var2', 'var3'? – akrun Aug 11 '21 at 23:47
  • Seems to be related to the previous question: https://stackoverflow.com/questions/68747743/replace-all-values-in-a-data-frame-conditionally We need more explanations about the context. – crestor Aug 12 '21 at 00:02
  • Looking at your previous question (https://stackoverflow.com/questions/68747743/replace-all-values-in-a-data-frame-conditionally) I can see that crestor, akrun and TarJae all put in effort to help you, naiverat. I think it's fair that you spend some time learning [how to create a good question](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and include more details to help us help you (and avoid wasting our time). – jared_mamrot Aug 12 '21 at 00:09
  • You get logical values in R using `==`. You only use `=` for assignments. – IRTFM Aug 12 '21 at 00:10
  • I tried to provide more detail. In the future, I'll be sure to include better examples. Thanks for all the help. – cigarettes_after_text Aug 12 '21 at 00:12

1 Answer

I don't understand what your desired outcome is; does either of these suit?

library(tidyverse)

df <- read.table(text = "Person Answer Value
 John     Yes     3
 Pete     No      6
 Joan     Yes     5
 John     No      6
 Pete     No      1", header = TRUE)

df2 <- df %>%
  group_by(Person) %>%
  mutate(proportion = Value / sum(Value))
df2
#> # A tibble: 5 x 4
#> # Groups:   Person [3]
#>   Person Answer Value proportion
#>   <chr>  <chr>  <int>      <dbl>
#> 1 John   Yes        3      0.333
#> 2 Pete   No         6      0.857
#> 3 Joan   Yes        5      1    
#> 4 John   No         6      0.667
#> 5 Pete   No         1      0.143

ggplot(df2, aes(x = Person, y = Value, fill = Answer)) +
  geom_col(color = "black", position = "stack") +
  geom_text(aes(label = Answer),
            position = position_stack(vjust = 0.5))

ggplot(df2, aes(x = Person, y = proportion, fill = Answer)) +
  geom_col(color = "black", position = "stack") +
  geom_text(aes(label = round(proportion, 2)),
            position = position_stack(vjust = 0.5))

Created on 2021-08-12 by the reprex package (v2.0.0)
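As an alternative to computing `proportion` by hand, ggplot2 can normalize each bar itself: `position = "fill"` stacks the segments and rescales every bar to a total of 1. The sketch below reuses the sample data above; the y-axis label is a placeholder, and `layer_data()` is used only to inspect the computed bar extents without drawing.

```r
library(ggplot2)

# Sample data from the answer above
df <- read.table(text = "Person Answer Value
 John     Yes     3
 Pete     No      6
 Joan     Yes     5
 John     No      6
 Pete     No      1", header = TRUE)

# position = "fill" stacks each person's rows and rescales the bar to 1,
# so segments show each answer's share of that person's total Value
p <- ggplot(df, aes(x = Person, y = Value, fill = Answer)) +
  geom_col(color = "black", position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "Share of Value")

# Computed rectangle extents (one row per data row), without rendering
bars <- layer_data(p)
```

The trade-off: `position = "fill"` is shorter, but precomputing the proportions (as in the second plot above) keeps the numbers available for labels and further analysis.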

jared_mamrot
  • This is a very good answer, considering the sparse information! – TarJae Aug 12 '21 at 00:10
  • Yes, that's similar to what I'm trying to do. To provide even more detail: I want each bar divided into two political parties (which would be voting yes or no). Each person in the dataset appears repeatedly, and each respondent provides a value for how likely they are to support the candidate. The three people (candidates) in this case are John, Joan, and Pete. Thanks for all the help! – cigarettes_after_text Aug 12 '21 at 00:17
  • Oh, that makes more sense - so the second plot in my answer is basically what you're trying to do? What further changes do you want to make? – jared_mamrot Aug 12 '21 at 00:20
  • @naiverat: No, no, no. Do NOT use comments to clarify problems with your question. Instead, DO use SO's [edit] facilities to improve the examples in the body of the question. – IRTFM Aug 12 '21 at 00:21
  • @jared_mamrot I think you've helped enough and I can take it from here. Thanks! – cigarettes_after_text Aug 12 '21 at 00:55
  • No problem - you will find a lot of help on stackoverflow if you create good questions - thanks for clarifying and improving your question in response to the comments :) – jared_mamrot Aug 12 '21 at 00:57
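Since each person appears in several rows, one option is to collapse the repeated rows per Person/Answer first and only then compute each answer's share within a person. A sketch on the answer's sample data, assuming a sum is the right way to combine repeated rows (for "likelihood of support" ratings a mean may be more appropriate); the `shares` name is hypothetical.

```r
library(dplyr)

# Sample data from the answer
df <- read.table(text = "Person Answer Value
 John     Yes     3
 Pete     No      6
 Joan     Yes     5
 John     No      6
 Pete     No      1", header = TRUE)

shares <- df %>%
  group_by(Person, Answer) %>%
  # collapse repeated rows; "drop_last" leaves the result grouped by Person
  summarise(Value = sum(Value), .groups = "drop_last") %>%
  # share of each answer within that person's total
  mutate(proportion = Value / sum(Value)) %>%
  ungroup()
shares
```

With one row per Person/Answer pair, `shares` can go straight into the `ggplot(..., aes(x = Person, y = proportion, fill = Answer))` call from the answer.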