
I have a data frame with more than 1,000 rows. Each row has a Year column (a numeric year) and a Gender column (either "Male" or "Female"). I want to run a pairwise t-test between Years on the proportion of rows with Gender == "Male". I have managed to plot this proportion by year (code and plot below), but I can't work out how to extend this to the prop_test() function. I can't share my data, so code for a sample dataset is included.

library(tidyverse)  # provides as_tibble(), %>%, the dplyr verbs and ggplot2

sample_data <- as_tibble(data.frame(Gender = sample(c("Male", "Female"), 1000, replace = TRUE),
                                    Year = sample(c(2016, 2017, 2018, 2019), 1000, replace = TRUE)))

sample_data %>% 
  group_by(Year) %>% 
  summarise(prop = sum(Gender == "Male", na.rm = TRUE) / n()) %>% 
  ggplot(mapping = aes(x = Year, y = prop)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = round(prop, 2), vjust = -0.25)) +
  labs(y = "prop Male")

The resulting plot of the proportion male by year:

[image: bar chart of prop Male by Year]

How can I modify the code I used for the plot to compute a pairwise t-test for the proportions? I have tried things like:

sample_data %>%
    prop_test(Gender ~ Year)

But this gives an error:

 Error in rowSums(x) : 'x' must be numeric
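
One reading of this error, offered as a guess rather than a diagnosis: the piped data frame seems to be treated as a table of counts, and rowSums() then fails on the character Gender column. Below is a minimal sketch of a workaround, assuming base R's `pairwise.prop.test()` (from the stats package) is an acceptable substitute for `prop_test()`. It tabulates the counts first, then passes the number of males and the group size per year. Note that this runs chi-squared tests of equal proportions, not t-tests. The seed and the names `tab`, `males`, `totals` and `res` are illustrative, not from the question.

```r
# Illustrative data with the same shape as sample_data; the seed is arbitrary.
set.seed(1)
sample_data <- data.frame(
  Gender = sample(c("Male", "Female"), 1000, replace = TRUE),
  Year   = sample(c(2016, 2017, 2018, 2019), 1000, replace = TRUE)
)

# Cross-tabulate: one row per Year, one column per Gender.
tab <- table(sample_data$Year, sample_data$Gender)

males  <- tab[, "Male"]   # count of "Male" rows per year, named by year
totals <- rowSums(tab)    # number of rows per year

# Pairwise two-sample tests of equal proportions, Holm-adjusted p-values.
res <- pairwise.prop.test(males, totals, p.adjust.method = "holm")
res$p.value               # lower-triangular matrix of adjusted p-values
```

Each cell of `res$p.value` compares one pair of years; the upper triangle is NA because every pair is tested only once.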
Shawn Hemelstrand
jp5602
  • Extend *what* exactly? Are you just asking how to do a t-test with this data? It doesn't actually seem related to ggplot—like a plot is a helpful way to visualize proportions, but not part of a question about how to actually do the test. If you just need a tutorial or reference on hypothesis testing in R, [here](http://www.r-tutor.com/elementary-statistics/inference-about-two-populations/comparison-two-population-proportions) is a quick one – camille Feb 20 '22 at 22:04
  • I have tried simple things like ```sample_data %>% prop_test(Gender ~ Year)```, but I get errors like "Error in rowSums(x) : 'x' must be numeric". EDIT: Oh, do I need to manually make a table like they say? I will try that – jp5602 Feb 20 '22 at 22:06
  • The link you gave works, but doesn't work with piping. Is there an alternative way to accomplish this with piping to reduce extraneous intermediate variables? – jp5602 Feb 20 '22 at 22:24
  • It would be clearer if you [edit] the post to include the code you're actually debugging for the test, instead of the code for the plot – camille Feb 20 '22 at 22:27
  • Try `prop_test(Year ~ Gender)`. – dcarlson Feb 21 '22 at 03:30
  • That gives the same error "Error in rowSums(x) : 'x' must be numeric" – jp5602 Feb 21 '22 at 18:10
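
On the piping question raised in the comments, a hedged sketch: the counts can be computed in a dplyr pipeline and handed to base R's `pairwise.prop.test()` through `with()`, so no intermediate table variable is needed. This assumes dplyr is installed. The column names `males` and `total` and the seed are illustrative, and this uses `pairwise.prop.test()` rather than the `prop_test()` call from the question.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, only for reproducibility
sample_data <- tibble(
  Gender = sample(c("Male", "Female"), 1000, replace = TRUE),
  Year   = sample(c(2016, 2017, 2018, 2019), 1000, replace = TRUE)
)

res <- sample_data %>%
  group_by(Year) %>%
  summarise(
    males = sum(Gender == "Male", na.rm = TRUE),  # successes per year
    total = n()                                   # group size per year
  ) %>%
  # with() evaluates the call using the summary columns;
  # setNames() labels each group by its year in the output.
  with(pairwise.prop.test(setNames(males, Year), total))

res
```

The `summarise()` step mirrors the one used for the plot, except that it keeps the numerator and denominator as separate counts instead of dividing them, since the test needs both.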

0 Answers