
I'm trying to plot the different levels of a variable (strategy) across two age groups (young vs. old) within two gender groups (female and male). I loaded the data into R and created a data frame from it like this:

crosstab <- xtabs(~strategy+age+gender, data = data2)
crosstab

tabframe <- as.data.frame(crosstab)
tabframe

So I have 20 observations of 4 variables (strategy, age, gender, Freq).
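A minimal sketch of why the data frame has 20 rows, using hypothetical raw data (the real `data2` isn't shown): `xtabs()` keeps every combination of factor levels, including combinations with no observations, so 5 strategies × 2 ages × 2 genders give 20 rows, and `as.data.frame()` stores the counts in a column called `Freq`.

```r
# Hypothetical raw data: one row per response, with every factor level declared
data2 <- data.frame(
  strategy = factor(c("fall", "rise", "rise", "level"),
                    levels = c("fall", "fall-rise", "level", "rise", "rise-fall")),
  age      = factor(c("o", "o", "y", "y"), levels = c("o", "y")),
  gender   = factor(c("f", "m", "f", "m"), levels = c("f", "m"))
)

crosstab <- xtabs(~ strategy + age + gender, data = data2)
tabframe <- as.data.frame(crosstab)

# One row per combination of levels (empty ones get Freq = 0): 5 * 2 * 2 = 20
print(nrow(tabframe))

# The counts live in Freq; their total equals the number of raw rows
print(sum(tabframe$Freq))
```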

When I plot it with the code below, I get a figure showing 4% rather than 100%, and the y axis shows counts instead of the percentages I expected.

library(ggplot2)
library(scales)

# cbPalette, saud_gender_names and saud_task_names must be defined beforehand (not shown)
rf <- ggplot(data = data2) +
  geom_bar(
    mapping = aes(x = age, fill = strategy),
    position = position_dodge2(width = 0.9, preserve = "single"),
    colour = "black"
  ) +
  scale_y_continuous(labels = percent_format()) +
  facet_grid(~gender, labeller = labeller(gender = saud_gender_names, task = saud_task_names)) +
  scale_fill_manual(values = cbPalette) +
  scale_colour_manual(values = cbPalette) +
  theme_bw() +
  # age levels are "o" then "y", so name the labels to keep them matched
  scale_x_discrete(labels = c(o = "older", y = "younger")) +
  ggtitle("contour choice in ynqs")

rf

This is the plot I got:

[Image: plot1]

Can you help me solve this problem?

Best

I have tried changing the data frame in many ways, but nothing works.

neilfws
jojo
  • Welcome to Stack Overflow. Please [make this question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including a small representative dataset in a plain text format - for example the output from `dput(data2)`, if that is not too large. – neilfws Mar 27 '23 at 23:51
  • Hi, the dataset is far too large to share as `dput(data2)`. I can explain further if needed. – jojo Mar 28 '23 at 00:36
  • You can always supply a smaller, representative dataset. It really is difficult to help if we cannot reproduce the issue using some data. Linking to e.g. a shared Google sheet is another option. – neilfws Mar 28 '23 at 00:38
  • Output of `dput(tabframe[1:4, ])`: structure(list(strategy = structure(1:4, .Label = c("fall", "fall-rise", "level", "rise", "rise-fall"), class = "factor"), age = structure(c(1L, 1L, 1L, 1L), .Label = c("o", "y"), class = "factor"), gender = structure(c(1L, 1L, 1L, 1L), .Label = c("f", "m"), class = "factor"), Freq = c(9L, 0L, 3L, 14L)), row.names = c(NA, 4L), class = "data.frame") – jojo Mar 28 '23 at 01:58

1 Answer


I've tried to re-create the issue you're seeing by running your code using the reprex addin in RStudio. Producing a reprex that accurately reproduces an issue is key to finding and fixing the problem.

When I run the code you posted, I get various errors, e.g. because it does not create the cbPalette object. After fixing those issues and removing lines that weren't relevant to the problem (e.g. the line that sets the title), the following reprex is as close as I can get to the chart in your image.

All the bars show 100% on the y axis because `tabframe` contains only one row for each combination of strategy, age and gender. geom_bar() counts rows and does not use the Freq column, so every bar has a count of 1, and percent_format() renders 1 as 100%.
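A small sketch of that counting behaviour, using the four rows from the `dput()` output in the comments as a hypothetical mini `tabframe`. `layer_data()` shows the heights ggplot2 actually computes: without a `weight` mapping every bar is 1, and `percent_format()` turns 1 into "100%". Mapping `weight = Freq` is one standard way to make `stat_count()` sum Freq instead; it is shown only for comparison.

```r
library(ggplot2)
library(scales)

# Hypothetical mini tabframe: one row per combination, real counts in Freq
tabframe <- data.frame(
  strategy = factor(c("fall", "fall-rise", "level", "rise")),
  age      = factor(c("o", "o", "o", "o")),
  gender   = factor(c("f", "f", "f", "f")),
  Freq     = c(9L, 0L, 3L, 14L)
)

# geom_bar() uses stat_count(), which counts rows: every bar has height 1
p_rows <- ggplot(tabframe, aes(x = age, fill = strategy)) +
  geom_bar(position = position_dodge2(preserve = "single"))
row_counts <- layer_data(p_rows)$count
print(row_counts)

# percent_format() multiplies by 100 and appends "%", so 1 is shown as "100%"
print(percent_format()(1))

# With weight = Freq, stat_count() sums Freq within each bar instead
p_weighted <- ggplot(tabframe, aes(x = age, fill = strategy, weight = Freq)) +
  geom_bar(position = position_dodge2(preserve = "single"))
weighted_counts <- layer_data(p_weighted)$count
print(weighted_counts)
```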

I can't tell whether this happens only because you've posted a small number of rows, or whether it's caused by either (a) the actual data you're using or (b) the combination of variables you've chosen to aggregate with xtabs(). Either way, the problem seems to lie in the data rather than in your code: the chart renders the data exactly as you've specified.

library(ggplot2)
library(scales)

data2 <- structure(
  list(
    strategy = structure(1:4, .Label = c("fall", "fall-rise", "level", "rise", "rise-fall"), class = "factor"), 
    age = structure(c(1L, 1L, 1L, 1L), .Label = c("o", "y"), class = "factor"), 
    gender = structure(c(1L, 1L, 1L, 1L), .Label = c("f", "m"), class = "factor"), 
    Freq = c(9L, 0L, 3L, 14L)
  ), 
  row.names = c(NA, 4L), 
  class = "data.frame"
)

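# As in the question: with no left-hand side in the formula, xtabs() counts rows of data2 (its Freq column is ignored)
tabframe <- as.data.frame(xtabs(~strategy+age+gender, data = data2))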

# `tabframe` contains only a single row with each combination of variables
dplyr::count(tabframe, strategy, age, gender)

ggplot(data = tabframe) +
  geom_bar(
    mapping = aes(x = age, fill = strategy),
    position = position_dodge2(width = 0.9, preserve = "single")
  ) +
  scale_y_continuous(labels = percent_format()) +
  facet_grid(cols = vars(gender))

Created on 2023-03-28 with reprex v2.0.2
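If the aim is for each bar to show the share of a strategy within its age group and gender, one option is to compute that share from Freq and draw it with geom_col(), which uses y values as given rather than counting rows. This is a sketch under assumptions: the mini `tabframe` again reuses the four rows from the comments, and "percentage within age × gender" is assumed to be the intended denominator (a group whose total is 0 would give NaN).

```r
library(ggplot2)
library(scales)

# Hypothetical tabframe with the same columns as in the question
tabframe <- data.frame(
  strategy = factor(c("fall", "fall-rise", "level", "rise")),
  age      = factor(c("o", "o", "o", "o"), levels = c("o", "y")),
  gender   = factor(c("f", "f", "f", "f"), levels = c("f", "m")),
  Freq     = c(9L, 0L, 3L, 14L)
)

# Share of each strategy within its age x gender group (assumed denominator)
tabframe$prop <- with(tabframe, Freq / ave(Freq, age, gender, FUN = sum))

# geom_col() plots prop directly; percent_format() then shows it as a percentage
p <- ggplot(tabframe, aes(x = age, y = prop, fill = strategy)) +
  geom_col(position = position_dodge2(preserve = "single"), colour = "black") +
  scale_y_continuous(labels = percent_format()) +
  facet_grid(cols = vars(gender)) +
  labs(y = "Percentage")

print(p)
```

Setting the y-axis title with labs() also replaces the default "count" label mentioned in the comments.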

If you still need help, it would probably be useful to post a larger sample of your data (maybe 20 rows?) using the dpasta() function from the datapasta package, since that would allow us to rule out any problems with your data more easily.
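For sharing a sample, a minimal sketch using base R's dput(), which prints code that recreates the object; the small `data2` here is hypothetical. The datapasta call is left as a comment since it requires that package to be installed.

```r
# Hypothetical data frame standing in for the real data2
data2 <- data.frame(
  strategy = factor(c("fall", "rise", "level")),
  age      = factor(c("o", "y", "o")),
  gender   = factor(c("f", "m", "m"))
)

# head() keeps the sample short; dput() prints pasteable R code for it
sample_rows <- head(data2, 20)
dput(sample_rows)

# Equivalent with datapasta, if installed:
# datapasta::dpasta(sample_rows)
```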

Matt Ashby
  • Thank you, Matt. The same issue happens here: the plot doesn't show any variation, it shows 100% for all strategies! It also shows 'count' on the y axis, not percentage. – jojo Mar 28 '23 at 09:54
  • @jojo I've updated my answer with some more information based on your comment. If that doesn't help, please post more of your data so I can check if that's what's causing the problem. – Matt Ashby Mar 28 '23 at 14:59