I've searched this topic and found partial solutions, but I'm still stuck on one aspect. I am trying to manipulate a piped summarise table from dplyr so that I can plot it with ggplot. The data comes from survey research. I wish to report a response variable (a mean statistic) by ethnicity. In our survey research, we determine ethnicity through two questions. The first question is about Hispanic origin (coded as a factor variable with levels "Hispanic" and "Non-Hispanic"). The second question is about race/ethnicity (also coded as a factor variable, with the options Caucasian, African-American, Asian, and Other). We do it this way because there is often crossover that can lead to under-reporting.

For reporting purposes, we wish to add Hispanic as its own row in the table and plot. I figured out how to do this with dplyr, but I can't reorder the way the rows are presented. Every attempt to re-level the factor has failed, including forcats functions like fct_relevel and fct_recode.

Here's the relevant code:

# Generate some random anonymous data
dd.scrub <- data.frame(matrix(NA, ncol = 3, nrow = 100))
names(dd.scrub) <- c("Ethnicity", "Hispanic", "Attachment.base")
ethnicities <- c("Caucasian", "AA", "Asian", "Other")
hispanic_origin <- c("Hispanic", "Non-Hispanic")
set.seed(40769)
dd.scrub$Ethnicity <- factor(floor(runif(100, min=1, max=5)),
                         levels = c(1:4),
                         labels = ethnicities)
dd.scrub$Hispanic <- factor(sample(hispanic_origin, 
                               size = 100, 
                               replace = TRUE,
                               prob=c(0.2, 0.8)))
dd.scrub$Attachment.base <- rnorm(100, mean = 26.8, sd=7.921)

# By ethnicity including Hispanic origin (HHI + Hispanic?)
library(dplyr)
attachment.ethnicity <- dd.scrub %>%
  filter(!is.na(Ethnicity)) %>%
  group_by(Ethnicity)
attachment.ethnicity.sum <- summarise(attachment.ethnicity,
                                      Attachment = mean(Attachment.base))

# Ethnicity + Hispanic
library(forcats)
library(questionr)
attachment.hispanic.sum <- dd.scrub %>%
  filter(Hispanic == "Hispanic") %>%
  summarise(Attachment = mean(Attachment.base))
# Note: the result below is not assigned, so this call only prints the expanded factor
fct_expand(attachment.ethnicity.sum$Ethnicity, "Hispanic")
# The Hispanic summary has no Ethnicity column, so bind_rows() fills it with NA
attachment.ethnicity.sum <- bind_rows(attachment.ethnicity.sum, attachment.hispanic.sum)
# Turn that NA into a "Hispanic" level
attachment.ethnicity.sum$Ethnicity <- addNAstr(attachment.ethnicity.sum$Ethnicity, value = "Hispanic")

The resulting table is:

# A tibble: 5 × 2
  Ethnicity Attachment
     <fctr>      <dbl>
1 Caucasian   27.01052
2        AA   29.62579
3     Asian   26.38861
4     Other   26.75793
5  Hispanic   27.57609

This succeeds at getting me a tibble that I can plot. But the arbitrary ordering of the rows, with Hispanic coming after Other, is rather odd.

Any help is greatly appreciated!

Larry V
  • Welcome to Stack Overflow! Please include a [minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) in your question and make your data available (for example by using `dput()`) so that others can reproduce your problem. – Lamia Jun 02 '17 at 17:38
  • Thanks for this! I will update with an example. My data is bound by an NDA, so can't share publicly. But will provide some dummy data to illustrate. – Larry V Jun 02 '17 at 18:06
  • Do you just want to reorder the rows of `attachment.ethnicity.sum` ? – Lamia Jun 02 '17 at 19:24
  • Yes. It is odd that Hispanic would fall below Other. In a perfect world, they would be reordered in alphabetical order. – Larry V Jun 02 '17 at 20:56
  • You're the one who actually defines this order of factor levels, when you define `ethnicities <- c("Caucasian", "AA", "Asian", "Other")` :) Try changing this order and you'll change the result. Then with the `fct_expand`, "Hispanic" is added as the last factor level. Anyway, if you want to change the order at the end, just do `attachment.ethnicity.sum$Ethnicity <- factor(attachment.ethnicity.sum$Ethnicity, levels = c("AA", "Asian", "Caucasian", "Hispanic", "Other"))`. – Lamia Jun 02 '17 at 21:12
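
For anyone landing here later, here is a minimal sketch of one way to get the alphabetical order asked for, not a definitive answer. It assumes the summary table looks like the one printed in the question. The tibble below is a hand-typed stand-in for `attachment.ethnicity.sum` built from those printed values, and `reordered` is a name made up for illustration. The idea is to rebuild the factor with `factor(..., levels = ...)`, which changes the level order while each row keeps its own label, and then sort the rows with `arrange()`. One caution: assigning a new vector directly to `levels(x)` renames the existing levels position by position instead of reordering them, so means could end up attached to the wrong group.

```r
library(dplyr)

# Stand-in for attachment.ethnicity.sum (assumption: values copied from the
# table printed in the question, with levels in the order they appear there)
attachment.ethnicity.sum <- tibble(
  Ethnicity = factor(
    c("Caucasian", "AA", "Asian", "Other", "Hispanic"),
    levels = c("Caucasian", "AA", "Asian", "Other", "Hispanic")
  ),
  Attachment = c(27.01052, 29.62579, 26.38861, 26.75793, 27.57609)
)

# Desired display order (alphabetical, as requested in the comments)
wanted <- c("AA", "Asian", "Caucasian", "Hispanic", "Other")

reordered <- attachment.ethnicity.sum %>%
  # Rebuild the factor with a new level order; row labels are unchanged
  mutate(Ethnicity = factor(Ethnicity, levels = wanted)) %>%
  # arrange() sorts a factor column by level order, not by spelling
  arrange(Ethnicity)

print(reordered)
```

Since ggplot2 lays out a discrete axis in factor-level order, the re-leveled column should also control the order of the bars or points in the plot, even without the `arrange()` step.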
