I haven't worked with a data set this large before (about 550 entries), so things can get confusing. The script I am working with is rather long, as it binds our "Pre" and "Post" data from participant questionnaires using the DASS scale.

In terms of merging data frames, summarizing, and interpreting the data, the script is working well. However, I also need to update a PowerPoint with a bar graph, not just the data frames and summaries.

I tried to use the native R function to make a bar graph, but things got messy and it kept saying that I could not subset the data. So instead I'm going to try ggplot. I'm not sure whether it is easier, as I am still running into problems.

My code looks like this:

ggplot(ChildALL, aes(x=Pre, Post, y=4:50))+geom_bar(stat="mean")+labs(x="Measure")

What I want to do is make a bar graph with both "Pre" and "Post" times on it (next to each other, probably in different colors). The y-axis will be the average score and the x-axis will show three measures: Anxiety, Depression, and Stress, all based on the already subscaled data from the questionnaire. I'm not sure if I can share the data with you, as it is on a private server and contains identifying information, so some general advice would be helpful.

The error I am getting is "can't find a stat called mean". It might need to be specified differently, but I'm stuck at this point and I may just go back to the native R barplot function, since I was at least getting somewhere with that.

Other information that might help: the questionnaire is 68 items long. The items are already grouped into their respective subscales, and there are additional columns not being used in the graph, such as "startdateC", "familySIDC", and anything specifying male or female. The code for those names is here:

names(ChildALL) <- c("startdateC","familySIDC",
                     "SCAREDC1","SCAREDC2",  "SCAREDC3", "SCAREDC4", "SCAREDC5",
                     "SCAREDC6","SCAREDC7",  "SCAREDC8", "SCAREDC9", "SCAREDC10",
                     "SCAREDC11","SCAREDC12","SCAREDC13","SCAREDC14","SCAREDC15",
                     "SCAREDC16","SCAREDC17","SCAREDC18","SCAREDC19","SCAREDC20",
                     "SCAREDC21","SCAREDC22","SCAREDC23","SCAREDC24","SCAREDC25",
                     "SCAREDC26","SCAREDC27","SCAREDC28","SCAREDC29","SCAREDC30",
                     "SCAREDC31","SCAREDC32","SCAREDC33","SCAREDC34","SCAREDC35",
                     "SCAREDC36","SCAREDC37","SCAREDC38","SCAREDC39","SCAREDC40",
                     "SCAREDC41",
                     "CDIC1", "CDIC2", "CDIC3", "CDIC4", "CDIC5", "CDIC6", "CDIC7",
                     "CDIC8", "CDIC9", "CDIC10","CDIC11","CDIC12","CDIC13","CDIC14",
                     "CDIC15","CDIC16","CDIC17","CDIC18","CDIC19","CDIC20","CDIC21",
                     "CDIC22","CDIC23","CDIC24","CDIC25","CDIC26","CDIC27",
                     "Gender",
                     "FemalePDS1","FemalePDS2","FemalePDS3","FemalePDS4","FemalePDS5",
                     "FemalePDS6","FemalePDS7","FemalePDS8","FemalePDS9","FemalePDS10",
                     "MalePDS1","MalePDS2","MalePDS3","MalePDS4","MalePDS5",
                     "MalePDS6","MalePDS7","MalePDS8","time")

Hope some advice can come out of it. I think I can build this barplot if I play around with it long enough, but I just wanted to see what other people thought.

Thank you ahead of time.

EDIT: A previous graph a former colleague made (image not included here).

I'm not sure if he made it in R or not, as he left without discussing this issue with me. It could very well be that he used Excel or something else and I'm wasting my time.

Bhargav Rao
    It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Don't try to share all your data, just a representative example that can be used for testing. You'll likely have to reshape your data before plotting to make it "tidy". Maybe something like this can help: https://stackoverflow.com/questions/48940156/ggplot-geom-bar-where-x-multiple-columns – MrFlick Apr 06 '20 at 19:34
    Your colleague made the graph with R and `ggplot2`. If you can't share the data, can you generate a reproducible example (replacing private information with letters or random words) to show how your data are organized? Right now, it's pretty hard to keep track of what your data look like. – dc37 Apr 06 '20 at 19:34
  • @MrFlick I'm working on a simple example. I think I can easily make a copy of the data table without the identifying information, which will also fix the issue of the data being on a server. I'll add it, along with a sample of my data manipulation, as an edit. This is a new script, so it is very possible that the data isn't "tidy" –  Apr 06 '20 at 19:38

1 Answer

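Here is an example with mock data. ggplot2 has no stat called "mean", so you first have to summarise your data (the mean per time point and measure) and then plot the summarised values with stat="identity".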

library(ggplot2)
library(dplyr)

# Mock data in long format: one row per score, labelled by time point and subscale
ChildALL <- data.frame(
    Time = gl(2, 200, labels = c("Pre", "Post")),
    Value = c(sample(0:10, 200, replace = TRUE),
              sample(0:10, 200, replace = TRUE, prob = c(rep(.1, 8), .08, .08, .04))),
    Type = sample(c("Anxiety", "Depression", "Stress"), 400, replace = TRUE),
    stringsAsFactors = TRUE)

# Compute the mean and SD per time point and subscale, then plot the means
ChildALL %>% group_by(Time, Type) %>%
    summarise(SD = sd(Value), Value = mean(Value)) %>% ungroup() %>%
    ggplot(aes(x = Type, y = Value, fill = Time)) +
    geom_bar(stat = "identity", position = "dodge") +
    # width is a fixed setting, not a data mapping, so it goes outside aes()
    geom_errorbar(aes(ymin = Value - SD, ymax = Value + SD), width = 0.2,
                  position = position_dodge(width = 0.90)) +
    labs(x = "Measure")

Created on 2020-04-06 by the reprex package (v0.3.0)
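
If your real data are still in wide format (one column per subscale score), something along these lines might also work. This is only a sketch under assumptions: the column names Anxiety, Depression and Stress and the data frame name wide are made up for illustration, since I can't see your actual subscale columns. It reshapes the data to long format with tidyr::pivot_longer() and lets ggplot compute the means itself via stat = "summary" (the fun argument needs ggplot2 3.3.0 or later; older versions call it fun.y).

```r
library(ggplot2)
library(tidyr)

set.seed(1)  # make the mock data reproducible

# Hypothetical wide layout: one row per questionnaire, one column per subscale.
# The column names Anxiety/Depression/Stress are assumptions, not the real ones.
n <- 100
wide <- data.frame(
  time       = factor(rep(c("Pre", "Post"), each = n), levels = c("Pre", "Post")),
  Anxiety    = sample(0:10, 2 * n, replace = TRUE),
  Depression = sample(0:10, 2 * n, replace = TRUE),
  Stress     = sample(0:10, 2 * n, replace = TRUE)
)

# Reshape to long ("tidy") form: one row per questionnaire-by-measure score.
long <- pivot_longer(wide,
                     cols = c(Anxiety, Depression, Stress),
                     names_to = "Measure", values_to = "Score")

# stat = "summary" with fun = "mean" lets ggplot compute the bar heights,
# so no separate group_by()/summarise() step is needed for the means.
p <- ggplot(long, aes(x = Measure, y = Score, fill = time)) +
  geom_bar(stat = "summary", fun = "mean", position = "dodge") +
  labs(x = "Measure", y = "Mean score", fill = "Time")

print(p)
```

The trade-off is that you never see the summarised numbers this way, so if you also want error bars or need to report the means, the explicit summarise() approach above is probably the easier one to check.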

user12728748