
I want to design a grouped barplot for data arranged as follows:

       sx1pre sx1post sx2pre sx2post   
1         1     1       1       0
2         1     0       1       0  
3         0     1       1       0 
4         1     0       0       1
5         1     0       1       0
6         1     0       1       0 

For each sx (1 or 2), I want to compare the frequency of "pre" and "post" in a single graph. I would like to plot the percentage of patients (out of the total) showing a symptom (sx) before the operation (pre) next to the percentage showing the same symptom afterwards (post). Thanks

statn00b
  • Welcome to Stack Overflow :-) Please specify a bit more what you mean. Is each row one patient? Perhaps you could add a sketch (in Paint or similar) of the graph you would expect from the data you provided. Please also read this post: https://stackoverflow.com/a/5963610/1842673 – TobiO Jul 23 '19 at 13:02
  • Thanks for the insight, I will work on a reproducible example. Returning to your question: yes, each row represents a patient, and for each one the presence/absence of a symptom is recorded before and after treatment – statn00b Jul 23 '19 at 13:17
  • Please add a sample of the expected plot if possible. – NelsonGon Jul 23 '19 at 13:35
  • Here's an idea of what I imagine - [link](https://imgur.com/H7KhUxg) – statn00b Jul 23 '19 at 13:47

1 Answer


Reading it again, I think I know what you want to achieve. I guess you already have the data in R?

library(ggplot2)

df <- read.delim("temp.csv") # data is now in df (read.delim expects a tab-separated file)

# proportion of patients with a 1 in each column (multiply by 100 for a percentage)
frequencies <- data.frame(lapply(df, FUN = function(x) sum(x) / length(x)))

frequencies <- data.frame(t(frequencies)) # make long form of data frame

names(frequencies) <- "percentage" # rename column
frequencies$category <- row.names(frequencies) # get "proper" metadata
frequencies$timepoint <- ifelse(grepl("pre", frequencies$category), "pre", "post") # get timepoint
frequencies$intervention <- ifelse(grepl("sx1", frequencies$category), "sx1", "sx2") # get symptom (sx1 or sx2)

# plot
ggplot(frequencies, aes(x = intervention, y = percentage, fill = timepoint)) +
  geom_col(position = position_dodge())
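
To try these steps without a `temp.csv` file, here is a minimal sketch that types in the six example rows from the question. Two things are assumptions rather than part of the answer: `colMeans()` stands in for the `lapply()` call, and the result is multiplied by 100 so the axis shows real percentages. The column name `symptom` is also just an illustrative choice.

```r
# Example data typed in from the question: one row per patient, 1 = symptom present
df <- data.frame(
  sx1pre  = c(1, 1, 0, 1, 1, 1),
  sx1post = c(1, 0, 1, 0, 0, 0),
  sx2pre  = c(1, 1, 1, 0, 1, 1),
  sx2post = c(0, 0, 0, 1, 0, 0)
)

# colMeans() gives the share of 1s per column; * 100 turns it into a percentage
# (assumption: swapped in for the lapply() step, same result up to the scaling)
frequencies <- data.frame(
  category   = names(df),
  percentage = colMeans(df) * 100
)
frequencies$timepoint <- ifelse(grepl("pre", frequencies$category), "pre", "post")
frequencies$symptom   <- ifelse(grepl("sx1", frequencies$category), "sx1", "sx2")

# Make "pre" come before "post" in the legend and within each pair of bars
frequencies$timepoint <- factor(frequencies$timepoint, levels = c("pre", "post"))

# Draw the grouped barplot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(frequencies, aes(x = symptom, y = percentage, fill = timepoint)) +
    geom_col(position = position_dodge())
  print(p)
}
```

Without the `factor()` line, ggplot2 orders the bars alphabetically, which would put "post" before "pre".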

Regarding the disease-conditions, it might be easier to use something like this:

new_names_after_comment <- c('PAIN.PO', 'DYSPNEA.PO', 'PAIN.FU', 'DYSPNEA.FU') # order must match the rows of frequencies
frequencies$category_new <- new_names_after_comment # just add as a new column

library(tidyr) # for separate(); also provides %>%
frequencies <- frequencies %>%
  separate(category_new, into = c("Disease", "Timepoint"), sep = "\\.", remove = FALSE)

# plot after comment

ggplot(frequencies, aes(x = Disease, y = percentage, fill = Timepoint)) +
  geom_col(position = position_dodge())


TobiO
  • Thanks for your kindness Tobi, I also uploaded a paint sketch of what I had in mind. Gonna try your solution in the meantime! Yes, indeed I already have my data in R. What would x represent in the function? – statn00b Jul 23 '19 at 13:49
  • So `df` would be the data frame your data is in. The `x` is just a temporary variable, so to speak. Only the function sees it. It holds the contents of each column in turn, because `lapply` takes the columns one by one and passes each to the function. – TobiO Jul 23 '19 at 13:58
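
A small sketch of what `lapply` does with `x`, using a made-up two-column data frame (the names `a`, `b` and `share` are hypothetical, chosen only for illustration):

```r
# Hypothetical toy data frame: two 0/1 columns, four patients
df <- data.frame(a = c(1, 0, 1, 1), b = c(0, 0, 1, 0))

# lapply() calls the function once per column; inside the function, x is that column
share <- lapply(df, FUN = function(x) {
  sum(x) / length(x)  # number of 1s divided by number of rows
})

# The result is a list with one number per column, named after the columns
str(share)

# For 0/1 data, sum(x) / length(x) is the same as mean(x)
sapply(df, mean)
```

On the first call `x` is `c(1, 0, 1, 1)` and on the second it is `c(0, 0, 1, 0)`; the name `x` itself is arbitrary.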
  • Tobi, your assistance is invaluable. Many thanks. I also discovered the wonders of "grepl", a function I didn't know, being fairly unskilled in R. Since I have more than two "sx", could I replace the ifelse statement with something like case_when? – statn00b Jul 23 '19 at 14:15
  • In this case I would use `gsub` or `str_replace` to read the different types directly out of the categories. For that you would likely need to know regular expressions. If you give me the full list of categories, I could conjure something up for you – TobiO Jul 23 '19 at 14:25
  • 'PAIN.PO','PIROSI.PO','DISFAGIA.PO','DYSPNEA.PO','VOMIT.PO','PAIN.FU','PIROSIS.FU','DISFAGIA.FU','DISPNEA.FU','VOMIT.FU' . PO & FU respectively indicate pre and post, while each symptom is identified by its name – statn00b Jul 23 '19 at 14:30
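
The `gsub` idea suggested above could look roughly like this for names of the form SYMPTOM.TIMEPOINT. This is only a sketch in base R on four of the listed names; the variable names `category`, `symptom` and `timepoint` are illustrative.

```r
# A few of the column names from the list above
category <- c("PAIN.PO", "VOMIT.PO", "PAIN.FU", "VOMIT.FU")

# Everything before the dot is the symptom, everything after it is the timepoint
symptom   <- gsub("\\..*$", "", category)  # drop the dot and what follows it
timepoint <- gsub("^.*\\.", "", category)  # drop everything up to the dot

# PO means pre and FU means post, as stated in the comment above
timepoint <- ifelse(timepoint == "PO", "pre", "post")

data.frame(category, symptom, timepoint)
```

Because the pattern does not name any symptom, it keeps working when more symptoms are added, as long as every name contains exactly one dot.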
  • The solution I found works like a charm. Thanks! frequencies <-frequencies%>% mutate (symptom = case_when(grepl("PAIN", frequencies$category)~'PAIN', grepl("PIROSI", frequencies$category)~'PIROSI', grepl("DISFAGIA",frequencies$category)~'DISFAGIA', grepl("DISPNEA", frequencies$category)~'DISPNEA', grepl("VOMIT", frequencies$category)~'VOMIT')) – statn00b Jul 23 '19 at 17:49
  • With mutate et al. you can usually leave out all the `frequencies$` and just write the column (aka variable) name. If this answered your current question, please accept the answer. – TobiO Jul 23 '19 at 19:18
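
As a sketch of that last tip, here is the `case_when` call with bare column names, cut down to two symptoms. It assumes dplyr is installed, and the small `frequencies` data frame is a hypothetical stand-in for the real one.

```r
library(dplyr)

# Hypothetical stand-in for the real frequencies data frame
frequencies <- data.frame(
  category = c("PAIN.PO", "VOMIT.PO", "PAIN.FU", "VOMIT.FU")
)

# Inside mutate(), columns are referred to by name, without frequencies$
frequencies <- frequencies %>%
  mutate(symptom = case_when(
    grepl("PAIN",  category) ~ "PAIN",
    grepl("VOMIT", category) ~ "VOMIT"
  ))

frequencies
```

Rows that match none of the conditions get `NA`, so a misspelled pattern shows up as missing values in the `symptom` column.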