
I have a file with two different categories, and most of the rows are in one category. The categories are: in and out.

file1_ggplot.txt

status scores
in     44
in     55
out    12
out    23
out    99
out    13

To plot the density distribution, I am using the code below, but I also want to add a summary of the scores and the lines that have in:

library(data.table)
library(ggplot2)
library(plyr)
library(tools)  # file_path_sans_ext()

# list.files() takes a regular expression, not a glob
filenames <- list.files("./scores", pattern = "ggplot\\.txt$", full.names = TRUE)
pdf("plot.pdf")
for (file in filenames) {
  bases <- file_path_sans_ext(file)
  data1 <- fread(file)
  # mean score per status, used for the dashed vertical lines
  cdat <- ddply(data1, "status", summarise, scores.mean = mean(scores))
  data1ggplot <- ggplot(data1, aes(x = scores, colour = status)) +
    geom_density() +
    geom_vline(data = cdat, aes(xintercept = scores.mean, colour = status),
               linetype = "dashed", size = 1)
  print(data1ggplot + ggtitle(basename(bases)))
}
dev.off()

Which outputs: [Image: ggplot for two categories]

I want to add a box that contains the lines with in:

in     44
in     55

and the output of:

> summary(data1$scores)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  12.00   15.50   33.50   41.00   52.25   99.00 

For this, I am trying to use tableGrob from gridExtra:

library(gridExtra)  # tableGrob()
data1ggplot <- ggplot(data1, aes(x = scores, colour = status)) +
  geom_density() +
  geom_vline(data = cdat, aes(xintercept = scores.mean, colour = status),
             linetype = "dashed", size = 1) +
  annotation_custom(tableGrob(summary(data1$scores)))

[Image: ggplot2.2]

But it gives the same plot as above, with only the numbers of the summary added.

Then I grepped the lines with in:

grep -w "in" file1_ggplot.txt > only-in.txt

Then in R:

data2<-fread("only-in.txt")

trs <- as.data.frame(t(data2))
trs
       V1 V2
    V1 in in
    V2 44 55
data1ggplot <- ggplot(data1, aes(x = scores, colour = status)) +
  geom_density() +
  geom_vline(data = cdat, aes(xintercept = scores.mean, colour = status),
             linetype = "dashed", size = 1) +
  annotation_custom(tableGrob(trs))

And it outputs: [Image: ggplot2.3]

How can I display these tables properly next to the plot, and get the lines with in without first using grep in bash?

bapors
  • Could you provide a reproducible example, with a minimal dataset, see: [https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – bVa Mar 22 '18 at 11:17
  • But I already did..? – bapors Mar 22 '18 at 12:15
  • The dataset is here, but the example is not reproducible at the moment. I needed some modifications here `filenames <- list.files("./scores",pattern="*ggplot.txt", full.names=TRUE)`, becoming `filenames <- list.files(pattern="*ggplot.txt", full.names=TRUE)`. – bVa Mar 22 '18 at 13:17
  • By the way, I don't clearly understand the output you want, a table with `in` and `summary`, with legends for the summary, ... ? – bVa Mar 22 '18 at 13:26
  • @bVa `./scores` is the directory where you put your input file for `ggplot` ( in this case, `file1_ggplot.txt` ) . Yes indeed! A ggplot with the lines having `in` , and `summary` – bapors Mar 22 '18 at 14:04
  • What about the format of the table ? – bVa Mar 23 '18 at 08:12

1 Answer


Here is a solution, based on an assumption about the format of the table you want:


Individual plot

library(tidyverse)
library(gridExtra) # tableGrob
library(broom) # glance

# summary statistics as a one-column matrix, statistic names as row names
df_summary <- t(broom::glance(summary(data1$scores)))
data1 %>%
  ggplot(., aes(x = scores, colour = status)) +
  geom_density() +
  geom_vline(data = . %>%
               dplyr::group_by(status) %>%
               dplyr::summarise(scores.mean = mean(scores)),
             aes(xintercept = scores.mean, colour = status),
             linetype = "dashed",
             size = 1) +
  # dplyr:: prefixes guard against masking by stats::filter or plyr::summarise
  annotation_custom(tableGrob(rbind(data.frame(data1 %>%
                                                 dplyr::filter(status == "in") %>%
                                                 dplyr::rename(var = status, val = scores)),
                                    data.frame(var = row.names(df_summary),
                                               val = df_summary,
                                               row.names = NULL)),
                              rows = NULL, cols = NULL),
                    xmin = 60, xmax = 100,
                    ymin = 0.1, ymax = 0.4)
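To check what the table contains before drawing it, here is a minimal base-R sketch that builds the same kind of two-column var/val data frame without broom. It is an alternative route, not the answer's own method: it recreates the question's sample data, selects the in rows by subsetting (which also replaces the grep step), and keeps the labels that summary() stores as names. The names in_part, summary_part and table_df are illustrative.

```r
# Sample data from the question (columns assumed to be `status` and `scores`)
data1 <- data.frame(status = c("in", "in", "out", "out", "out", "out"),
                    scores = c(44, 55, 12, 23, 99, 13))

# Rows whose status is "in", selected in R instead of with grep
in_rows <- data1[data1$status == "in", ]
in_part <- data.frame(var = in_rows$status, val = in_rows$scores)

# summary() returns a named numeric vector; keep the names as labels
s <- summary(data1$scores)
summary_part <- data.frame(var = names(s), val = as.numeric(s))

# "in" rows first, then Min., 1st Qu., Median, Mean, 3rd Qu., Max.
table_df <- rbind(in_part, summary_part)
print(table_df)
```

table_df can then be drawn with gridExtra::tableGrob(table_df, rows = NULL, cols = NULL), as in the answer.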

Applied to a list of data frames

# Mock data
set.seed(1)
data_list = list(data1, 
                 data.frame(status = data1$status, scores = c(40, 60, 15, 21, 97, 10)),
                 data.frame(status = data1$status, scores = c(45, 56, 11, 25, 95, 14)))

# Create a function 

your_function <- function(df) {
  df_summary <- t(broom::glance(summary(df$scores)))
  df %>%
    ggplot(., aes(x = scores, colour = status)) +
    geom_density() +
    geom_vline(data = . %>%
                 dplyr::group_by(status) %>%
                 dplyr::summarise(scores.mean = mean(scores)),
               aes(xintercept = scores.mean, colour = status),
               linetype = "dashed",
               size = 1) +
    annotation_custom(tableGrob(rbind(data.frame(df %>%
                                                   dplyr::filter(status == "in") %>%
                                                   dplyr::rename(var = status, val = scores)),
                                      data.frame(var = row.names(df_summary),
                                                 val = df_summary,
                                                 row.names = NULL)),
                                rows = NULL, cols = NULL),
                      xmin = 60, xmax = 100,
                      ymin = 0.1, ymax = 0.4)
}

# Check if it works 
your_function(data_list[[2]])
your_function(data_list[[3]])


# Map it
pdf("plot.pdf")
# print() each plot explicitly so it is drawn even when the script is source()d
walk(data_list, ~ print(your_function(.x)))
dev.off()

You should now have a "plot.pdf" file with 3 pages, one plot per page.
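The mock data_list stands in for the question's files. To build it from the files in ./scores instead, here is a base-R sketch. It writes two hypothetical files to a temporary directory only so that it runs on its own, and it uses read.table where the question uses data.table::fread (either should give one data frame per file).

```r
# Stand-in for the question's ./scores directory: a temporary folder with
# two small space-separated files (hypothetical contents)
scores_dir <- file.path(tempdir(), "scores")
dir.create(scores_dir, showWarnings = FALSE)
write.table(data.frame(status = c("in", "in", "out", "out"),
                       scores = c(44, 55, 12, 23)),
            file.path(scores_dir, "file1_ggplot.txt"),
            row.names = FALSE, quote = FALSE)
write.table(data.frame(status = c("in", "out", "out"),
                       scores = c(40, 15, 21)),
            file.path(scores_dir, "file2_ggplot.txt"),
            row.names = FALSE, quote = FALSE)

# list.files() expects a regular expression, so anchor the suffix
filenames <- list.files(scores_dir, pattern = "ggplot\\.txt$", full.names = TRUE)

# One data frame per file, named after the file without its extension
data_list <- lapply(filenames, read.table, header = TRUE)
names(data_list) <- tools::file_path_sans_ext(basename(filenames))
str(data_list)
```

For the real data, set scores_dir to "./scores" and skip the write.table() calls; the resulting data_list can be used in place of the mock one.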

Note that you should adapt the position of the tableGrob to your data; I don't know where you want the table. You could also compute the position from the summary values.
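One way to compute the position, sketched under assumptions: table_position is a hypothetical helper that puts the table in the upper-right corner, taking the x limits from the range of the scores and the top from the tallest per-status kernel density. It uses stats::density() with its default bandwidth, which only approximates the curves geom_density() draws, so treat the result as a starting point.

```r
# Sample data from the question
data1 <- data.frame(status = c("in", "in", "out", "out", "out", "out"),
                    scores = c(44, 55, 12, 23, 99, 13))

# Hypothetical helper: upper-right box covering `width_frac` of the x range
# and the top `height_frac` of the tallest density curve
table_position <- function(df, width_frac = 0.4, height_frac = 0.5) {
  x_rng <- range(df$scores)
  # peak of each status group's density estimate, then the overall maximum
  y_top <- max(vapply(split(df$scores, df$status),
                      function(x) max(stats::density(x)$y),
                      numeric(1)))
  list(xmin = x_rng[2] - width_frac * diff(x_rng),
       xmax = x_rng[2],
       ymin = y_top * (1 - height_frac),
       ymax = y_top)
}

pos <- table_position(data1)
str(pos)
```

The four values can then replace the fixed 60/100/0.1/0.4 in annotation_custom(..., xmin = pos$xmin, xmax = pos$xmax, ymin = pos$ymin, ymax = pos$ymax).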

bVa
  • It complains with this error : `Error in attr(data, "tsp") <- c(start, end, frequency) : object is not a matrix` Do you know why? – bapors Apr 06 '18 at 12:32
  • The problem might be associated with the format of your data. Unfortunately, it is hard to help without more information about the error (code tested, line producing the error, data used, ...). – bVa Apr 09 '18 at 07:05