I have the following data:

Project Topic    C10    C14     C03     C11     C16     C08
P1      T1      0.24    0.00    0.00    0.04    0.04    0.00
P1      T2      0.00    0.30    0.00    0.00    0.00    0.00
P1      T3      0.04    0.04    0.00    0.24    0.00    0.00
P1      T4      0.00    0.00    0.00    0.04    0.33    0.04
P1      T5      0.00    0.09    0.21    0.00    0.00    0.00
P1      T6      0.00    0.09    0.00    0.00    0.00    0.34

P2      T1      0.20    0.00    0.00    0.04    0.00    0.04
P2      T2      0.00    0.22    0.04    0.00    0.00    0.00
P2      T3      0.04    0.00    0.00    0.24    0.00    0.00
P2      T4      0.00    0.00    0.04    0.00    0.33    0.00
P2      T5      0.04    0.00    0.21    0.00    0.00    0.00
P2      T6      0.00    0.04    0.00    0.00    0.00    0.34

P3      T1      0.20    0.00    0.00    0.00    0.08    0.00
P3      T2      0.00    0.17    0.00    0.00    0.00    0.00
P3      T3      0.00    0.00    0.00    0.08    0.00    0.00
P3      T4      0.00    0.04    0.00    0.04    0.24    0.00
P3      T5      0.00    0.00    0.21    0.00    0.00    0.04
P3      T6      0.00    0.09    0.00    0.00    0.00    0.22
    ......

I want to turn the above data into the following plot:

[Image: sketch of the desired plot]

In this sketch, the height of each bar is the value of one C# column, so each group of bars should use six colors (one per C#). Each bar plot corresponds to one P# data set.

I tried the following code, copying each P# data set into its own .csv file and drawing the plots in the same frame using par(mfrow=c(5,3)):

library(e1071)
topics <- read.csv("P1.csv", header = TRUE)
dput(head(topics))
pdf("cosinesimilarityplots.pdf", family = "Times")
par(mfrow = c(5, 3))
colours <- c("red", "orange", "yellow", "green", "blue", "black")
barplot(as.matrix(topics), main = "Project Name", ylab = "", cex.lab = 1.5, cex.main = 1.4, beside = TRUE, col = colours, ylim = c(0, 0.5))
title(ylab = expression(paste("Cos(", theta, ")")), xlab = "Seeded-LDA topics", line = 2, cex.lab = 1.2)
legend("topleft", c("C10: Resource Management (RM)", "C14: Cross Site Scripting (XSS)", "C03: Authentication Abuse (AA)", "C11: Buffer Overflow (BoF)", "C16: Access Privileges (AP)", "C08: SQL Injection (SI)"), cex = 0.85, bty = "n", fill = colours)
dev.off()

The result of dput(head(topics)) is the following:

structure(list(T1 = c(0.24, 0, 0, 0.04, 0.04, 0), T2 = c(0.24, 
0.3, 0, 0, 0, 0), T3 = c(0.04, 0.04, 0, 0.24, 0, 0), T4 = c(0, 
0, 0, 0.04, 0.33, 0.04), T5 = c(0, 0.09, 0.21, 0, 0, 0), T6 = c(0, 
0.09, 0, 0, 0, 0.34)), .Names = c("T1", "T2", "T3", "T4", "T5", 
"T6"), row.names = c(NA, 6L), class = "data.frame")

I then realized that the quality of the bar plots is very low, and that putting each P# data set into a separate .csv file will take forever, especially if there are more than 15 P#s.

Is there a way to plot the main data file efficiently without splitting it into smaller files, preferably using R?

Sultan
  • Please can you use `dput` to make some data available? The easiest solution will be to use ggplot, but the data will need processing first – Richard Telford Apr 24 '16 at 14:38
  • Your best bet is to process the data into one long dataframe, and then use ggplot2. – Heroka Apr 24 '16 at 14:48
  • Would it be possible to give me sample code? I don't have a strong background in R – Sultan Apr 24 '16 at 14:50
  • There's plenty of questions like this on SO, you can search for them. Or do a ggplot/R tutorial. – Heroka Apr 24 '16 at 15:16
  • lattice library can also do this nicely. Once again, search google or SO for previous advice on how to do this, rather than asking for someone to write the code here. – dww Apr 24 '16 at 17:05

1 Answer

You can create a plot like the one you sketched using ggplot2 and gridExtra, with a little help from dplyr and reshape2. As is often the case, just because you have the power to do something in R doesn't mean that it's intuitive. Basically, you have to create a separate plot object for each Project, strip out the legend from each one, and then reassemble everything using grid.arrange().

library(tidyverse) # ggplot2, dplyr, etc
library(reshape2)  # Outdated but still works
library(gridExtra) # Allows us to put plots into grids

# Generate some dummy data
data <- tibble(
  Project =   rep(paste0("P", 1:6), length = 30),
  C10 = abs(rnorm(30)),
  C14 = runif(30),
  C03 = sample(1:30) / 50,
  C11 = rnorm(30) ^ 2,
  C16 = abs(rnorm(30) / 2),
  C08 = abs(rnorm(30) * 2)
)

data <- data %>%
  arrange(Project) %>%
  mutate(Topic = rep(paste0("T", 1:5), length = 30))

# Melt the data from wide to long format
data <- melt(data, id.vars = c("Project", "Topic"))

#########################################################
# Now you can actually create the chart
#########################################################

# Use a function to create a version of the plot for each Project
plot_proj <- function(projnum) {
  filter(data, Project == projnum) %>%
    rename(Legend = variable) %>%
    ggplot(., aes(x = Topic, y = value, fill = Legend)) +
    geom_bar(stat = "identity", position = "dodge") +
    labs(x = "", y = "", title = projnum) +
    theme_bw() +
    theme(plot.title = element_text(hjust = 0.5),
          panel.border = element_blank())
}

# Create a separate plot for each Project
plots <- map(unique(data$Project), plot_proj)

# This function was borrowed from an older StackOverflow answer
# Source: http://stackoverflow.com/questions/13649473/add-a-common-legend-for-combined-ggplots
g_legend <- function(a.gplot) {
  tmp <- ggplot_gtable(ggplot_build(a.gplot))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  return(legend)
}

mylegend <- g_legend(plots[[1]])

# Combine the plots and add one shared legend (placed on the left)
grid.arrange(
  arrangeGrob(
    plots[[1]] + theme(legend.position = "none"),
    plots[[2]] + theme(legend.position = "none"),
    plots[[3]] + theme(legend.position = "none"),
    plots[[4]] + theme(legend.position = "none"),
    plots[[5]] + theme(legend.position = "none"),
    plots[[6]] + theme(legend.position = "none"),
    left = mylegend
  )
)
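
If the per-project plots do not need to be separate objects, faceting may be a shorter route to the same layout. The following is only a sketch under stated assumptions: it assumes the main file has the columns shown in the question (Project, Topic, C10 ... C08), the inline rows are a stand-in for that file, and the output name all_projects.pdf is made up for illustration. It uses tidyr::pivot_longer() in place of reshape2::melt(), and ggsave() to write a vector PDF, which should help with the low plot quality mentioned in the question.

```r
library(ggplot2)
library(tidyr)

# Stand-in for the main file: a few rows copied from the question.
# In practice this would come from read.csv() on the full data set.
wide <- data.frame(
  Project = rep(c("P1", "P2"), each = 3),
  Topic   = rep(c("T1", "T2", "T3"), times = 2),
  C10 = c(0.24, 0.00, 0.04, 0.20, 0.00, 0.04),
  C14 = c(0.00, 0.30, 0.04, 0.00, 0.22, 0.00),
  C03 = c(0.00, 0.00, 0.00, 0.00, 0.04, 0.00),
  C11 = c(0.04, 0.00, 0.24, 0.04, 0.00, 0.24),
  C16 = c(0.04, 0.00, 0.00, 0.00, 0.00, 0.00),
  C08 = c(0.00, 0.00, 0.00, 0.04, 0.00, 0.00)
)

# Wide to long: one row per Project/Topic/Category combination.
long <- pivot_longer(wide, cols = -c(Project, Topic),
                     names_to = "Category", values_to = "value")

# One panel per project; ggplot2 draws a single shared legend.
p <- ggplot(long, aes(x = Topic, y = value, fill = Category)) +
  geom_col(position = "dodge") +
  facet_wrap(~ Project, ncol = 3) +
  labs(x = "Seeded-LDA topics", y = expression(cos(theta))) +
  theme_bw()

# Vector output, so the bars stay sharp at any zoom level.
ggsave("all_projects.pdf", p, width = 10, height = 6)
```

With this approach, adding more projects only adds more panels, so nothing has to be split into separate files or listed by hand in grid.arrange().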
Andrew Brēza