I have a series of csv files named according to a specific format:

Each name consists of one of four prefixes ("matrix_del_cats_", "matrix_add_cats_", "matrix_del_groups_", "matrix_add_groups_"), followed by a replicate number from 0 to 9, followed by "_" and one of six vectors, "[1, 0, 0, 0, 0, 0]" to "[0, 0, 0, 0, 0, 1]".

The file names look like this (non-exhaustive list):

matrix_add_cats_0_[1, 0, 0, 0, 0, 0].csv
matrix_add_cats_1_[1, 0, 0, 0, 0, 0].csv
matrix_add_cats_2_[1, 0, 0, 0, 0, 0].csv
...
matrix_add_cats_9_[1, 0, 0, 0, 0, 0].csv
matrix_add_cats_0_[0, 1, 0, 0, 0, 0].csv
matrix_add_cats_1_[0, 1, 0, 0, 0, 0].csv
matrix_add_cats_3_[0, 1, 0, 0, 0, 0].csv
...
matrix_add_cats_9_[0, 1, 0, 0, 0, 0].csv
...
matrix_add_cats_0_[0, 0, 1, 0, 0, 0].csv
...
matrix_add_cats_0_[0, 0, 0, 1, 0, 0].csv
...
matrix_add_cats_0_[0, 0, 0, 0, 1, 0].csv
...
matrix_add_cats_0_[0, 0, 0, 0, 0, 1].csv
...

Each csv file looks like this:

Name,A,B,C,D,E,F,G,H,I,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z,AA
A,0,0,0,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,0,0,0,0
B,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,0,0,0,0,0,0,0,0,0,0,0,0
C,0,0,0,0,0,0,0,7,5,7,0,0,0,5,0,0,7,5,0,5,7,5,0,0,0,0,0
D,5,0,0,5,5,7,0,0,0,4,0,0,0,0,0,0,0,0,0,5,5,5,0,0,0,0,0
E,0,0,0,5,0,0,0,5,0,5,7,0,0,0,0,0,0,0,0,0,5,0,0,0,0,0,0
F,0,0,0,7,0,0,0,0,0,7,0,0,0,0,0,0,5,0,0,0,0,0,0,0,0,0,0
G,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
H,0,0,7,0,5,0,0,0,0,5,5,0,7,5,0,0,7,0,0,0,5,0,5,0,0,0,5
I,0,0,5,0,0,0,0,0,0,0,0,0,0,5,7,0,0,0,0,5,0,0,0,0,0,5,5
J,0,0,7,4,5,7,0,5,0,0,0,0,0,0,5,0,4,7,0,7,7,0,5,0,0,5,0
K,0,0,0,0,7,0,0,5,0,0,0,0,0,0,7,0,0,0,0,0,0,0,0,0,0,0,5
L,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
M,0,0,0,0,0,0,0,7,0,0,0,0,0,0,5,0,0,0,0,0,0,0,0,0,0,5,5
N,0,0,5,0,0,0,0,5,5,0,0,0,0,0,5,0,0,5,0,0,0,0,5,0,5,0,7
O,0,5,0,0,0,0,0,0,7,5,7,0,5,5,5,0,5,0,0,5,0,0,5,0,0,0,7
P,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Q,0,0,7,0,0,5,0,7,0,4,0,0,0,0,5,0,0,5,0,7,5,0,0,0,0,0,0
R,0,0,5,0,0,0,0,0,0,7,0,0,0,5,0,0,5,0,0,0,5,0,0,5,0,0,5
S,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
T,0,0,5,5,0,0,0,0,5,7,0,0,0,0,5,0,7,0,0,0,5,0,0,0,5,5,5
U,0,0,7,5,5,0,0,5,0,7,0,0,0,0,0,0,5,5,0,5,5,7,0,0,5,7,5
V,0,0,5,5,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,7,0,0,0,0,0,0
W,5,0,0,0,0,0,0,5,0,5,0,0,0,5,5,0,0,0,0,0,0,0,0,0,0,0,0
X,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,5,0,0,0,0,0,0,5,7,0
Y,0,0,0,0,0,0,0,0,0,0,0,0,0,5,0,0,0,0,0,5,5,0,0,5,0,5,0
Z,0,0,0,0,0,0,0,0,5,5,0,0,5,0,0,0,0,0,0,5,7,0,0,7,5,0,0
AA,0,0,0,0,0,0,0,5,5,0,5,0,5,7,7,0,0,5,0,5,5,0,0,0,0,0,0

I need to automatically:

  1. Read each of these csv files
  2. Produce a figure with a specific title
  3. Save it to a png file with the same name as the csv file

The titles should look like this:

"Interactions of " + {"groups" OR "categories": the former if "groups" appears in the csv file name, the latter if "cats" appears in the file name} + " according to " + {A, B, C, D, E OR F: "A" if [1, 0, 0, 0, 0, 0], "B" if [0, 1, 0, 0, 0, 0], etc.}

Here's my code for an individual figure:

library(ggplot2)
install.packages("extrafont");library(extrafont)
font_import(pattern = 'Akk') 
library(reshape2)

t1 <- read.csv("/matrix_add_cats_0_[1, 0, 0, 0, 0, 0].csv", check.names = FALSE, sep = ",")
t2 <- read.csv("/matrix_del_cats_0_[1, 0, 0, 0, 0, 0].csv", check.names = FALSE, sep = ",")

tableau <- cbind(t1[,1, drop=FALSE], t1[,-1] - t2[,-1])
mylevels <- tableau$Name
tableau.m <- melt(tableau)

#reorder factors
tableau.m$Name <- factor(tableau.m$Name,levels=mylevels)
tableau.m$variable <- factor(tableau.m$variable, levels=mylevels)

p <- ggplot(tableau.m, aes(variable,Name)) + 
      geom_tile(aes(fill = value), colour = "white") + 
      scale_fill_distiller(palette = "YlGnBu",limits=c(min(tableau.m$value), max(tableau.m$value))) +
      geom_text(aes(label=value), family="AkkuratLightPro-Regular", color = "black",lineheight=.5,size = 4)

base_size <- 9
p + theme_grey(base_size = base_size) + 
  labs(x = "", y = "") + scale_x_discrete(expand = c(0, 0)) + 
  scale_y_discrete(expand = c(0, 0)) + 
  theme(legend.position = "none", axis.ticks = element_blank(),
  axis.text.x = element_text(size = 12, angle = 270, hjust = 0, colour = "grey50", family="AkkuratPro-Regular")
  ,axis.text.y = element_text(size = 12, angle = 0, hjust = 1, colour = "grey50", family="AkkuratPro-Regular")) +
  ggtitle("***") + 
     theme(plot.title = element_text(size = 16, angle = 0, colour = "grey25", family="AkkuratPro-Regular"))

ggsave(file="***.png")

Although this is pretty complicated, I'm pretty sure this is something that can be done in R. Any clue on how to proceed?

Lucien S.
  • I'd start with reading your CSVs into a [list of data frames](http://stackoverflow.com/a/24376207/903061) - or maybe two lists, one for `add` one for `del`. Name your lists with the relevant parts of the file names, code up your logic for titling, ... I'm not really sure where/why you're stuck. – Gregor Thomas Oct 26 '15 at 22:55
  • Thanks @Gregor. I'm stuck at several stages. First, I'm guessing I need to read the csv files (and make the list of dataframes) with regex (have no clue how to do this)? Then, how to loop through that list? thirdly, how to extract info from the csv file name to build the title? – Lucien S. Oct 26 '15 at 23:07

1 Answer

Try an approach like this, and have a look at ?lapply and ?mapply.

Get Files

# pattern is a regular expression, not a shell glob
matrix_add_cats_files <- list.files("YOUR PATH", full.names = TRUE, pattern = "^matrix_add_cats_.*\\.csv$")
matrix_del_cats_files <- list.files("YOUR PATH", full.names = TRUE, pattern = "^matrix_del_cats_.*\\.csv$")

Read them as a list

dat_add_cats <- lapply(matrix_add_cats_files, read.csv, check.names = FALSE, sep = ",")
dat_del_cats <- lapply(matrix_del_cats_files, read.csv, check.names = FALSE, sep = ",")
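
Since list.files returns names in alphabetical order, the add and del lists only line up element by element if every add file has a del counterpart. Below is a minimal sketch of a sanity check, assuming the naming scheme from the question. `pair_key` is a hypothetical helper and the paths are made up for illustration:

```r
# Hypothetical helper: strip the directory and the "add"/"del" token so that
# matching files from the two lists produce the same key.
pair_key <- function(paths) {
  sub("^matrix_(add|del)_", "matrix_", basename(paths))
}

# Illustrative file names following the scheme in the question.
add_files <- c("data/matrix_add_cats_0_[1, 0, 0, 0, 0, 0].csv",
               "data/matrix_add_cats_1_[1, 0, 0, 0, 0, 0].csv")
del_files <- c("data/matrix_del_cats_0_[1, 0, 0, 0, 0, 0].csv",
               "data/matrix_del_cats_1_[1, 0, 0, 0, 0, 0].csv")

# Stop early if the two lists are not aligned element by element.
stopifnot(identical(pair_key(add_files), pair_key(del_files)))
```

Running the same check on the real file vectors before the transformation step should catch a missing replicate before it silently pairs the wrong tables.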

Do the Data-Transformation

dat <- mapply(function(t1, t2){
  tableau <- cbind(t1[,1, drop=FALSE], t1[,-1] - t2[,-1])
  mylevels <- tableau$Name
  # name the id column explicitly instead of letting melt guess it
  tableau.m <- melt(tableau, id.vars = "Name")
  tableau.m$Name <- factor(tableau.m$Name,levels=mylevels)
  tableau.m$variable <- factor(tableau.m$variable, levels=mylevels)
  tableau.m
}, dat_add_cats, dat_del_cats, SIMPLIFY = FALSE)  # keep a list of data frames

So you have a list with all your tableau data.
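
One thing to watch: mapply simplifies its result by default (SIMPLIFY = TRUE). When every call returns a data frame with the same number of columns, the result is collapsed into a matrix whose cells are single columns, not a list of data frames. Iterating over that matrix hands ggplot a bare factor, which may be what triggers an error like "doesn't know how to deal with data of class factor". A small sketch with made-up data showing the difference (`diff_tables` is a hypothetical stand-in for the transformation):

```r
# Two toy inputs standing in for the add and del tables (made-up values).
a <- list(data.frame(Name = c("A", "B"), x = c(1, 2)),
          data.frame(Name = c("A", "B"), x = c(3, 4)))
b <- list(data.frame(Name = c("A", "B"), x = c(1, 1)),
          data.frame(Name = c("A", "B"), x = c(1, 1)))

# Hypothetical stand-in for the real transformation: subtract del from add.
diff_tables <- function(t1, t2) {
  data.frame(Name = t1$Name, x = t1$x - t2$x)
}

simplified <- mapply(diff_tables, a, b)                    # default SIMPLIFY = TRUE
kept       <- mapply(diff_tables, a, b, SIMPLIFY = FALSE)  # list of data frames

is.matrix(simplified)      # TRUE: each cell holds one column
is.data.frame(kept[[1]])   # TRUE: each element is a full table
```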

Plot it

This example uses the entries of matrix_add_cats_files as both the plot title and the output file name. Replace it with a list or vector of your preferred names, with the same length as dat.

mapply(function(tableau.m, filename){
  p <- ggplot(tableau.m, aes(variable,Name)) + 
    geom_tile(aes(fill = value), colour = "white") + 
    scale_fill_distiller(palette = "YlGnBu",limits=c(min(tableau.m$value), max(tableau.m$value))) +
    geom_text(aes(label=value), family="AkkuratLightPro-Regular", color = "black",lineheight=.5,size = 4)
  
  base_size <- 9
  p <- p + theme_grey(base_size = base_size) + 
    labs(x = "", y = "") + scale_x_discrete(expand = c(0, 0)) + 
    scale_y_discrete(expand = c(0, 0)) + 
    theme(legend.position = "none", axis.ticks = element_blank(),
          axis.text.x = element_text(size = 12, angle = 270, hjust = 0, colour = "grey50", family="AkkuratPro-Regular"),
          axis.text.y = element_text(size = 12, angle = 0, hjust = 1, colour = "grey50", family="AkkuratPro-Regular")) +
    ggtitle(filename) + 
    theme(plot.title = element_text(size = 16, angle = 0, colour = "grey25", family="AkkuratPro-Regular"))
  
  # save the finished plot explicitly, swapping the .csv extension for .png
  ggsave(filename = sub("\\.csv$", ".png", filename), plot = p)
}, dat, matrix_add_cats_files)
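
For the titles described in the question, one option is to derive them from the file names with base R string functions. The following is only a sketch under the assumption that every name contains exactly one bracketed vector with a single 1 in it. `make_title` and `png_name` are hypothetical helpers, not functions from any package:

```r
# Hypothetical helper: build the plot title from a file name, following the
# rule stated in the question.
make_title <- function(path) {
  name <- basename(path)
  kind <- if (grepl("groups", name, fixed = TRUE)) "groups" else "categories"
  # Pull out the bracketed vector, e.g. "[0, 1, 0, 0, 0, 0]".
  vec <- regmatches(name, regexpr("\\[[^]]*\\]", name))
  # The position of the 1 in the vector selects the letter A to F.
  digits <- as.integer(strsplit(gsub("[^01]", "", vec), "")[[1]])
  paste0("Interactions of ", kind, " according to ", LETTERS[which(digits == 1)])
}

# Hypothetical helper: same name as the csv file, with a .png extension.
png_name <- function(path) sub("\\.csv$", ".png", path)

make_title("matrix_add_groups_3_[0, 1, 0, 0, 0, 0].csv")
# "Interactions of groups according to B"
png_name("matrix_add_groups_3_[0, 1, 0, 0, 0, 0].csv")
# "matrix_add_groups_3_[0, 1, 0, 0, 0, 0].png"
```

Applying both helpers to the file vector, for example with vapply(matrix_add_cats_files, make_title, character(1)), gives title and output-name vectors of the right length to pass to the plotting mapply.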
Rentrop
  • Thanks so much @Floo0! It's getting me quite a lot closer to the solution. R raises an error though: "Error : ggplot2 doesn't know how to deal with data of class factor". Is it possible that the data structure changed using your method? I'll try and upload some data... cheers! – Lucien S. Oct 26 '15 at 23:26
  • Indeed, your code doesn't produce the same tableau.m data as my rough code did. Mine looked like this : https://dl.dropboxusercontent.com/u/73950/Capture%20d%E2%80%99%C3%A9cran%202015-10-26%20%C3%A0%2017.14.19.png and yours like that: https://dl.dropboxusercontent.com/u/73950/Capture%20d%E2%80%99%C3%A9cran%202015-10-26%20%C3%A0%2017.15.49.png – Lucien S. Oct 27 '15 at 00:16
  • Then have a look at the Data-Transformation step. I don't know what exactly went wrong. Save the function as a separate one, run debug(your_function) and have a look at what's happening. I guess this happened because melt without specifications guesses the value column and so on... – Rentrop Oct 27 '15 at 06:53