
I am currently trying to use ggplot2 to visualize results from simple current-voltage experiments. I have managed to get good results for a single dataset.

However, I have a number of current-voltage datasets, which I input in R recursively to get the following organisation (see minimal code):

data.frame(cbind(batch(string list), sample(string list), dataset(data.frame list)))

Edit: My data are stored in text files named batchname_samplenumber.txt, each with a voltage column and a current column. The code I use to import them is:

require(plyr)
require(ggplot2)

#VARIABLES
# the dot before the file extension is escaped so it only matches a literal "."
regex <- "([[:alnum:]_]+)\\.([[:alpha:]]+)"
regex2 <- "G5_([[:alnum:]]+)_([[:alnum:]]+)\\.([[:alpha:]]+)"

#FUNCTIONS
getJ <- function(list, k) llply(list, function(i) llply(i, function(i, indix) getElement(i,indix), indix = k))

#FILES
files <- list.files("Data/", full.names = TRUE)

#NAMES FOR FILES
paths <- llply(llply(files, basename),function(i) regmatches(i,regexec(regex,i)))
paths2 <- llply(llply(files, basename),function(i) regmatches(i,regexec(regex2,i)))
names <- llply(llply(getJ(paths, 2)),unlist)
batches <- llply(llply(getJ(paths2, 2)),unlist)
samples <- llply(llply(getJ(paths2, 3)),unlist)

#SETS OF DATA, NAMED
sets <- llply(files,function(i) read.table(i,skip = 0, header = F))
names(sets) <- names
for (i in as.list(names)) names(sets[[i]]) <- c("voltage","current")

df<-data.frame(cbind(batches,samples,sets))    

A minimal dataset can be generated with:

require(plyr)

batch <- list("A","A","B","B")
sample <- list(1,2,1,2)
set <- list(data.frame(voltage = runif(10), current = runif(10)),data.frame(voltage = runif(10), current = runif(10)),data.frame(voltage = runif(10), current = runif(10)),data.frame(voltage = runif(10), current = runif(10)))

df<-data.frame(cbind(batch,sample,set))

My question is: is it possible to plot the data as is, using code similar to the following (which does not work)?

ggplot(data, aes(x = dataset$current, y = dataset$voltage, colour = sample)) + facet_wrap(~batch)

The more general version would be: is ggplot2 capable of handling raw physical data, as opposed to discrete statistical data (like diamonds, cars)?

Brian Tompsett - 汤莱恩
Thibaud Ruelle
  • Your description of "inputing data in R recursively" is very unclear. Additionally, it's not clear what the distinction could possibly be between "statistical data" and "raw physical data". Data is data. It's up to you to organize properly. You should perhaps provide a concrete, reproducible example, following the guidelines [here](http://stackoverflow.com/q/5963269/324364). – joran Mar 25 '12 at 21:46
  • You probably want to get your list into a data.frame format. You may be able to do this with data.frame(batch=your.list[1], sample=your.list[2], your.list[3]). A few trails: do(your.list, rbind), reshape, ... – Etienne Low-Décarie Mar 25 '12 at 22:25
  • Thanks for your comments, I am sure it is just a matter of wrapping my head around melt and ggplot2. However, I have looked at each of the ggplot2 diamonds and cars examples and not found anything using linked variables (where a value of current goes with a value of voltage). Hence my question. I added more details, following joran's comment. Thanks again. – Thibaud Ruelle Mar 25 '12 at 23:22
  • Looks like you've mostly solved your problem. As an aside, you shouldn't be using `$` inside `aes()`. You only need to pass the variable name, not the vector itself: `aes(x = current, y = voltage,...)`. It will be less prone to errors. – joran Mar 26 '12 at 01:45
  • I think the topic can be closed. Thank you for your help. – Thibaud Ruelle Mar 26 '12 at 17:18
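
To illustrate the comment about not using `$` inside `aes()`, here is a minimal sketch. It assumes the measurements are already in one long data frame (called `long` here, a made-up name) with `batch`, `sample`, `voltage` and `current` columns; the values are random and purely illustrative. The plot is only drawn if ggplot2 is installed.

```r
# Toy long-format data: one row per (voltage, current) measurement.
# Column names follow the question; the values are made up.
set.seed(1)
long <- data.frame(
  batch   = rep(c("A", "B"), each = 20),
  sample  = factor(rep(rep(1:2, each = 10), times = 2)),
  voltage = runif(40),
  current = runif(40)
)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # Bare column names inside aes(): ggplot2 looks them up in `long`,
  # so faceting and grouping stay aligned with the right rows.
  p <- ggplot(long, aes(x = current, y = voltage, colour = sample)) +
    geom_point() +
    facet_wrap(~batch)
  print(p)
}
```

Each row pairs one voltage with one current, which is how "linked" variables are represented in a data frame.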

2 Answers


It's not clear how the sample names are defined with respect to the dataset. The general idea for ggplot2 is that you should group all your data in the form of a melted (long format) data.frame.

library(ggplot2)
library(plyr)
library(reshape2)

l1 <- list(batch="b1", sample=paste("s", 1:4, sep=""),
           dataset=data.frame(current=rnorm(10*4), voltage=rnorm(10*4)))
l2 <- list(batch="b2", sample=paste("s", 1:4, sep=""),
           dataset=data.frame(current=rnorm(10*4), voltage=rnorm(10*4)))
l3 <- list(batch="b3", sample=paste("s", 1:4, sep=""),
           dataset=data.frame(current=rnorm(10*4), voltage=rnorm(10*4)))

list_to_df <- function(l, n=10){

  m <- l[["dataset"]]
  m$batch <- l[["batch"]]
  m$sample <- rep(l[["sample"]], each=n)
  m
}

## list_to_df(l1)

m <- ldply(list(l1, l2, l3), list_to_df)

ggplot(m) + facet_wrap(~batch)+
  geom_path(aes(current, voltage, colour=sample))
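
The same long-format idea can probably be applied directly to the list-based minimal example from the question. The following is only a sketch: it assumes the three parallel lists `batch`, `sample` and `set` from the question, uses base R (`Map` and `rbind`) rather than plyr, and introduces the names `tagged` and `long` for illustration.

```r
set.seed(42)
batch  <- list("A", "A", "B", "B")
sample <- list(1, 2, 1, 2)
set    <- replicate(4,
                    data.frame(voltage = runif(10), current = runif(10)),
                    simplify = FALSE)

# Tag every per-sample data frame with its batch and sample identifiers.
tagged <- Map(function(b, s, d) {
  d$batch  <- b
  d$sample <- as.character(s)
  d
}, batch, sample, set)

# Stack the tagged pieces into a single long data frame.
long <- do.call(rbind, tagged)
long$sample <- factor(long$sample)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  print(ggplot(long, aes(current, voltage, colour = sample)) +
          geom_path() +
          facet_wrap(~batch))
}
```

The result has one row per measurement and plain (non-list) columns, which is the shape ggplot2 expects.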
baptiste
  • Thank you very much for your answer. You managed to understand quite exactly what I pictured, even though I was not that clear in my question (which is now edited). I am currently trying to adapt your code to my case. – Thibaud Ruelle Mar 25 '12 at 23:25

With the newly-defined problem (two-column files named "batchname_samplenumber.txt"), I would suggest the following strategy:

read_custom <- function(f, ...) {
  d <- read.table(f, ...)
  names(d) <- c("V", "I")
  ## extract batch and sample from the base filename,
  ## e.g. "A_1.txt" gives batch "A" and sample "1"
  ids <- strsplit(sub("\\.txt$", "", basename(f)), "_")
  d$batch <- ids[[1]][1]
  d$sample <- ids[[1]][2]
  d
}

## list files to read
files <- list.files(pattern = "\\.txt$")
## read them all in a single data.frame
m <- ldply(files, read_custom)
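
A hedged end-to-end sketch of this strategy follows. The demo files, the temporary directory `iv_demo` and the helper name `read_iv` are all hypothetical, and base R (`lapply` plus `rbind`) stands in for `ldply` so the example runs without plyr. It assumes two-column, header-less files named "batchname_samplenumber.txt".

```r
# Hypothetical demo files written to a temporary directory.
tmp <- file.path(tempdir(), "iv_demo")
dir.create(tmp, showWarnings = FALSE)
set.seed(7)
for (nm in c("A_1", "A_2", "B_1", "B_2")) {
  write.table(data.frame(runif(10), runif(10)),
              file.path(tmp, paste0(nm, ".txt")),
              row.names = FALSE, col.names = FALSE)
}

# Hypothetical reader, same idea as read_custom above.
read_iv <- function(f) {
  d <- read.table(f, header = FALSE, col.names = c("voltage", "current"))
  # Strip the directory and the extension, then split on "_".
  ids <- strsplit(sub("\\.txt$", "", basename(f)), "_")[[1]]
  d$batch  <- ids[1]
  d$sample <- ids[2]
  d
}

files <- list.files(tmp, pattern = "\\.txt$", full.names = TRUE)
m <- do.call(rbind, lapply(files, read_iv))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  print(ggplot(m, aes(voltage, current, colour = sample)) +
          geom_path() +
          facet_wrap(~batch))
}
```

Using `basename()` keeps the batch name free of any directory prefix when the files are listed with `full.names = TRUE`.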
baptiste