I'm a biologist, but I had to teach myself Python and R while working at different places a few years ago. A situation came up at my current job that R would be really useful for, so I cobbled together a program. Surprisingly, it does just what I'd like, EXCEPT that the graphs it generates have an extra bar at the beginning.
I've entered no data to correspond to that first bar:
I'm hoping this is some simple error in how I've set the plot parameters. Could it be because I'm using plot instead of boxplot? Is it plotting the headings? More worrisome is the possibility that while reading in and merging my 3 data frames I'm creating some sort of artifact data, which would also affect the statistical tests and make me very sad, though I don't see anything like this when I have it write the matrix to a file. I greatly appreciate any help!
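Both worries above can be checked directly. The following is only a sketch on made-up data: `toy`, `left`, and `right` are hypothetical stand-ins, not my real tables, and the blank cell is an assumption about what might be in the CSV. It lists the levels of a grouping column (in base R, `plot()` with a factor on x and a numeric on y draws one box per level) and compares row counts before and after `merge()`.

```r
# Hypothetical toy data: one grouping column and one measurement column.
# The empty string mimics a blank cell in a CSV (an assumption, not my real data).
toy <- data.frame(
  group = factor(c("", "a", "a", "b", "b", "b")),
  value = c(1.0, 2.0, 2.5, 3.0, 3.5, 4.0)
)

# List the levels; an empty-string level sorts before the named ones.
group_levels <- levels(toy$group)
print(group_levels)

# Count rows per level to see whether any level is blank or unexpected.
level_counts <- table(toy$group, useNA = "ifany")
print(level_counts)

# Recode blanks as NA, then drop the level that is no longer used.
cleaned <- toy
cleaned$group[cleaned$group == ""] <- NA
cleaned$group <- droplevels(cleaned$group)
cleaned_levels <- levels(cleaned$group)
print(cleaned_levels)

# Merge check: duplicated keys multiply rows, unmatched keys are dropped.
left <- data.frame(PATIENTIDNUMBER = c(1, 2, 3), lab = c(10, 20, 30))
right <- data.frame(PATIENTIDNUMBER = c(1, 2, 2), clinical = c("x", "y", "z"))
merged <- merge(left, right, by = "PATIENTIDNUMBER")
merged_rows <- nrow(merged)
has_duplicate_ids <- anyDuplicated(merged$PATIENTIDNUMBER) > 0
print(merged_rows)
print(has_duplicate_ids)
```

Running the same `levels()` and `table()` calls on a real column of `mp`, and comparing `nrow(first)`, `nrow(firsthalf)`, and `nrow(mp)`, should show whether a blank level or a merge side effect is present.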
Here's what it looks like, and then the function it calls (in another script). (I'm really not a programmer, so I apologize if the following code is miserable.) The goal is to compare our data (which is in columns 10-17 of a csv) to each column of a big sheet of clinical data in turn. Then, if there is a significant correlation (the p value is less than 0.05), to graph the two against each other. This gives me a fast way to find out whether there's something worth looking into further in this big data set.
first <- read.csv(labdata)
second <- read.csv(mrntoimacskey)
third <- read.csv(imacsdata)
firsthalf<-merge(first,second)
mp <-merge(firsthalf, third, by="PATIENTIDNUMBER")
setwd(aplaceforus)
pfile2<- sprintf("%spvalues", todayis)
setwd("fulldataset")
for (m in 10:17) {
  n <- m - 9
  pretty <- pretties[n]
  for (i in 1:length(colnames(mp))) {
    tryCatch(sigsearchA(pfile2, mp, m, i, crayon = pretty),
             error = function(e) {
               cat("ERROR :", conditionMessage(e), "\n")
             })
    tryCatch(sigsearchC(pfile2, mp, m, i, crayon = pretty),
             error = function(e) {
               cat("ERROR :", conditionMessage(e), "\n")
             })
  }
}
sigsearchA <- function(n, mp, y, x, crayon = "deepskyblue") {
  # anova, plots if significant. takes name of file, name of database,
  # and the count of the columns to use for x and y
  stat <- oneway.test(mp[[y]] ~ mp[[x]])
  pval <- stat[3]
  heads <- colnames(mp)
  a <- heads[y]
  b <- heads[x]
  ps <- c(a, b, pval)
  write.table(ps, file = n, append = TRUE, sep = ",", col.names = FALSE)
  feedback <- paste(c("Added", b, "to", n), collapse = " ")
  if (pval <= 0.05 & pval > 0) {
    # horizontal labels
    callit <- paste(c(a, b, ".pdf"), collapse = "")
    val <- sprintf("p=%.5f", pval)
    pdf(callit)
    plot(mp[[x]], mp[[y]], ylab = a, main = b, col = crayon)
    mtext(val, adj = 1)
    dev.off()
    # with vertical labels, in case of many groups
    callit <- paste(c(a, b, "V.pdf"), collapse = "")
    pdf(callit)
    plot(mp[[x]], mp[[y]], ylab = a, main = b, las = 2, cex.axis = 0.7, col = crayon)
    mtext(val, adj = 1)
    dev.off()
  }
  print(feedback)
}
graphics.off()
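For reference, here is the significance check from the function reduced to a small sketch on simulated data. The `toy` data frame and its group means are made up for illustration. It also shows the difference between `stat[3]`, which is a one-element list, and `stat$p.value`, which is a plain number.

```r
# Simulated data (hypothetical): three groups, the third shifted upward.
set.seed(1)
toy <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 10)),
  value = c(rnorm(10, mean = 0), rnorm(10, mean = 0), rnorm(10, mean = 3))
)

# Same test as in sigsearchA, written with a data argument.
stat <- oneway.test(value ~ group, data = toy)

# Single-bracket indexing keeps the list wrapper around the p-value.
p_list <- stat[3]
# The $ accessor returns the bare numeric p-value.
p_num <- stat$p.value

# The same threshold used in the function, applied to the numeric value.
is_significant <- p_num > 0 && p_num <= 0.05
print(p_num)
print(is_significant)
```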