I have a large spreadsheet saved as a .csv file. It has two header rows, and the columns are arranged as: file name, MZ, Area, MZ, Area, ... I need to read in the file (I have figured out how to do this with both header rows) and then have R create several barplots: one for each "Area" column, with ylim set to the lower and upper bounds of that column's data and the title set to the value in the MZ column immediately before that Area column. I have written a script that makes the barplot for the first column, but it is not automated and does not name the plot correctly. I have used both color and density to show the cyclical nature of the experimental set-up. Here is an abbreviated table:

```r
structure(list(Data.File = c("20150420_04_01Ecoli_treat_0.00.d", 
"20150420_04_02Ecoli_treat_0.00.d", "20150420_04_03Ecoli_treat_0.00.d", 
"20150420_04_04Ecoli_treat_0.00.d", "20150420_04_05Ecoli_treat_0.00.d", 
"20150420_05_01Ecoli_treat_0.250.d"), MZ = c(540.3073, 540.3073, 
540.3073, 540.3073, 540.3073, 540.3073), Area = c(252984.6656, 
256032.4732, 249261.4615, 253533.2804, 250352.2293, 255704.8124
), MZ.1 = c(513.2872, 513.2872, 513.2872, 513.2872, 513.2872, 
513.2872), Area.1 = c(505815.005, 502831.1187, 501745.5544, 510544.8462, 
511942.0494, 504955.7114), MZ.2 = c(244.1325, 244.1325, 244.1325, 
244.1325, 244.1325, 244.1325), Area.2 = c(473471.315, 480002.1109, 
471329.1703, 477518.5349, 474360.5241, 476703.0057), MZ.3 = c(442.2254, 
442.2254, 442.2254, 442.2254, 442.2254, 442.2254), Area.3 = c(659916.9366, 
638415.4196, 636272.8178, 668030.9817, 651146.1962, 639103.8294
), MZ.4 = c(360.6892, 360.6892, 360.6892, 360.6892, 360.6892, 
360.6892), Area.4 = c(606414.6122, 595299.5358, 584649.0941, 
601272.5988, 585518.7376, 588818.7567), MZ.5 = c(226.0354, 226.0354, 
226.0354, 226.0354, 226.0354, 226.0354), Area.5 = c(38955.65059, 
39102.04637, 39282.88698, 40731.99391, 40280.5906, 38387.9069
), MZ.6 = c(170.0572, 170.0572, 170.0572, 170.0572, 170.0572, 
170.0572)), .Names = c("Data.File", "MZ", "Area", "MZ.1", "Area.1", 
"MZ.2", "Area.2", "MZ.3", "Area.3", "MZ.4", "Area.4", "MZ.5", 
"Area.5", "MZ.6"), row.names = c(NA, 6L), class = "data.frame")
```
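Because each Area column sits immediately to the right of its MZ column, the titles can be located by position instead of being typed in. A minimal sketch, assuming the data are already in a data frame: `dat` is a hypothetical name for a two-pair excerpt of the `structure(...)` output above, and the 5% padding on `ylim` is an added assumption so the smallest bar stays visible.

```r
# Two MZ/Area pairs copied from the structure() output above, plus the trailing MZ.6
dat <- data.frame(
  Data.File = c("20150420_04_01Ecoli_treat_0.00.d", "20150420_04_02Ecoli_treat_0.00.d",
                "20150420_04_03Ecoli_treat_0.00.d", "20150420_04_04Ecoli_treat_0.00.d",
                "20150420_04_05Ecoli_treat_0.00.d", "20150420_05_01Ecoli_treat_0.250.d"),
  MZ     = rep(540.3073, 6),
  Area   = c(252984.6656, 256032.4732, 249261.4615, 253533.2804, 250352.2293, 255704.8124),
  MZ.1   = rep(513.2872, 6),
  Area.1 = c(505815.005, 502831.1187, 501745.5544, 510544.8462, 511942.0494, 504955.7114),
  MZ.6   = rep(170.0572, 6)
)

area_cols <- grep("^Area", names(dat))  # positions of Area, Area.1, ...
mz_cols   <- area_cols - 1L             # the MZ column just left of each Area column
titles    <- vapply(mz_cols, function(j) dat[[j]][1], numeric(1))  # MZ repeats, so row 1 suffices

for (k in seq_along(area_cols)) {
  y    <- dat[[area_cols[k]]]
  lims <- range(y) + c(-0.05, 0.05) * diff(range(y))  # data bounds plus 5% padding
  barplot(y, ylim = lims, xpd = FALSE,  # xpd = FALSE clips bars below ylim[1]
          main = titles[k])
  box()
}
```

The trailing `MZ.6` column has no Area partner, so it never appears in `mz_cols`.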

Any suggestions you may be able to offer would be greatly appreciated.

Daniel
    Please provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Molx Jul 09 '15 at 16:58
  • Is there a way to add a small .csv file to this format as an example? I am new to the format and relatively new to "R" and coding in general. I struggle but can generally read code given enough time. – Daniel Jul 09 '15 at 18:53
  • You can paste the result of `dput(head(dat))`. If there are too many columns, you can select just the first few, since there's a pattern. – Molx Jul 09 '15 at 18:55
  • I hope this is enough: structure(list(Data.File = c("20150420_04_01Ecoli_treat_0.00.d", "20150420_04_02Ecoli_treat_0.00.d", "20150420_04_03Ecoli_treat_0.00.d", "20150420_04_04Ecoli_treat_0.00.d", "20150420_04_05Ecoli_treat_0.00.d"), MZ = c(540.3073, 540.3073, 540.3073, 540.3073, 540.3073, 540.3073), Area = c(252984.6656, 256032.4732, 249261.4615, 253533.2804, 250352.2293, 255704.8124 ), MZ.1 = c(513.2872, 513.2872, 513.2872, 513.2872, 513.2872, 513.2872), Area.1 = c(505815.005, 502831.1187, 501745.5544, 510544.8462, 511942.0494), MZ.2 = c(244.1325, 244.1325, 244.1325, 244.1325, 244.1325) – Daniel Jul 09 '15 at 19:10
  • Your code got truncated, you should edit the question to add it. – Molx Jul 09 '15 at 23:44
  • I added the code to the question. – Daniel Jul 10 '15 at 17:34
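Following the `dput(head(dat))` suggestion, here is a minimal sketch of trimming a wide table to its first few columns before printing it. `dat` is a hypothetical stand-in built from values in the question, not the real file.

```r
# Hypothetical stand-in for the real, much wider table
dat <- data.frame(
  Data.File = c("20150420_04_01Ecoli_treat_0.00.d", "20150420_04_02Ecoli_treat_0.00.d"),
  MZ        = c(540.3073, 540.3073),
  Area      = c(252984.6656, 256032.4732),
  MZ.1      = c(513.2872, 513.2872),
  Area.1    = c(505815.005, 502831.1187),
  MZ.2      = c(244.1325, 244.1325)
)

small <- head(dat[, 1:5])  # first rows of the first five columns; the pattern repeats
dput(small)                # this printed text is what gets pasted into the question

# Pasting that text back into R rebuilds the same data frame
rebuilt <- eval(parse(text = capture.output(dput(small))))
```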

1 Answer

Something like this, using the `data.table` package:

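```r
library(data.table)  # provides fread()

# 1. count the columns from one line of the file
nn <- length(scan("file.csv", what = "", sep = ",", nlines = 1, skip = 2, quiet = TRUE))
# 2. Area columns: 3, 5, 7, ... (column 1 = file name, MZ columns = 2, 4, 6, ...)
dt <- fread("file.csv", header = TRUE, skip = 1L, select = seq(3, nn, by = 2))
# 3. first value of each MZ column (the value repeats down the column)
mzs <- unlist(fread("file.csv", header = TRUE, skip = 1L,
                    select = seq(2, nn, by = 2), nrows = 1L))
# 4. one barplot per Area column, titled with the MZ value to its left
invisible(lapply(seq_len(ncol(dt)), function(x) barplot(dt[[x]], main = mzs[x])))
```
  1. Use `scan` to count the columns programmatically. `skip = 2` skips the two header rows, so the count comes from the first data row.
  2. Read only the Area columns. Column 1 is the file name and the remaining columns alternate MZ, Area, so the Area columns are 3, 5, 7, ... (check this against the real file). I'm skipping the MZ columns here because it would be inefficient to read in all of those repeated values.
  3. Read only the MZ columns, and only their first value (each MZ value is simply repeated down its column).
  4. Plot. `dt[[x]]` returns a plain vector, which is what `barplot` expects, and looping over `ncol(dt)` avoids running past the last Area column when the file ends with an MZ column. Without reproducible data, I'm not sure whether we have to set `ylim` or `xlim` manually.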
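The question also mentions using colour to show the cyclical set-up. A minimal sketch of a per-column plotting helper, assuming (hypothetically) that runs repeat in cycles of `cycle_len` bars: colour marks each bar's position within a cycle, a wider gap marks the start of each new cycle, and the y-axis is fitted to that column's data. `bar_area`, `cycle_len`, and `pad` are illustrative names, not part of the answer above.

```r
# Hypothetical helper: one barplot for one Area column
bar_area <- function(y, title, cycle_len = 5L, pad = 0.05) {
  n    <- length(y)
  pos  <- (seq_len(n) - 1L) %% cycle_len + 1L           # position of each bar within its cycle
  lims <- range(y) + c(-1, 1) * pad * diff(range(y))    # data bounds plus padding
  barplot(y,
          ylim  = lims,
          xpd   = FALSE,                       # clip bars that would extend below ylim[1]
          col   = rainbow(cycle_len)[pos],     # colours repeat every cycle
          space = ifelse(pos == 1L, 1, 0.2),   # wider gap before each new cycle
          main  = title)
  box()
  invisible(list(pos = pos, ylim = lims))
}

# Values from the question's first Area column; with the objects from the answer,
# the call inside the loop would be bar_area(dt[[x]], mzs[x])
res <- bar_area(c(252984.6656, 256032.4732, 249261.4615, 253533.2804, 250352.2293, 255704.8124),
                title = "540.3073")
```

`barplot()` also has a `density` argument if shading lines are preferred to, or combined with, colour.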
MichaelChirico
  • When I try to use this I get the following error: Error in unlist(fread("A:/Data/6520/RT experiments/20150420_Ecoli0250_05_20150707_Processed_Table.csv", : could not find function "fread"; I replaced "file.csv" with my file name and location. The error here seems to be a missing function. Am I missing an R package? – Daniel Jul 09 '15 at 20:42
  • yes, sorry, I should have made that more clear; edited. – MichaelChirico Jul 09 '15 at 22:11
  • Thank you for your help to this point. I have been trying to correct an error that is originating from the "dt" line of code. The error is "Error in is.finite(to) : default method not implemented for type 'closure'". I assume this is an error due to non-number entries but when I remove them it does not solve the problem; in addition, for the working code I cannot just remove the sample names. Any ideas? – Daniel Jul 10 '15 at 17:17
  • are you replacing `ncol` with an actual number? I'm not sure but I suspect you're actually trying to pass the function `ncol` (see `?ncol`) instead of an integer number of columns, which is what I intended. See my edit. – MichaelChirico Jul 10 '15 at 17:26
  • I have the code reading the .csv file. When I try to create the graphs I get an error: "Error in barplot.default(dt[, x, with = F], main = mzs[x]) : 'height' must be a vector or a matrix" I am able to get some graphs out of the code if I add as.matrix after the barplot command but the plots are not usable. – Daniel Jul 10 '15 at 19:02
  • I see. `dt[,x,with=F]` is returning a `data.table`; `barplot` doesn't know what to do with it. The edited code should work. – MichaelChirico Jul 10 '15 at 19:19
  • The code works great on a small sample data set, but when I try to analyze the entire (very large) .csv file, all I get is a rainbow-colored rectangular block (I colored it). I think this is an x/y limit issue. Is there a way to set these limits based on the data in each individual column? This is the code I used to generate the graphs: lapply(1:length(mzs),function(x)barplot(unlist(dt[,x,with=F]),col=rainbow(9),main=mzs[x])); Also, can I display more than one plot per page? Something like [3:3]? I vaguely remember this option... – Daniel Jul 10 '15 at 21:36
  • I think we've reached the limit of being able to help with the issue you asked about. Try `barplot(1:10)`. If any of your `Area` vectors are long (say `length(Area)=n`), it's going to try to make a `barplot` with `n` columns--this will be illegible if `n` is large (cf. `barplot(1:1000)`). Basically, this is becoming more of a question of "how do I visualize my data better" and less about "how do I make barplots in this way". – MichaelChirico Jul 10 '15 at 22:01
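On the follow-up about several plots per page: base graphics does this with `par(mfrow = c(nrow, ncol))`, which is probably the `[3:3]` option being remembered. A minimal sketch, assuming the Area vectors have been collected in a named list; `areas` and `plot_grid` are hypothetical names, and the values are the first three Area columns from the question.

```r
# Hypothetical list of Area vectors, named by their MZ values
areas <- list(
  "540.3073" = c(252984.6656, 256032.4732, 249261.4615, 253533.2804, 250352.2293, 255704.8124),
  "513.2872" = c(505815.005, 502831.1187, 501745.5544, 510544.8462, 511942.0494, 504955.7114),
  "244.1325" = c(473471.315, 480002.1109, 471329.1703, 477518.5349, 474360.5241, 476703.0057)
)

plot_grid <- function(areas, nrow = 3, ncol = 3) {
  old <- par(mfrow = c(nrow, ncol), mar = c(2, 3, 2, 1))  # grid layout, tighter margins
  on.exit(par(old))                                       # restore the previous settings
  for (mz in names(areas)) {
    y <- areas[[mz]]
    barplot(y, ylim = range(y) + c(-0.05, 0.05) * diff(range(y)), xpd = FALSE, main = mz)
  }
}

plot_grid(areas)  # once all nrow * ncol panels are used, the next plot starts a new page
```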