community!
I guess it's time for my first question on this site. Hopefully I can describe it clearly:
Background: I am currently writing a script for cleaning, subsetting and plotting data from a .csv file. One peculiarity of the raw data is that there are 4 different measurements at each time point, and the columns are ordered like "0 minutes ... 30 minutes ... 49 minutes and 30 seconds ... 0 minutes ... 30 minutes ..." until the whole time span has been covered 4 times.
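A layout like this, with one run of time points repeated several times side by side, can also be split without hard-coding the number of runs. This is a minimal sketch under two assumptions: the time labels are the column names, and every run starts with the label `"0 min "`. `split_blocks` is a hypothetical helper, and the toy data are invented:

```r
# Hypothetical helper: split a wide data frame into one piece per measurement run,
# where each run starts at a column whose name equals `start_label`.
split_blocks <- function(df, start_label = "0 min ") {
  starts <- which(colnames(df) == start_label)
  ends <- c(starts[-1] - 1, ncol(df))  # each run ends just before the next start
  lapply(seq_along(starts), function(i) df[, starts[i]:ends[i], drop = FALSE])
}

# Toy example: 2 samples, 3 time points, repeated 2 times
toy <- data.frame(matrix(as.character(1:12), nrow = 2), stringsAsFactors = FALSE)
colnames(toy) <- rep(c("0 min ", "0.5 min ", "1 min "), 2)
blocks <- split_blocks(toy)
length(blocks)        # 2
sapply(blocks, ncol)  # 3 3
```

The result is a list, so the same transpose and plotting steps can be applied to every run with `lapply()`.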
install.packages("ggplot2")
library(ggplot2)
results<-read.csv(file="Rnase alert _Ecoli target_20170906_afternoon1.csv", header = FALSE, stringsAsFactors = FALSE, sep = ";")
rownames(results)<-results[,3] #set the column with sample names as rownames and remove unnecessary columns
results<-results[,-(1:3)]
colnames(results)<-results[2,] #same for colnames: now rownames = samples and colnames = time points
results<-results[-(1:2),]
starts<-as.vector(col(results)[which(results == "0 min ")]) #check the start time points (there are multiple datasets in this one file)
starts #there are 4 starts-->4 different sets in the file
results1<-subset(results,select = (starts[1]:(starts[2]-1) )) #create subsets each going from "0 min" to latest timepoint
results2<-subset(results,select = (starts[2]:(starts[3]-1) ))
results3<-subset(results,select = (starts[3]:(starts[4]-1) ))
results4<-subset(results,select = (starts[4]:ncol(results)))
results1<-data.frame(t(results1)) #swap rows and columns (transpose) for easier plotting
results2<-data.frame(t(results2))
results3<-data.frame(t(results3))
results4<-data.frame(t(results4))
View(results4) #everything looks perfect
timeinseconds<-seq(0,2970,30) #create a vector of time points from 0 s to 49 min 30 s, in seconds, because the time points in the original dataset are text
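The end point of the time vector follows from the last label. 49 min 30 s is 49 × 60 + 30 = 2970 s, and one point every 30 s gives 2970 / 30 + 1 = 100 time points. That count has to match the number of rows in each transposed block. A short check:

```r
last_point <- 49 * 60 + 30               # 49 min 30 s expressed in seconds
timeinseconds <- seq(0, last_point, 30)  # 0, 30, ..., 2970
length(timeinseconds)                    # 100
# With the real data loaded, each block should line up with the time vector:
# stopifnot(length(timeinseconds) == nrow(results1))
```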
From here on, things go downhill:
par(mfrow=c(1,1))
plot(timeinseconds,results1$Sample.X1,type = "l") #result: instead of 80,000, the max. y-value is 100, while the time points are scaled correctly; overlaying the graphs is intended
lines(timeinseconds,results1$Sample.X2,type = "l")
lines(timeinseconds,results1$Sample.X3,type = "l")
lines(timeinseconds,results1$Sample.X4,type = "l")
lines(timeinseconds,results1$Sample.X5,type = "l")
lines(timeinseconds,results1$Sample.X6,type = "l")
lines(timeinseconds,results1$Sample.X7,type = "l")
lines(timeinseconds,results1$Sample.X8,type = "l")
lines(timeinseconds,results1$Sample.X9,type = "l")
lines(timeinseconds,results1$Sample.X10,type = "l")
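Overlaying many columns can also be done with a single `matplot()` call. It draws one line per column and scales the y-axis to all of them. With `plot()` followed by `lines()`, the axis limits are set by the first series only, so later samples can run off the plot. This minimal sketch uses simulated values, invented to fall roughly in the reported 30,000 to 80,000 range, and assumes the sample columns are already numeric:

```r
set.seed(1)
timeinseconds <- seq(0, 2970, 30)
# Simulated stand-in for results1: 100 time points x 10 samples (invented values)
sim <- matrix(runif(100 * 10, 30000, 80000), ncol = 10)
colnames(sim) <- paste0("Sample.X", 1:10)

# One line per column; lty = 1 draws all lines solid
matplot(timeinseconds, sim, type = "l", lty = 1,
        xlab = "Time (s)", ylab = "Signal")
```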
While the values range from 30,000 to 80,000, the y-axis only spans 0 to 100. The x-axis has the correct range according to "timeinseconds", and there is no error message.
Previous attempts to solve this:
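One likely explanation assumes an R version before 4.0.0, where `data.frame()` converts character columns to factors by default. The imported columns are text because the header rows share columns with the readings. As a result, `t()` returns a character matrix, and `data.frame(t(...))` turns each sample column into a factor. `plot()` then draws the factor's integer codes (1 up to the number of distinct values) instead of the readings. With 100 time points, that would cap the y-axis at about 100. Printing or `View()` still shows the labels, so the table looks correct. A toy reproduction with invented values:

```r
# Toy stand-in for one block: samples in rows, time points in columns,
# stored as text like the imported columns
block <- data.frame(t0 = c("80000", "41000"), t30 = c("30000", "52000"),
                    t60 = c("55000", "47000"), stringsAsFactors = FALSE)
rownames(block) <- c("Sample X1", "Sample X2")

m <- t(block)    # t() on a data frame returns a matrix, here a character matrix
is.character(m)  # TRUE

# stringsAsFactors = TRUE reproduces the data.frame() default before R 4.0.0
transposed <- data.frame(m, stringsAsFactors = TRUE)
class(transposed$Sample.X1)       # "factor"
transposed$Sample.X1              # prints the labels 80000 30000 55000
as.numeric(transposed$Sample.X1)  # 3 1 2: the level codes that get plotted
```

Running `sapply(results1, class)` on the real data would show whether the columns are factors.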
results1$Sample.X1<-as.numeric(results1$Sample.X1) #manually setting the first column from factor to numeric: no change
Also, combining the table with the "timeinseconds" vector using cbind() doesn't change anything. When I check the subset by printing "results1" or with "View(results1)", every value is correct.
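If the sample columns are factors, `as.numeric()` alone returns the level codes again. That would explain why converting the first column changed nothing. A commonly used idiom is to convert through the labels with `as.numeric(as.character(x))`. This minimal sketch applies it to every column of a toy data frame with invented values:

```r
# Toy data frame whose columns are factors holding numeric labels
df <- data.frame(Sample.X1 = factor(c("80000", "30000", "55000")),
                 Sample.X2 = factor(c("41000", "62000", "41000")))

as.numeric(df$Sample.X1)                # 3 1 2: level codes, not readings
as.numeric(as.character(df$Sample.X1))  # 80000 30000 55000

# Convert every column through its labels; df[] keeps the data frame structure
df[] <- lapply(df, function(col) as.numeric(as.character(col)))
str(df)
```

Another option is to pass `stringsAsFactors = FALSE` to `data.frame()` when transposing. That avoids the factors, but the columns are then character and still need `as.numeric()`.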
After searching for a solution for the last 6 hours, I am pretty clueless about what went wrong, but I have a feeling that it is one minor, stupid mistake you guys can see immediately ^^.
Thanks in advance!