I am trying to import some .csv files into R for a company, to compare SPC over a selected date range. After reading the data into R, I set all the other columns to NULL and then plot what is left. When I do this, my dates are out of order. Running sapply(mydata2, class) shows that Date is a factor and SPC is an integer. I am fairly sure this is part of the problem, and it has been an issue every time. I have partly worked around it by changing the date column in the CSV file (edited in Excel) to a Julian date, but for the sake of presentation I would much rather keep it in short date format. It would also be great to know how to do this in R rather than having to switch to Excel. The same issue has affected my kmeans clustering.
Any ideas?
I should also mention that I am trying to write a simple function that strips the unneeded columns from each Excel file and computes the quantities I need in various ways. I have roughly 60 more Excel files, split by month, to run this analysis on.
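A rough sketch of what such a function could look like is below. It is only an illustration under assumptions: the helper name read_spc, the temporary demo folder and the two tiny demo files are all made up here, and every monthly file is assumed to have columns named Date and SPC like the August 2015 one.

```r
# Hypothetical helper: read one monthly CSV and keep only Date and SPC.
# stringsAsFactors = FALSE keeps Date as text instead of a factor.
read_spc <- function(path) {
  raw <- read.csv(path, stringsAsFactors = FALSE)
  raw[, c("Date", "SPC")]
}

# Made-up demo files in a temporary folder, standing in for the real monthly CSVs.
demo_dir <- file.path(tempdir(), "spc_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Date = "8/16/2015", SPC = 27000L, Route = 1L),
          file.path(demo_dir, "aug.csv"), row.names = FALSE)
write.csv(data.frame(Date = "9/1/2015", SPC = 11000L, Route = 2L),
          file.path(demo_dir, "sep.csv"), row.names = FALSE)

# Find every CSV in the folder, read each one, and stack the results.
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
all_spc <- do.call(rbind, lapply(files, read_spc))
```

Selecting the two wanted columns by name avoids setting each unwanted column to NULL one at a time, and the same helper can be applied to all of the monthly files.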
mydata2 = read.csv("Copy of Monthly Raw SPC Aug 2015.csv")
mydata2
mydata2$Trailer <- NULL
mydata2$ProducerID <- NULL
mydata2$SampleID <- NULL
mydata2$Producer.Number <- NULL
mydata2$BTUNo <- NULL
mydata2$Route <- NULL
mydata2
plot(mydata2)
sapply(mydata2, class)
This is just simple plotting code; I have also tried other things, such as ordering the data and boxplots. A sample of the actual data I want to plot is:
...
96 42233 27000
97 42233 29000
98 42233 2000
99 42233 38000
100 42234 11000
101 42234 157000
...
Instead of the plain number, the date column would be in short date format; for example, row 96 would be 8/16/2015. When I plot, the box-and-whisker plot has multiple entries on the same day, as it should, but the days are scattered all over the graph. I need the same result with the days in order.
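One possible way to get real dates and an ordered boxplot in R is sketched below. It rests on assumptions: the numeric dates are taken to be serial numbers from Excel's default 1900 date system (whose origin in R is "1899-12-30"), the columns are assumed to be named Date and SPC as in the sapply() output, and the small data frame is a stand-in built from the sample rows so the sketch runs on its own. The Day column is a name introduced only for this example.

```r
# Stand-in for mydata2, built from the sample rows above.
mydata2 <- data.frame(
  Date = c(42233, 42233, 42233, 42233, 42234, 42234),
  SPC  = c(27000, 29000, 2000, 38000, 11000, 157000)
)

# Assumption: the numbers are Excel serial dates (1900 date system).
mydata2$Date <- as.Date(mydata2$Date, origin = "1899-12-30")

# If the CSV holds text dates such as "8/16/2015" instead, parse the text;
# as.character() guards against the column having been read as a factor:
# mydata2$Date <- as.Date(as.character(mydata2$Date), format = "%m/%d/%Y")

# Sort rows chronologically, then build a factor whose levels follow
# date order, so boxplot() draws one box per day from left to right in time.
mydata2 <- mydata2[order(mydata2$Date), ]
day_labels <- format(mydata2$Date, "%m/%d/%Y")
mydata2$Day <- factor(day_labels, levels = unique(day_labels))

boxplot(SPC ~ Day, data = mydata2, xlab = "Date", ylab = "SPC")
```

The dates come out scrambled when Date is a factor because factor levels are sorted as text, not as calendar dates. Converting to the Date class first and setting the factor levels explicitly in chronological order keeps the short date labels while fixing the order.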