I have many (more than 100) CSV files with the same table structure: in every file the headers are in row 4, there are 6 columns, and the data run from row 5 to 400001.

I need to draw these data as scatter plots in which x is the first column (40001 time units) and the other columns are the Ys for different variables. Ideally I would be able to format the plot (colors, ranges, titles, legends, ...), read the CSV files in automatically, and export PNG, PDF, or any other useful format. I have both Excel and R, but I don't know how to do this plotting efficiently. (Naming is also important: each plot should have the name of its CSV file.)

Any ideas on how I can do this with less effort?

Thanks

Marzy
  • Write a function to read in your data and do the plot. Then `lapply` it to all of the files. You're not going to get specific answers without specifying what kind of plot you want and showing some sample data as part of [a reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Thomas Nov 08 '13 at 06:31
  • Thanks Thomas, I will look into the link you have provided. – Marzy Nov 08 '13 at 06:36

1 Answer

Your question is a bit light on specific detail, so I'm going to make some assumptions to get started on a kind of skeleton of an answer.

Let's make some fake CSV files to use as example data.

Set working directory to folder containing data...

setwd("C:/my-csv-files")

Make 100 data frames of six columns by 500 rows (to keep things quick)...

df <- lapply(1:100, function(i) data.frame(cbind(1:500, matrix(sample(1000, 2500, replace = TRUE), 500, 5))))

Make 100 csv files from these data frames in the working directory...

invisible(lapply(seq_along(df), function(i) write.csv(df[[i]], file = paste("df", i, "csv", sep = "."), row.names = FALSE)))

Now we can reproduce your problem and quickly read many CSV files into R like so...

# create a list of all CSV files in all the folders 
files <- (dir("C:/my-csv-files", recursive=TRUE, full.names=TRUE, pattern="\\.(csv|CSV)$"))
# read in the CSV files and add the filename of each file as a column to
# each dataset so we can trace back dodgy data 
# so, create a function to read the CSV and get filenames
read.tables <- function(file.names, ...) {
  require(plyr)
  ldply(file.names, function(fn) data.frame(Filename=fn, read.csv(fn, ...)),.progress = 'text')
}
# execute function to read in data from each CSV, including file names of file that data comes from
mydata <- read.tables(files, stringsAsFactors = FALSE)
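
One detail from the question that the fake data does not reproduce: the real files have their headers in row 4. A minimal sketch of handling that, assuming the three rows above the header can simply be discarded, is to pass `skip = 3` to `read.csv`. The temporary file and its column names below are made up for illustration.

```r
# Hypothetical example file: three lines of preamble, header on row 4,
# data from row 5 onwards (contents invented for this sketch).
tmp <- tempfile(fileext = ".csv")
writeLines(c("instrument log", "run 1", "",
             "time,a,b,c,d,e",
             "1,10,20,30,40,50",
             "2,11,21,31,41,51"), tmp)

# skip = 3 discards the first three lines, so row 4 is read as the header.
one_file <- read.csv(tmp, skip = 3, stringsAsFactors = FALSE)
str(one_file)
```

Because `read.tables` forwards `...` to `read.csv`, the same argument should work there too, e.g. `read.tables(files, skip = 3, stringsAsFactors = FALSE)`. The column names will then come from your header row rather than being X1 to X6, so the plotting code needs adjusting to match.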

Now plot the data. I'll assume first that you want a single plot of all the data in the CSV files...

Melt into a long format for plotting. Here X1 is your time variable and X2 to X6 are the other variables in your CSV files.

require(reshape2)
dat <- melt(mydata, id.vars = c("X1"), measure.vars = c("X2", "X3", "X4", "X5", "X6"))
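
To see what the melt step produces, here is a rough base-R sketch of the same wide-to-long reshaping on a tiny made-up data frame. The `wide` and `long` names and the values are invented for illustration, and `melt` itself does more than this.

```r
# Tiny made-up "wide" table: one time column and two measured variables.
wide <- data.frame(X1 = 1:3, X2 = c(5, 6, 7), X3 = c(8, 9, 10))
measure <- c("X2", "X3")

# Stack the measured columns on top of each other, repeating the time
# column once per variable and recording which column each value came from.
long <- data.frame(
  X1       = rep(wide$X1, times = length(measure)),
  variable = factor(rep(measure, each = nrow(wide)), levels = measure),
  value    = unlist(wide[measure], use.names = FALSE)
)
print(long)
```

Each row of `long` is one point on the scatter plot: `X1` gives x, `value` gives y, and `variable` is what the colour is mapped to.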

And here's a single scatter plot of your time variable by the other variables (colour-coded). It's just not clear from your question exactly what you want to plot, so do ask another question with more details.

require(ggplot2)
ggplot(dat, aes(X1, value)) +
  geom_point(aes(colour = factor(variable)))

Now save it as a PDF or PNG; see ?ggsave for the numerous options here...

ggsave(filename = "myplot.pdf")
ggsave(filename = "myplot.png")

Find where those files were saved (the working directory)...

getwd()

To make one plot per CSV file, here's one method...

listcsvs <- lapply(files,function(i) read.csv(i,  stringsAsFactors = FALSE))
names(listcsvs) <- files
require(reshape2)
require(ggplot2)
for (i in seq_along(files)) {
  tmp <- melt(listcsvs[[i]], id.vars = "X1", measure.vars = c("X2", "X3", "X4", "X5", "X6"))
  print(ggplot(tmp,aes(X1, value)) + 
          geom_point(aes(colour = factor(variable))) +
          ggtitle(names(listcsvs[i]))
        )
}

If you are using RStudio, you can scroll through the plots and use Export to save the ones you want as PDF or PNG files.
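
If you would rather export every plot automatically, with each image named after its CSV file, one option is to wrap the read-and-plot steps in a function and apply it to each file. The sketch below uses base graphics (`matplot`, `png`) instead of ggplot2 so that it needs no extra packages. `plot_csv_to_png` is a hypothetical helper, and it assumes the first column is time and the remaining columns are numeric.

```r
# Hypothetical helper: draw one CSV as a scatter plot and save it as a PNG
# named after the CSV file (e.g. "df.1.csv" -> "df.1.png").
plot_csv_to_png <- function(csv_path, out_dir = dirname(csv_path), ...) {
  d <- read.csv(csv_path, stringsAsFactors = FALSE, ...)
  x <- d[[1]]              # assumption: first column is the time variable
  y <- as.matrix(d[-1])    # assumption: all other columns are numeric Ys
  png_name <- paste0(tools::file_path_sans_ext(basename(csv_path)), ".png")
  png_path <- file.path(out_dir, png_name)

  png(png_path, width = 800, height = 600)
  on.exit(dev.off())       # close the graphics device even if plotting fails
  matplot(x, y, type = "p", pch = 16, col = seq_len(ncol(y)),
          xlab = names(d)[1], ylab = "value", main = basename(csv_path))
  legend("topright", legend = colnames(y), col = seq_len(ncol(y)), pch = 16)
  invisible(png_path)
}

# Demo on a small throwaway file so the block runs on its own.
demo_dir <- tempfile("csv-demo")
dir.create(demo_dir)
demo_csv <- file.path(demo_dir, "run01.csv")
write.csv(data.frame(time = 1:10, a = rnorm(10), b = runif(10)),
          demo_csv, row.names = FALSE)
out <- plot_csv_to_png(demo_csv)
```

With a character vector of CSV paths called `files`, `invisible(lapply(files, plot_csv_to_png))` should write one PNG next to each CSV. Extra arguments are passed on to `read.csv`, so `skip = 3` can be supplied for files whose header is in row 4. Colours, axis ranges (`xlim`, `ylim`), titles and the legend are all set inside the function, which makes it act as the plot template.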

So that's covered the main parts of your question:

  1. Read a large number of CSV files into R
  2. Plot the data as a single scatter plot displaying several variables against one variable
  3. Plot the data as one scatter plot per CSV file
  4. Save the plots as PDF or PNG files

And as a bonus you've got code for creating example data which you can use in your future questions. In general, the better the quality of your example data, the better quality answers you'll get (as Thomas suggests in his comment).

Ben
  • Thanks so much. After reading 100 files into R, how can I create a scatter plot template and then plot each file (or export their plots as PNG files)? – Marzy Nov 08 '13 at 07:25
  • We'll need to see the specific structure of your data to give a good answer to that question. Please share one of your CSV files and an example image of the kind of plot you want to produce (e.g. from someone else's publication). Once we can make one plot, it's easy to make hundreds. But first we need to know exactly what you're trying to do! Please **[ask a new question](http://stackoverflow.com/questions/ask)** that includes one of your CSV files and an example of the plot you want to produce. – Ben Nov 08 '13 at 07:29
  • Updated to show how to make one plot per CSV file. If this now answers your question, then you should [mark it as accepted](http://meta.stackexchange.com/a/5235). – Ben Nov 08 '13 at 08:26
  • Thank you very much for your answer, I will try it out. One important detail missing from this question is filtering out the top rows and a few columns; I have asked about that here: http://stackoverflow.com/questions/19855261/r-create-a-scatter-plot-from-a-number-of-csv-files-automatically-after-filteri – Marzy Nov 08 '13 at 09:16