
I'm a student from Germany. I want to create a summary (0.25 and 0.75 quantiles, mean, min, max) and different plots for specific columns (e.g. Inflow or Low).
The issue is that there is not just one .csv file: there are about 3200 files in that folder, all with different names (ISIN numbers of portfolios, all starting with DE000LS9xxx). After looking through different platforms and this forum, I tried several approaches. My last attempt was to rename every file to 001.csv, 002.csv, etc. and use an answer from this forum:

 directory <- setwd("~/Desktop/Uni/paper/testdata/")
 Inflowmean <- function(directory, Inflow, id = 1:3) {
 filenames <- sprintf("%03d.csv", id)
 filenames <- paste(directory, filenames, sep=";", dec=",")
 ldf <- lapply(filenames, read.csv)
 df=ldply(ldf)
 summary(df[, Inflow], na.rm = TRUE)
 }  

I really hope you can help me, because I'm new and have just started learning commands in RStudio. It seems I'm not able to handle it, even after trying different tutorials and the help function in the program... Thank you so much!

juke
  • Did you actually try to use the code you posted? It could be a solution, although I suspect there are several other ways to do the same thing... If you tried, where did you get stuck? – MaZe Jul 24 '15 at 13:18
  • So, what is your question? Does the code above run? If not, what is the error? What specifically do you want help with? – mathematical.coffee Jul 24 '15 at 13:24
  • The code doesn't run and no error message shows up. It just shows the code in the console, but there is no summary for the test data. What I need is help getting that code, or any other code, to do what I described. I'm sorry to say it, but I'm a noob with R and not the best at programming or understanding it – juke Jul 24 '15 at 15:11

2 Answers


Adapted from the questions "Using R to list all files with a specified extension" and "Opening all files in a folder, and applying a function":

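# every file ending in .csv ("\\.csv$" is a regular expression, not a glob like "*.csv")
filenames <- list.files("~/Desktop/Uni/paper/testdata", pattern="\\.csv$", full.names=TRUE)
ldf <- lapply(filenames, read.csv)  # one data frame per file
res <- lapply(ldf, summary)         # one summary per file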
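Going one step further than the code above, here is a minimal sketch that reads every .csv file in a folder (so the files do not have to be renamed to 001.csv, 002.csv, ...) and computes the statistics asked for in the question (min, 0.25 quantile, mean, 0.75 quantile, max) for one column, one row per file. Several things here are assumptions, not part of the answer: the files are assumed to be semicolon-separated with comma decimals (as the question's `sep=";", dec=","` suggests), so `read.csv2` is used instead of `read.csv`; `summarise_folder` is a hypothetical helper name; and the two demo files and their values are made up.

```r
# Hypothetical helper: statistics for one column of every .csv file in a folder.
# Assumes ";" as separator and "," as decimal mark (read.csv2 defaults).
summarise_folder <- function(dir, column) {
  # "\\.csv$" matches file names ending in ".csv"
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  stats <- lapply(files, function(f) {
    x <- read.csv2(f)[[column]]
    # unname() keeps quantile()'s "25%" label out of the column names
    c(min  = min(x, na.rm = TRUE),
      q25  = unname(quantile(x, 0.25, na.rm = TRUE)),
      mean = mean(x, na.rm = TRUE),
      q75  = unname(quantile(x, 0.75, na.rm = TRUE)),
      max  = max(x, na.rm = TRUE))
  })
  # One row per file, labelled with the file name without ".csv" (e.g. the ISIN)
  res <- do.call(rbind, stats)
  rownames(res) <- sub("\\.csv$", "", basename(files))
  res
}

# Made-up demo files with placeholder ISIN-like names in a temporary folder
demo_dir <- file.path(tempdir(), "testdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv2(data.frame(Inflow = c(1, 2, 3, 4, NA), Low = 1:5),
           file.path(demo_dir, "DE000LS9DEM1.csv"), row.names = FALSE)
write.csv2(data.frame(Inflow = c(10, 20, 30), Low = 1:3),
           file.path(demo_dir, "DE000LS9DEM2.csv"), row.names = FALSE)

res <- summarise_folder(demo_dir, "Inflow")
print(res)
```

With the real data the call would be something like `summarise_folder("~/Desktop/Uni/paper/testdata", "Inflow")`; the resulting matrix can then be fed to plotting functions such as `boxplot()` or `hist()` column by column.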
Ajay Ohri
  • When I try to run that code, it says `Error in lapply(ldf, summary) : object 'ldf' not found`, but in the `Environment` window it detects the files in the folder when I use `path="~/Desktop/Uni/paper/testdata"` instead of `temp` – juke Jul 24 '15 at 15:28

It is rather unclear what your question actually is, but there are a number of problems with your code:

  • directory <- setwd("~/Desktop/Uni/paper/testdata/"): see ?setwd. It (invisibly) returns the working directory as it was before the change, not ~/Desktop/Uni/paper/testdata/. You probably want

    directory <- "~/Desktop/Uni/paper/testdata/"
    setwd(directory)
    
  • filenames <- paste(directory, filenames, sep=";", dec=","): this will create file names like "~/Desktop/Uni/paper/testdata/;001.csv;,". You probably want the separator to be / or .Platform$file.sep (file.path() does this for you). I don't know why you have dec=",", but paste() has no such argument, so it just pastes it onto the end. Try pasting a few things together to see what gives you file names that make sense for your data.

  • Your ldply syntax is wrong: you probably want

    ldply(ldf, function (x) summary(x[, Inflow], na.rm=T))
    

See ?ldply for more information. Also, to use ldply, you need library(plyr) somewhere. If you just want base R, you could try

do.call(rbind, lapply(ldf, function (x) summary(x[, Inflow], na.rm=T)))

Here, lapply applies your function (summary(x[, Inflow], na.rm=T)) to each of your data frames, and do.call(rbind, ...) joins all the summaries together into a single matrix with one row per file.
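Putting these fixes together, a minimal base-R sketch of the corrected function could look like the following. It also illustrates why nothing appeared in the console: defining a function only creates it, so it has to be called afterwards. Assumptions not taken from the answer: `inflow_summary` is a hypothetical name, the files are assumed to use ";" as separator and "," as decimal mark (hence `read.csv2`), and the two demo files are made up. NAs are dropped before `summary()` because `summary()` otherwise adds an extra "NA's" entry only for files that contain NAs, and `rbind()` would then combine vectors of different lengths.

```r
# Sketch of the question's function with the fixes above (base R only).
# Assumption: files are named 001.csv, 002.csv, ... and use ";" as separator
# and "," as decimal mark, which is what read.csv2 expects by default.
inflow_summary <- function(directory, column, id = 1:3) {
  # file.path() puts "/" between the parts; paste(..., sep=";") does not
  filenames <- file.path(directory, sprintf("%03d.csv", id))
  ldf <- lapply(filenames, read.csv2)
  # Drop NAs first so every file yields the same six statistics
  do.call(rbind, lapply(ldf, function(x) {
    v <- x[[column]]
    summary(v[!is.na(v)])
  }))
}

# What the two ways of building a path produce
print(paste("~/Desktop/Uni/paper/testdata/", "001.csv", sep = ";", dec = ","))
print(file.path("~/Desktop/Uni/paper/testdata", "001.csv"))

# Made-up demo files 001.csv and 002.csv in a temporary folder
demo_dir <- file.path(tempdir(), "numbered_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv2(data.frame(Inflow = c(1, 2, 3, 4)),
           file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv2(data.frame(Inflow = c(5, 6, 7, NA)),
           file.path(demo_dir, "002.csv"), row.names = FALSE)

# Defining the function shows nothing by itself; it has to be called
res <- inflow_summary(demo_dir, "Inflow", id = 1:2)
print(res)
```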

mathematical.coffee
  • Thank you for your fast answer! I tried to fit your info into my dataset, but it doesn't work :/ Did I do it the right way? The plyr package is installed. `directory <- "~/Desktop/Uni/paper/testdata/" setwd(directory) Inflowsummary <- function(directory, Inflow, id = 1:3) { filenames <- sprintf("%03d.csv", id) filenames <- paste(directory, filenames, sep="/", dec=",") ldf <- laply(filenames, read.csv) df=ldply(ldf, function (x) summary(x[, Inflow], na.rm=T)) do.call(rbind, lapply(x, function (x) summary(x[, Inflow], na.rm=T))) summary(df[, Inflow], na.rm = TRUE) }` – juke Jul 24 '15 at 15:19
  • You need to be more specific. What does "it doesn't work" mean, and what specifically do you want to happen? – mathematical.coffee Jul 25 '15 at 00:08