1

I have a series of functions in MATLAB that need to be converted into R. Unfortunately I do not know R all that well.

A major hurdle is loading 100 csv files, each 50x86069, into a 100 x 50 x 86069 array.

I have the code set up to preallocate the array and then read each 50x86069 csv file inside a loop.

l <- list.files(inputs)
data.array <- array(0, dim = c(100, 50, 86069))

# loop through the input files to get the data loaded into an array
for (i in 1:5) {
    in.file <- read.csv(paste(inputs, "/", l[[i]], sep = ""))
    in.file <- in.file[, -1] ## remove the first column
}

Now I need to put in.file into data.array[i, , ], the i-th 50 x 86069 slice.

Any help would be greatly appreciated!

Thanks-

  • A reproducible example will go a long way. http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Roman Luštrik Oct 09 '12 at 07:43
  • Is this question still up? If any of the answers solved your problem please mark it as answered (green check under the score) or provide further details about your desired output. – Roman Luštrik Oct 10 '12 at 07:59

2 Answers

3

You can quite easily leverage the laply function from the plyr package to get the result you need:

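library(plyr)
# full.names = TRUE returns complete paths, so read.csv can find the files
list_csv = list.files("/path/to/csv_files/", pattern = "csv", full.names = TRUE)
multi_dim_array = laply(list_csv, read.csv)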

The laply function applies read.csv to each element of list_csv and returns the results as an array, hence the name: l for list input, a for array output. See Hadley's JSS paper for more detail on plyr.

For an rbind-like function that scales to more than two dimensions, take a look at the abind function from the abind package. A solution using abind and lapply:

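library(abind)
list_arrays = lapply(list_csv, read.csv)
n = length(dim(list_arrays[[1]]))
# do.call takes every argument in one list, so along is appended to that list
multi_dim_array = do.call("abind", c(list_arrays, list(along = n + 1)))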

This eliminates the need for plyr (but relies on abind :)), and might show different performance in terms of CPU time and RAM. Maybe some benchmarks could give some guidance in that case (also including the for loop based solution of @Roman).
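One thing to keep in mind: binding along n + 1 puts the file index last (rows x columns x files), while the question asks for files x rows x columns. Below is a minimal sketch of how the dimensions could be reordered with aperm. It uses base R's simplify2array as a stand-in for the abind call, and two small made-up matrices in place of the csv contents, so treat it as an illustration rather than the exact pipeline.

```r
# Stand-ins for the contents of two csv files (2 x 3 instead of 50 x 86069).
list_arrays <- list(matrix(1:6, nrow = 2), matrix(7:12, nrow = 2))

# Base R stacking along a new last dimension: rows x cols x files.
stacked <- simplify2array(list_arrays)
dim(stacked)

# Move the file index to the front: files x rows x cols.
file_first <- aperm(stacked, c(3, 1, 2))
dim(file_first)

# Slice i of the permuted array is the i-th input matrix.
identical(file_first[2, , ], list_arrays[[2]])
```

With the real data, file_first[i, , ] would then be the 50 x 86069 block read from the i-th file.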

At the end of the day I really like the short, to-the-point syntax of plyr, and I would first try that solution.

Paul Hiemstra
  • Paul and Roman - Many thanks for your insight. While both solutions worked well, I think Roman's might be better suited to my needs. Two questions: first, using Roman's approach, how could I remove the first column from each input ar variable? Second, the input data will need values greater than one to be 1, something like ar[ar>0] <- 1. I would imagine both of these would be applied in the same loop, but I am not able to get this to happen without errors. Thanks- – user1553041 Oct 15 '12 at 12:37
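For the two steps asked about in the comment above, here is a rough sketch of how they might sit inside one loop that fills a preallocated array. Everything about the setup is hypothetical: the csv_demo folder, the file names, the id column and the tiny 2 x 3 payload are invented so the example runs on its own, and the ar > 0 threshold is copied from the snippet in the comment.

```r
# Hypothetical setup: write three tiny csv files into a temporary folder.
inputs <- file.path(tempdir(), "csv_demo")
dir.create(inputs, showWarnings = FALSE)
for (k in 1:3) {
  m <- matrix(c(0, k, 0, 2 * k, 0, 3 * k), nrow = 2)  # 2 x 3 payload
  write.csv(data.frame(id = 1:2, m),
            file.path(inputs, sprintf("file%d.csv", k)),
            row.names = FALSE)
}

files <- list.files(inputs, pattern = "\\.csv$", full.names = TRUE)
data.array <- array(0, dim = c(length(files), 2, 3))

for (i in seq_along(files)) {
  ar <- as.matrix(read.csv(files[i])[, -1])  # drop the first column
  ar[ar > 0] <- 1                            # recode positive values to 1
  data.array[i, , ] <- ar                    # store as the i-th slice
}

dim(data.array)
```

Converting with as.matrix before the assignment matters here: read.csv returns a data frame, and a slice of a numeric array expects a plain matrix of the same shape.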
2

Are you looking for something like this?

> ar1 <- array(1:9, dim = c(3, 3))
> ar1
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> ar2 <- array(10:18, dim = c(3, 3))
> ar3 <- array(19:27, dim = c(3, 3))
> ar.list <- list(ar1, ar2, ar3)
> bigarray <- array(NA, dim = c(3, 3, 3))
> for (i in 1:3) {
+     
+     intr <- vector("list", 3)
+     for(j in 1:3) {
+         intr[[j]] <- ar.list[[j]][i, ]
+     }
+     bigarray[, , i] <- do.call("rbind", intr)
+ }
> bigarray
, , 1

     [,1] [,2] [,3]
[1,]    1    4    7
[2,]   10   13   16
[3,]   19   22   25

, , 2

     [,1] [,2] [,3]
[1,]    2    5    8
[2,]   11   14   17
[3,]   20   23   26

, , 3

     [,1] [,2] [,3]
[1,]    3    6    9
[2,]   12   15   18
[3,]   21   24   27
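
Note the layout this produces: bigarray[j, , i] holds row i of the j-th input array, so the dimensions are arrays x columns x rows. As a cross-check, the sketch below rebuilds the same object without the nested loop, using base R's simplify2array and aperm. This is only one possible alternative, not necessarily how you would want to do it on the full data set.

```r
ar.list <- list(array(1:9, dim = c(3, 3)),
                array(10:18, dim = c(3, 3)),
                array(19:27, dim = c(3, 3)))

# Loop version, condensed from the session above.
bigarray <- array(NA, dim = c(3, 3, 3))
for (i in 1:3) {
  bigarray[, , i] <- do.call("rbind", lapply(ar.list, function(a) a[i, ]))
}

# Loop-free version: stack as rows x cols x arrays, then reorder
# the dimensions to arrays x cols x rows.
stacked <- simplify2array(ar.list)
permuted <- aperm(stacked, c(3, 2, 1))

all(bigarray == permuted)
```

If you want arrays x rows x columns instead, the permutation would be c(3, 1, 2).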
Roman Luštrik