bridging together lapply function with multiple csv files

Question

I have multiple files that I want to use the same scripting on. I'm struggling with how exactly to "link" together the lapply function with the scripts I want to use.

It's an extension of the question "applying R script prepared for single file to multiple files in the directory".

filenames<-list.files("NEWELKYR",pattern="*.csv",full.names=T)

mycsv=dir(pattern=".csv")
n<-length(mycsv)
mylist<-vector("list",n)
for(i in 1:n) mylist[[1]] <- read.csv(mycsv[i])

mylist<-lapply(mylist,function(x) #what do I put here?#

GROUP[1] <- 1
Xdist[1] <- XLOC[2] - XLOC[1]
Ydist[1] <- YLOC[2] - YLOC[1]
NSD[1]   <- as.integer(sqrt(Xdist[1]^2+Ydist[1]^2))
for ( j in 2:(nrow()-1)) {
  if ( NSD[j-1] > 1700) {
    Xdist[j] <- XLOC[j+1] - XLOC[j]
    Ydist[j] <- YLOC[j+1] - YLOC[j]
    NSD[j]   <- as.integer(sqrt(Xdist[j]^2+Ydist[j]^2))
    GROUP[j] <- (GROUP[j-1] + 1)
  } else {
    Xdist[j] <- XLOC[j+1] - XLOC[j] + Xdist[j-1]
    Ydist[j] <- YLOC[j+1] - YLOC[j] + Ydist[j-1]
    NSD[j]   <- as.integer(sqrt(Xdist[j]^2+Ydist[j]^2))
    GROUP[j] <- (GROUP[j-1])    
  }}
)


for(i in 1:n)
  write.csv(file=paste("file",i,".csv",sep="")),
  mylist[i],row.names=F)

Background info about the scripting can be found in the question "calculating Net Squared Displacement and repeating at 0 when target is reached".

Are you trying to calculate pairwise distances? Maybe the `dist` function will help you here. — Mike.Gahan, Jan 22 '15 at 20:06
No, each csv file contains locations of an individual animal. I'm coding it to calculate net squared displacement until 1700m is attained then start the calculation over again. — odocoileus, Jan 22 '15 at 20:14
GROUP represents each group of locations for each 1700m NSD. For instance, say it took 40 locations to exceed 1700m, all these locations will be assigned as GROUP 1. Then if it was locations 41-92 until 1700m was exceeded, it'd be GROUP 2. Hope that made sense. — odocoileus, Jan 22 '15 at 20:27

Mike.Gahan · Accepted Answer · 2015-01-23T18:21:50.420

Ok. First I have some sample data:

data <- read.table(header=TRUE, text="
       X       Y AnimalID      DATE
1 550466 4789843       10 1/25/2008
2 550820 4790544       10 1/26/2008
3 551071 4791230       10 1/26/2008
4 550462 4789292       10 1/26/2008
5 550390 4789934       10 1/27/2008
6 550543 4790085       10 1/27/2008
")

Then I write it to a csv file:

write.csv(data, file="data.csv", row.names=FALSE)

Now I have a function that resets the origin each time the distance from the current origin exceeds 800.

read_march <- function(x){
  require(data.table)
  data <- fread(x)

  #Perform some quick data prep before entering animal march function
  data[, X.BEG := X[1L]]
  data[, Y.BEG := Y[1L]]
  data[, NOT.CHECKED := 1L]

  animal_march <- function(data){
    data[, NSD := sqrt((X.BEG-X)^2+(Y.BEG-Y)^2)]
    data[NOT.CHECKED==1L, CUM.VAL := cumsum(cumsum(NSD>800))]
    data[, X.BEG := ifelse(CUM.VAL>1L, data[CUM.VAL==1L]$X, X.BEG)]
    data[, Y.BEG := ifelse(CUM.VAL>1L, data[CUM.VAL==1L]$Y, Y.BEG)]
    data[, NOT.CHECKED := 1*(CUM.VAL>1L)]
    data[, CUM.VAL := 0L]

    if (data[, sum(NOT.CHECKED)]==0L){
      data[, GRP := .GRP, by=.(X.BEG,Y.BEG)] #Here, GRP is created
      return(data)
    } else {
      return(animal_march(data))
    }
  }

  result <- animal_march(data=data)
  return(result)
}

The next step is just to cycle through all of the csvs and apply our read and march function (we only have 1 csv here).

#Apply function to each csv file
library(data.table)
files <- list.files(pattern="\\.csv$") #pattern is a regular expression, not a glob
animal.csvs <- lapply(files, read_march)
big.animal.data <- rbindlist(animal.csvs) #Retrieve one big dataset

Here is the print-out:

> big.animal.data
        X       Y AnimalID      DATE  X.BEG   Y.BEG NOT.CHECKED       NSD CUM.VAL GRP
1: 550466 4789843       10 1/25/2008 550466 4789843           0    0.0000       0   1
2: 550820 4790544       10 1/26/2008 550466 4789843           0  785.3133       0   1
3: 551071 4791230       10 1/26/2008 550466 4789843           0 1513.2065       0   1
4: 550462 4789292       10 1/26/2008 551071 4791230           0 2031.4342       0   2
5: 550390 4789934       10 1/27/2008 550462 4789292           0  646.0248       0   3
6: 550543 4790085       10 1/27/2008 550462 4789292           0  797.1261       0   3

Notice how X.BEG and Y.BEG keep changing after the distance of 800 is exceeded.
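
For readers who do not use data.table, the same reset rule can be sketched in base R with a plain loop. This is not the code above, only an illustration of the rule it appears to implement: a point that lies more than the threshold away from the current origin stays in the current group and becomes the origin for the points that follow. The names `march_groups`, `threshold`, `ox` and `oy` are made up for this sketch.

```r
# Base-R sketch of the reset rule; function and argument names are illustrative.
march_groups <- function(x, y, threshold = 800) {
  n <- length(x)
  dist_from_origin <- numeric(n)
  grp <- integer(n)
  ox <- x[1]   # current origin (x)
  oy <- y[1]   # current origin (y)
  g <- 1L      # current group id
  for (i in seq_len(n)) {
    dist_from_origin[i] <- sqrt((x[i] - ox)^2 + (y[i] - oy)^2)
    grp[i] <- g
    if (dist_from_origin[i] > threshold) {
      # This point becomes the origin for the following points.
      ox <- x[i]
      oy <- y[i]
      g <- g + 1L
    }
  }
  data.frame(NSD = dist_from_origin, GRP = grp)
}

# Same coordinates as the sample data in this answer.
X <- c(550466, 550820, 551071, 550462, 550390, 550543)
Y <- c(4789843, 4790544, 4791230, 4789292, 4789934, 4790085)
res <- march_groups(X, Y)
res$GRP
```

On the sample coordinates this gives the same grouping as the GRP column in the print-out (1, 1, 1, 2, 3, 3). A loop like this is slower than the data.table version on large files but may be easier to adapt.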

Wow! That worked perfect!! I really appreciate you taking the time to assist. — odocoileus, Jan 23 '15 at 14:19
Hi Mike, curious how you would tweak the code to add an identifier column for each NSD group? Using your big.animal.data example, column "X" would be 1,1,1,2,3,3 which corresponds with the sequence of steps for each 800m NSD? — odocoileus, Jan 23 '15 at 15:53

score 0 · Answer 2 · answered Jan 22 '15 at 20:02

The apply functions are essentially nothing more than fancy for loops. In your example, you have a list of the data frames read from your csv files.

lapply(mylist, function(x) ...)

This means each element of your list (i.e. a matrix/data.frame) is passed to the function as x. Therefore, you can put your code within curly brackets after the function(x). As a very simple example:

mat <- matrix(seq(9), ncol= 3)
mat1 <- matrix(seq(12), ncol=4)
mylist <- list(mat, mat1)
lapply(mylist, function(x) {
    nr <- nrow(x)
    nc <- ncol(x)
    return(c(nr, nc))
})

Obviously with this example I could have used dim but this demonstrates how you can have multiple lines within your lapply. However, I cannot give you much further information regarding your actual code. It isn't clear from your example script which object is your matrix/data.frame but this should get you started in the general direction.
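
As a rough sketch of how the loop from the question could be moved inside `function(x)`, here is one possible wrapper. It assumes each data frame has numeric `XLOC` and `YLOC` columns and at least three rows. The names `add_nsd_groups` and `threshold` are hypothetical, and the two small data frames only stand in for the real csv contents.

```r
# Hypothetical wrapper around the question's loop; assumes columns XLOC and YLOC.
add_nsd_groups <- function(x, threshold = 1700) {
  n <- nrow(x)
  # Pre-allocate; the last row has no "next" location, so it stays NA.
  Xdist <- rep(NA_real_, n)
  Ydist <- rep(NA_real_, n)
  NSD <- rep(NA_integer_, n)
  GROUP <- rep(NA_integer_, n)

  GROUP[1] <- 1L
  Xdist[1] <- x$XLOC[2] - x$XLOC[1]
  Ydist[1] <- x$YLOC[2] - x$YLOC[1]
  NSD[1] <- as.integer(sqrt(Xdist[1]^2 + Ydist[1]^2))

  for (j in 2:(n - 1)) {
    if (NSD[j - 1] > threshold) {
      # Threshold passed on the previous step: restart and open a new group.
      Xdist[j] <- x$XLOC[j + 1] - x$XLOC[j]
      Ydist[j] <- x$YLOC[j + 1] - x$YLOC[j]
      GROUP[j] <- GROUP[j - 1] + 1L
    } else {
      # Otherwise keep accumulating displacement within the same group.
      Xdist[j] <- x$XLOC[j + 1] - x$XLOC[j] + Xdist[j - 1]
      Ydist[j] <- x$YLOC[j + 1] - x$YLOC[j] + Ydist[j - 1]
      GROUP[j] <- GROUP[j - 1]
    }
    NSD[j] <- as.integer(sqrt(Xdist[j]^2 + Ydist[j]^2))
  }

  # Attach the new columns and return the data frame as the last expression.
  x$Xdist <- Xdist
  x$Ydist <- Ydist
  x$NSD <- NSD
  x$GROUP <- GROUP
  x
}

# Stand-in data frames instead of real csv files.
mylist <- list(
  data.frame(XLOC = c(0, 1000, 2000, 2100, 2200), YLOC = 0),
  data.frame(XLOC = c(0, 500, 900), YLOC = c(0, 0, 0))
)
results <- lapply(mylist, add_nsd_groups)
```

The important detail is the final `x`: the function has to return the modified data frame, because a `for` loop on its own evaluates to `NULL` and `lapply` would then collect a list of `NULL`s.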

answered Jan 22 '15 at 20:02

cdeterman


Every time I try to enclose the lapply in parentheses (lapply(mylist,function(x){...}), an error arises - "unexpected ')'". – odocoileus Jan 22 '15 at 20:20
So I'd need to end each line with an (x)? Something like "Group[1]<-1(x)" and "Xdist[1]<-XLOC[2](x) - XLOC[1](x)"??? – odocoileus Jan 22 '15 at 20:21
You have an extra parenthesis, as the error tells you. Regarding your next comment, you really need to learn more R. The `x` is your matrix for the given loop, you don't just add it on. I assume things like XLOC are part of that matrix. You will need to assign variables appropriately and not just take the script for granted. You have more programming to do. – cdeterman Jan 22 '15 at 20:23
The code minus the lapply worked perfectly on a single csv. I'm trying to figure out how to apply the code to multiple CSVs using lapply. The lapply examples I find online are too simple (mathematical operations) and not helpful to me. – odocoileus Jan 22 '15 at 20:36
try putting `x$` at the start of your objects. You must have `attach`ed your dataset previously for it to work as is. For example change `GROUP[1] <- 1` to `x$GROUP[1] <- 1`. The scope of your loop is local and likely not going to find the objects otherwise. – cdeterman Jan 22 '15 at 20:39
The codes did work after adding x$ but the csv files don't show any changes. – odocoileus Jan 22 '15 at 21:04
Are you trying to overwrite files? It looks like you are creating new files (e.g. file1.csv). Are those files empty? Are they identical to the original? You can also check the list elements to see if they are changing as a result of the code? You also may want to have a different name for the result instead of overwriting your original list to avoid possible confusion. – cdeterman Jan 22 '15 at 21:13
The new files (e.g. file1.csv) only have 1 cell filled out. Can't figure out what these values represent but it certainly isn't what I'm looking for. Hopefully I figure it out. Thanks for your help! – odocoileus Jan 22 '15 at 21:27
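
A few things in the question's script could plausibly lead to near-empty output files, though the thread does not confirm which one applied. The function given to `lapply` has to return the modified data frame as its last expression, the read loop assigns to `mylist[[1]]` rather than `mylist[[i]]`, and the write step uses `mylist[i]` (a one-element list) instead of `mylist[[i]]`. A sketch of the write step follows, assuming `results` is the list returned by `lapply`. The output directory and file names are made up for illustration.

```r
# Sketch only: file names and the temporary directory are illustrative.
out_dir <- file.path(tempdir(), "nsd_out")
dir.create(out_dir, showWarnings = FALSE)

# Stand-in for the list of processed data frames returned by lapply().
results <- list(
  data.frame(XLOC = c(0, 10), GROUP = c(1L, 1L)),
  data.frame(XLOC = c(5, 50), GROUP = c(1L, 2L))
)

out_files <- file.path(out_dir, paste0("file", seq_along(results), ".csv"))
for (i in seq_along(results)) {
  # [[i]] extracts the data frame itself; [i] would give a one-element list.
  write.csv(results[[i]], file = out_files[i], row.names = FALSE)
}

# Read one file back to confirm it holds the full table.
check <- read.csv(out_files[2])
```

Inspecting `str(results[[1]])` before writing is a quick way to see whether the list elements are the data frames you expect.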
