How would I find (and output) the position of the first value of 1 and the last value of 1 by row in a number of csv files at once?

Question

I am trying to output the position of the first value of 1 and the last value of 1, counting by row, in a number of binary matrices stored in multiple csv files at once.

I use the following to read in all tab-delimited csv files in the working directory:

csvs <- list.files(pattern="*.csv")
files <- lapply(csvs, read.delim)

First of all, I have tried...

first_1 <- sapply(files, function(x) min(which(x == 1)))

But this isn't giving me the right answer. For example, in a csv file with a binary matrix of

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   1   1   1   1   1   1   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   1   1   1   0   0   1   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   1   1   1   1   1   0   0   0   1   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   1   0   1   1   0   0   0   0   0   0

0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   1   1   1   1   0   0   0   0   0

0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0

0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0

0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0

0   0   0   0   0   0   0   1   0   0   0   1   1   1   1   0   0   0   1   0   0   0   0   0   0

0   0   0   0   0   0   0   1   0   0   0   1   0   0   0   1   0   1   1   0   0   0   0   0   0

0   0   0   0   0   0   0   1   0   0   0   1   0   0   0   1   1   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0

the sapply call outputs 152 when it should output 135. Can someone help?

(Link to screenshot: 50 x 50 data frame)

dcarlson · Accepted Answer · 2019-10-17T15:14:02.870

You are reading the data into data frames, not matrices. That could affect your results down the line, but not here: R processes both data frames and matrices column-wise, so you are getting a correct answer, just not to the question you meant to ask. The simplest approach is to transpose with t(). I've created a data frame called dta from your example:

min(which(dta == 1))
# [1] 159
min(which(t(dta) == 1))
# [1] 135
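
To see why transposing changes the answer, here is a minimal sketch on a hypothetical 3 x 4 matrix (`m` and the other variable names are made up for illustration). `which()` returns linear indices counted down the columns, so transposing first makes the count run along the rows; integer division then recovers the row and column from that index.

```r
# Hypothetical 3 x 4 binary matrix, entered row by row for readability.
m <- matrix(c(0, 0, 0, 0,
              0, 0, 1, 0,
              1, 0, 0, 0), nrow = 3, byrow = TRUE)

# R stores matrices column by column, so this counts down column 1 first.
first_by_col <- min(which(m == 1))     # 3 (row 3, column 1)

# Transposing makes the count run along each row of m instead.
first_by_row <- min(which(t(m) == 1))  # 7 (row 2, column 3)
last_by_row  <- max(which(t(m) == 1))  # 9 (row 3, column 1)

# Convert a row-wise position back to the (row, column) of m.
row_of_first <- (first_by_row - 1) %/% ncol(m) + 1  # 2
col_of_first <- (first_by_row - 1) %% ncol(m) + 1   # 3
```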

Larger matrices work just fine (in response to the comment below). First, create a reproducible matrix:

dta <- matrix(0, 50, 50)
ones <- structure(c(25L, 22L, 27L, 9L, 31L, 38L, 32L, 2L, 9L, 50L, 7L, 
19L, 40L, 47L, 26L, 1L, 47L, 34L, 16L, 23L, 39L, 3L, 30L, 50L, 
11L, 3L, 41L, 28L, 22L, 15L, 50L, 31L, 28L, 38L, 16L, 25L, 14L, 
22L, 12L, 11L, 40L, 44L, 1L, 38L, 7L, 39L, 1L, 39L, 33L, 50L, 
16L, 15L, 4L, 37L, 25L, 25L, 18L, 9L, 21L, 32L, 47L, 49L, 17L, 
48L, 26L, 7L, 4L, 47L, 16L, 11L, 35L, 17L, 25L, 23L, 24L, 4L, 
12L, 23L, 8L, 38L, 19L, 32L, 8L, 35L, 1L, 48L, 42L, 45L, 43L, 
45L, 30L, 41L, 5L, 5L, 49L, 37L, 19L, 20L, 48L, 43L), .Dim = c(50L, 
2L), .Dimnames = list(NULL, c("row", "col")))
dta[ones] <- 1
dim(dta)  # Show the number of rows and columns
# [1] 50 50

You can browse the matrix with View(dta) before you use the following code:

min(which(dta == 1))  # By columns
# [1] 16
min(which(t(dta) == 1))  # By rows
# [1] 5
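
If what is wanted is instead the column of the first and last 1 within each row separately, which is another possible reading of the question, a sketch along the following lines may help. The matrix `m` and the helper names `first_one` and `last_one` are hypothetical; rows that contain no 1 are reported as NA rather than triggering the warning that `min()` gives on an empty vector.

```r
# Hypothetical 3 x 5 binary matrix; row 1 contains no 1 at all.
m <- matrix(c(0, 0, 0, 0, 0,
              0, 1, 1, 0, 0,
              0, 0, 1, 0, 1), nrow = 3, byrow = TRUE)

# Column of the first 1 in a row, or NA if the row has none.
first_one <- function(r) {
  i <- which(r == 1)
  if (length(i) > 0) min(i) else NA_integer_
}

# Column of the last 1 in a row, or NA if the row has none.
last_one <- function(r) {
  i <- which(r == 1)
  if (length(i) > 0) max(i) else NA_integer_
}

# apply(..., 1, f) calls f once per row.
first_col <- apply(m, 1, first_one)  # NA 2 3
last_col  <- apply(m, 1, last_one)   # NA 3 5
```

For a list of data frames such as `files`, the same helpers could be applied with `lapply(files, function(x) apply(as.matrix(x), 1, first_one))`.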

I have tried t() on smaller data frames and it works well! But when I try it on larger data frames, such as 50 x 50, I still get incorrect results. For example, I have included a link to a screenshot underneath my question above. For that one, the position of the first value comes out as 167 when it is clearly 217. — Arron, Oct 17 '19 at 08:02
This is what I have tried @dcarlson, reading in all files in the working directory and then taking the first and last positions: `csvs <- list.files(pattern="*.csv"); files <- lapply(csvs, read.delim); top <- sapply(files, function(x) min(which(t(x) == 1))); top; bottom <- sapply(files, function(x) max(which(t(x) == 1))); bottom` — Arron, Oct 17 '19 at 09:43
It should not matter how big the matrix is (within the limits of your computer memory). I can't do anything with a picture of your data so it is not helpful. Try individual files, e.g. `min(which(t(files[[1]]) == 1))`. — dcarlson, Oct 17 '19 at 15:48
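
One possible cause of the discrepancy described in these comments, not confirmed anywhere in the thread, is the `header = TRUE` default of `read.delim()`: if the files have no header line, the first row of data is consumed as column names, and every row-wise position shifts by one row's worth of columns. That would be consistent with the reported gap of 50 (167 versus 217) on a 50-column file. The sketch below uses a hypothetical 4 x 5 matrix and a temporary file to show the effect.

```r
# Hypothetical 4 x 5 binary matrix written to a temporary tab-delimited
# file with no header line (an assumption about how the real files look).
m <- matrix(0, nrow = 4, ncol = 5)
m[3, 2] <- 1
m[4, 5] <- 1
tmp <- tempfile(fileext = ".csv")
write.table(m, tmp, sep = "\t", row.names = FALSE, col.names = FALSE)

with_header <- read.delim(tmp)                  # header = TRUE by default
no_header   <- read.delim(tmp, header = FALSE)  # first line kept as data

dim_with_header <- dim(with_header)  # 3 5: one row was used as the header
dim_no_header   <- dim(no_header)    # 4 5

# Row-wise position of the first 1 under each reading of the file.
first_with_header <- min(which(t(as.matrix(with_header)) == 1))  # 7
first_no_header   <- min(which(t(as.matrix(no_header)) == 1))    # 12

unlink(tmp)  # remove the temporary file
```

The two results differ by exactly `ncol(m)`, so if the real files lack a header line, passing `header = FALSE` through `lapply(csvs, read.delim, header = FALSE)` may be worth trying.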
