
I have a data frame called lexico with dimensions 11293 x 512.

I'd like to remove every row and every column that contains at least one negative value.

How could I do this?

Below is the code I tried, but it takes too long to run because of the nested loops.

(My plan was to first collect the index of every column that holds a negative value.)

colneg <- c()

for (i in 1:11293) {
  for (j in 1:512) {
    # test the cell at row i, column j
    if (as.numeric(as.character(lexico[i, j])) < 0)
      colneg <- c(colneg, j)
  }
}

I would appreciate your candid advice for a novice.

Jaap
snapper

1 Answer


A possible solution:

# logical index of the columns to keep: TRUE where a column has no negative values
col_index <- !colSums(d < 0)

# logical index of the rows to keep: TRUE where a row has no negative values
row_index <- !rowSums(d < 0)

# subset the data frame with the two indexes
d2 <- d[row_index, col_index]

What this does:

  • colSums(d < 0) gives a numeric vector with the number of negative values in each column.
  • Negating it with ! creates a logical vector in which columns with no negative values get TRUE (a count of 0 becomes TRUE, any other count becomes FALSE).
  • It works the same way for rows with rowSums.
  • Subsetting the data frame with row_index and col_index gives a data frame from which both the rows and the columns that contained negative values have been removed.
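
To make the steps above concrete, here is a minimal sketch on a tiny data frame. The names toy, keep_cols, keep_rows and toy_clean are made up for this illustration and do not appear in the question; drop = FALSE is an extra precaution that the answer's code does not use.

```r
# Toy data frame (hypothetical, for illustration only)
toy <- data.frame(a = c(1, 2, 3),
                  b = c(4, -5, 6),
                  c = c(7, 8, 9))

neg_per_col <- colSums(toy < 0)   # number of negative values per column
neg_per_row <- rowSums(toy < 0)   # number of negative values per row

keep_cols <- !neg_per_col         # TRUE where the count is 0
keep_rows <- !neg_per_row

# drop = FALSE keeps the result a data frame even if only one column survives
toy_clean <- toy[keep_rows, keep_cols, drop = FALSE]
```

Only row 2 and column b contain a negative value here, so toy_clean keeps rows 1 and 3 and columns a and c.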

Reproducible example data:

set.seed(171228)
d <- data.frame(matrix(rnorm(1e4, mean = 3), ncol = 20))
Jaap
  • You mean that d, the data frame, is lexico in this case? – snapper Dec 28 '17 at 10:06
  • @Coincidence_Alpha yes – Jaap Dec 28 '17 at 10:06
  • It looks like you don't create an index of the negatives, but of the positives, I guess – snapper Dec 28 '17 at 10:07
  • @Coincidence_Alpha it creates indexes of which columns/rows to keep; added an explanation now – Jaap Dec 28 '17 at 10:14
  • I don't know why, but for both columns and rows, typing col_index in the console returns all NA – snapper Dec 28 '17 at 10:34
  • @Coincidence_Alpha What did you type? Did you try the example I provided? – Jaap Dec 28 '17 at 10:37
  • I typed: `col_index <- !colSums(lexico < 0)`, which returns "There were 50 or more warnings (use warnings() to see the first 50)"; `row_index <- !rowSums(lexico_mat < 0)`, no error; `lexico <- lexico[row_index, col_index]`, which gives: Error in `[.data.frame`(lexico, row_index, col_index) : undefined columns selected – snapper Dec 28 '17 at 10:44
  • @Coincidence_Alpha you use `lexico` for the columns and `lexico_mat` for the rows, maybe that's the problem? If not, please include a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610) in your question – Jaap Dec 28 '17 at 10:48
  • The problem is simple: the operation "lexico < 0" keeps returning NA – snapper Dec 28 '17 at 10:51
  • @Coincidence_Alpha Maybe your data is not numeric? – Jaap Dec 28 '17 at 11:02
  • @Coincidence_Alpha It might indeed be a simple problem, but unless you include an example that reproduces it, the only thing I can do is guess what the problem might be (as I did in my previous comment). – Jaap Dec 28 '17 at 11:03
  • How do I check whether it's numeric or not? In the CSV, the values just look like numbers. – snapper Dec 28 '17 at 11:12
  • This is what part of it looks like:
      ID Feature1 Feature2 ...
      1063350.669 935856 37038720 0 30800 0 0 0 0
      1063369 0 40320 0 0 0 0 0 0
      1063387 0 0 0 0 0 0 8400 0
      1063400 0 0 0 0 0 0 0 0
      1063412.837 -61920 222240 -52800 0 0 0 0 0
      1063429.279 1845600 552000 0 0 0 0 0 0
      ...
    – snapper Dec 28 '17 at 11:14
  • @Coincidence_Alpha It is better (and easier) to include a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610) in the question. To check whether columns are numeric or not, you can use `str(lexico)` or `sapply(lexico, class)` – Jaap Dec 28 '17 at 11:18
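
The thread ends without a confirmed diagnosis. One possible cause, and only an assumption here, is that the columns of lexico were read in as factor or character rather than numeric. The sketch below shows how that could be checked and repaired on a small stand-in; lexico_demo and lexico_num are hypothetical names, and the values are borrowed from the sample in the comments.

```r
# Hypothetical stand-in for lexico: numbers stored as text and as a factor
lexico_demo <- data.frame(ID = c("1063369", "1063387", "1063400"),
                          Feature1 = factor(c("0", "-61920", "1845600")),
                          Feature2 = c("40320", "0", "552000"),
                          stringsAsFactors = FALSE)

# Inspect the column classes; "factor" or "character" means not numeric
sapply(lexico_demo, class)

# Convert every column to numeric. Going through as.character() matters for
# factors, because as.numeric() on a factor returns the internal level codes.
lexico_num <- as.data.frame(
  lapply(lexico_demo, function(x) as.numeric(as.character(x)))
)

# Entries that cannot be parsed as numbers become NA (with a warning);
# count them per column before relying on the converted data
colSums(is.na(lexico_num))
```

After a conversion like this, the comparison lexico_num < 0 yields a logical matrix, which is what colSums and rowSums in the answer expect.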