0

I have a problem where I must delete certain columns in an Excel file depending on some cell values using R.

  1. I imported the Excel file

    data = as.matrix(read.csv('data.csv', header = FALSE))
    
  2. Dimensions of the matrix are: dim(data) = 106474 * 81

  3. What I need to do

    • If a value in any of the cells [row = 1, column = i] = 'A' -> delete column
    • If a value in any of the cells [row = 1, column = i] = 'B' -> delete column
    • If a value in any of the cells [row = 2, column = i] > 30 -> delete column
  4. What I did so far

    for (i in 1:ncol(data)) { 
      if (data[1,i]=='A') {
        data =  data [,-i]
      } else if (data[1,i]=='B') {
        data =  data [,-i]
      } else if (data[2,i] > 30) {
        data =  data [,-i]
      } 
    }
    
  5. Problems I ran into are as follows:

    1. When I read the matrix to r, it read everything as characters? how to convert data[2,i] row to numeric. I have provided how I converted it to numeric. But is there any better way?

      row2 = as.numeric(as.matrix(fly_data[2,])) 
      fly_data[2,] <- row2
      
    2. My for loop end abruptly and it doesn't run to the full length of

      ncol(data) 
      
    3. It always gives the

      Error in data[1, i] : subscript out of bounds

Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
rbeginner
  • 43
  • 5
  • 1
    You will probably get more/better answers when you [make your question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – Wimpel Feb 19 '20 at 14:01
  • 2
    The file you are trying to read is a `.csv`? Then I would't call it an 'excel' file, this will only lead to confusion. As to the question why all your values are characters: A matrix can only hold **one** type of values, so if any of the values are strings the whole matrix will be cast as character. Use `data.frame` or `data.table::data.table` for mixed types. Then you specify your for loop to go from 1 to `ncol(data)` but then within the loop remove columns from `data` - thats why you get the `subscript out of bounds` error. You could make a copy first and mutate that instead of the original. – dario Feb 19 '20 at 14:03
  • Thank you very much. It worked. 1. numeric issue was resolved by using jogo as.numeric(data[2,i]) > 30 2. Out of bound was resolved by running the loop using a copy of the data. So original matrix was running the for loop and new one was being mutated as @dario suggested. – rbeginner Feb 19 '20 at 14:49
  • 1
    `i <- data[1,]=='A' | data[1,]=='B' | (as.numeric(data[2,]) > 30); datanew <- data[-i]` without a loop. – jogo Feb 19 '20 at 14:59

1 Answers1

0

Maybe you can try the code below

r <- subset(data, select = -which(as.character(unlist(data[1,]))%in% c("A","B") | as.numeric(as.character(unlist(data[2,])))>30))

such that

> r 
   X2 X3 X7
r1  D  C  C
r2  6 24 25

DATA

data <- structure(list(X1 = structure(2:1, .Names = c("r1", "r2"), .Label = c("7", 
"A"), class = "factor"), X2 = structure(2:1, .Names = c("r1", 
"r2"), .Label = c("6", "D"), class = "factor"), X3 = structure(2:1, .Names = c("r1", 
"r2"), .Label = c("24", "C"), class = "factor"), X4 = structure(2:1, .Names = c("r1", 
"r2"), .Label = c("13", "A"), class = "factor"), X5 = structure(2:1, .Names = c("r1", 
"r2"), .Label = c("27", "B"), class = "factor"), X6 = structure(2:1, .Names = c("r1", 
"r2"), .Label = c("17", "A"), class = "factor"), X7 = structure(2:1, .Names = c("r1", 
"r2"), .Label = c("25", "C"), class = "factor"), X8 = structure(2:1, .Names = c("r1", 
"r2"), .Label = c("35", "C"), class = "factor"), X9 = structure(2:1, .Names = c("r1", 
"r2"), .Label = c("13", "B"), class = "factor"), X10 = structure(2:1, .Names = c("r1", 
"r2"), .Label = c("27", "B"), class = "factor")), class = "data.frame", row.names = c("r1", 
"r2"))

> data
   X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
r1  A  D  C  A  B  A  C  C  B   B
r2  7  6 24 13 27 17 25 35 13  27
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81