0

I am trying to eliminate all rows in excel that have he following features:

  1. First column is an integer
  2. Second column begins with an integer
  3. Third column is empty

The code I have written appears to run indefinitely. CAS.MULT is the name of my dataframe.

for (i in 1:nrow(CAS.MULT)) {

  testInteger <- function(x) {
    test <- all.equal(x, as.integer(x), check.attributes = FALSE)
    if (test == TRUE) {
      return (TRUE)
    }
    else {
      return (FALSE)
    }
  }

  if (testInteger(as.integer(CAS.MULT[i,1])) == TRUE) {  
    if (testInteger(as.integer(substring(CAS.MULT[i,2],1,1))) == TRUE) {
      if (CAS.MULT[i,3] == '') {
        CAS.MULT <- data.frame(CAS.MULT[-i,])
      }
    }
  }
}
shellter
  • 36,525
  • 7
  • 83
  • 90
user41569
  • 31
  • 1
  • 3
  • 1
    It would be easier to help you if you post a sample of your data frame. For example, paste into your question the output of `dput(head(CAS.MULT))`. – eipi10 May 26 '15 at 15:28
  • Alcohols C12-C14 ethoxylated NONYL PHENOL 4 MOL Alcohols C12-16 ethoxylated Bentonite benzyl(hydrogenated tallow alkyl) dimethylammonium stearate complex Diotomaceous Earth Calcined Crystalline silica quartz Diatomaceous Earth Calcined 2 2-Dibromo-3-nitrilo-propionamide (DBNPA) – user41569 May 26 '15 at 16:40
  • Sorry, not displaying the rows -- Alcohols C12-C14 ethoxylated ||| NONYL PHENOL 4 MOL ||| Alcohols C12-16 ethoxylated ||| Bentonite benzyl(hydrogenated tallow alkyl) dimethylammonium ||| 2 2-Dibromo-3-nitrilo-propionamide (DBNPA) ||| so, for example, that last row would need to go – user41569 May 26 '15 at 16:41
  • 1
    Please edit your original post to include the example data. Don't just put the data into a comment. – BenBarnes May 28 '15 at 09:46

2 Answers2

0

You should be very wary of deleting rows within a for loop, if often leads to undesired behavior. There are a number of ways you could handle this. For instance, you can flag the rows for deletion and then delete them after.

Another thing I noticed is that you are converting your columns to integers before passing them to your function to test if they are integers, so you will be incorrectly returning true for all values passed to the function.

Maybe something like this would work (without a reproducible example it's hard to say if it will work or not):

toDelete <- numeric(0)
for (i in 1:nrow(CAS.MULT)) {

testInteger <- function(x) {
    test <- all.equal(x, as.integer(x), check.attributes = FALSE)
    if (test == TRUE) {
      return (TRUE)
    }
    else {
      return (FALSE)
    }
  }  

  if (testInteger(CAS.MULT[i,1]) == TRUE) {

    if (testInteger(substring(CAS.MULT[i,2],1,1)) == TRUE) {

      if (CAS.MULT[i,3] == '') {

        toDelete <- c(toDelete, i)

      }

    }

  }

}

CAS.MULT <- CAS.MULT[-1*toDelete,]
C_Z_
  • 7,427
  • 5
  • 44
  • 81
  • The problem I was running into (which is why I added the as.integer() function) was that even integers respond as "False" when I run the testInteger function. So this way it would at least give an error for letters and a True for integers. – user41569 May 26 '15 at 16:28
0

Hard to be sure without testing my code on your data, but this might work. Instead of a loop, the code below uses logical indexing based on the conditions you specified in your question. This is vectorized (meaning it operates on the entire data frame at once, rather than by row) and is much faster than looping row by row:

CAS.MULT.screened = CAS.MULT[!(CAS.MULT[,1] %% 1 == 0 |  
                               as.numeric(substring(CAS.MULT[,2],1,1)) %% 1 == 0 |
                               CAS.MULT[,3] == ""), ]

For more on checking whether a value is an integer, see this SO question.

One other thing: Just for future reference, for efficiency you should define your function outside the loop, rather than recreating the function every time through the loop.

Community
  • 1
  • 1
eipi10
  • 91,525
  • 24
  • 209
  • 285