
I would like to delete a row from a data frame and sum the resulting columns. I know the row I want to delete based on its contents, but not its row number. Below I present three examples, two of which work. Using - to delete the row only works if the first row is to be deleted. Why is that?

My question is similar to this one: How to delete the first row of a dataframe in R? However, there the row is deleted based on its row number.

# This works.

state = 'OH'

my.data = read.table(text = "
      county  y1990 y2000
        cc       NA    2
        OH       NA   10
        bb       NA    1
", sep = "", header = TRUE, na.strings = "NA", stringsAsFactors = FALSE)

my.colsums2 <- colSums(my.data[!(my.data$county == state), 2:ncol(my.data)], na.rm=TRUE)
my.colsums2

# y1990 y2000 
#    0     3

# This works.

my.data = read.table(text = "
      county  y1990 y2000
        OH       NA   10
        cc       NA    2
        bb       NA    1
", sep = "", header = TRUE, na.strings = "NA", stringsAsFactors = FALSE)

my.colsums2 <- colSums(my.data[-(my.data$county == state), 2:ncol(my.data)], na.rm=TRUE)
my.colsums2

# y1990 y2000 
#    0     3

# This does not work.

my.data = read.table(text = "
      county  y1990 y2000
        cc       NA    2
        OH       NA   10
        bb       NA    1
", sep = "", header = TRUE, na.strings = "NA", stringsAsFactors = FALSE)

my.colsums2 <- colSums(my.data[-(my.data$county == state), 2:ncol(my.data)], na.rm=TRUE)
my.colsums2

# y1990 y2000 
#    0    11

I guess I am still confused over the difference between ! and -. Thank you for any advice.

Mark Miller
  • I think this is what you're looking for: `colSums(my.data[my.data$county != "OH", -1], na.rm = TRUE)` – Arun Apr 02 '13 at 22:38
  • Actually, the last two examples are both wrong; the middle one only works by luck. The negation of a boolean variable is obtained with `!`, not with `-`. – Ferdinand.kraft Apr 02 '13 at 22:41

2 Answers


This should clear up the difference between - and !, and I suspect you can take it from there ;)

my.data$county == state
# [1]  TRUE FALSE FALSE

!(my.data$county == state)
# [1] FALSE  TRUE  TRUE

-(my.data$county == state)
# [1] -1  0  0

!, which negates Boolean values, is the operator that you should be using here.
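
To see why the `-` version only appears to work when the matching row comes first, here is a minimal sketch using the question's third data set. The data frame is rebuilt with `data.frame()` instead of `read.table()`, and the names `neg.idx`, `not.idx`, `kept.by.minus`, `kept.by.bang`, `sums.minus` and `sums.bang` are introduced here for illustration only. In a numeric index, `0` selects nothing and `-1` drops the first element, so `-(logical)` can only ever drop row 1.

```r
state <- "OH"

# Third data set from the question: "OH" is in the second row.
my.data <- data.frame(
  county = c("cc", "OH", "bb"),
  y1990  = c(NA_real_, NA_real_, NA_real_),
  y2000  = c(2, 10, 1),
  stringsAsFactors = FALSE
)

# Unary minus coerces the logical vector to integers, then flips the sign.
neg.idx <- -(my.data$county == state)   # 0 -1 0
# Logical negation keeps it a logical vector.
not.idx <- !(my.data$county == state)   # TRUE FALSE TRUE

# The zeros are ignored and -1 drops row 1, whichever row matched.
kept.by.minus <- my.data$county[neg.idx]
# The logical index drops exactly the matching row.
kept.by.bang <- my.data$county[not.idx]

sums.minus <- colSums(my.data[neg.idx, 2:ncol(my.data)], na.rm = TRUE)
sums.bang  <- colSums(my.data[not.idx, 2:ncol(my.data)], na.rm = TRUE)
```

Here `kept.by.minus` still contains "OH" (only "cc" was dropped), which is why the third example in the question reports 11 for y2000, while `sums.bang` gives the intended 3.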

Josh O'Brien
  • Namely, `-` applied to booleans first converts them to integers (0's and 1's) and then changes their sign. – joran Apr 02 '13 at 22:40
  • With the third data set, your third line is `[1] 0 -1 0`. I am still not clear why `-1 0 0` allows the desired result, but `0 -1 0` does not. I will think about it more. Thank you for the answer. – Mark Miller Apr 02 '13 at 22:44
  • Yes, thanks @joran. `-X` is **literally** treated as `-1 * X`, and during its evaluation, `logical` values are converted to `numeric` (just as when doing `X + 0`, etc.). As an interesting side note, compare the results of `+c(TRUE, FALSE)` and `-c(TRUE, FALSE)`. – Josh O'Brien Apr 02 '13 at 22:46
  • @MarkMiller -- It's just a coincidence that it works. Perhaps trying this will make that clearer: `-(c("cc", "bb", "OH", "OH", "bb") == "OH")`. – Josh O'Brien Apr 02 '13 at 22:48
  • Negating a boolean = bad business and should be avoided. I wonder if this would be a good question to introduce "good habits" (ie, unit testing) into R programming. – Brandon Bertelsen Apr 02 '13 at 22:56
  • @BrandonBertelsen -- Why is negating a Boolean bad? Just to be clear, are you saying one shouldn't use `!`? – Josh O'Brien Apr 02 '13 at 23:14
  • I'm having a giggle here because I'm about to say "not !". I think it's bad practice to negate "-" (negative) a logical value. `-TRUE == -1`, not `FALSE`. `!LOGICAL = GOOD`, `-LOGICAL = BAD`, or at least not intuitive. How many more times can I use not! – Brandon Bertelsen Apr 02 '13 at 23:26

I think it's important to remember what you're doing. When you pass a condition to subset rows or columns, it needs to be either a full-length logical (TRUE/FALSE) vector or a vector of numbers giving the row (or column) positions.

Here's a simple example with a vector. Try entering these expressions into the console to see what each one returns:

x <- rnorm(20)

## These use integer values for indexing
x[which(x > 1)]  # Numbers > Only those numbers which match

## These use logical values for indexing
x[x > 1]    # Logical > Only those that are true
x[!(x < 1)] # Logical > Only those that are false

Bad Behaviour:

x[-which(x > 1)] # Positive numbers to negative numbers = BAD
x[!which(x > 1)] # Converts numbers to logical = BAD
x[-(x > 1)] # Converts logical to numeric = BAD

Specific to your example:

!(my.data$county == state) # Converts TRUE/FALSE to FALSE/TRUE
which(my.data$county != state) # Rows where my.data$county is not equal to state

Personally, I recommend using which() in all cases to avoid potential negation of a logical or conversion of a numeric. It also tends to be easier to "translate".
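
As a rough sketch of the `which()` form applied to the original example, the snippet below rebuilds the question's data with `data.frame()` (an assumption made for self-containment; the names `keep`, `sums.which`, `dropped.all` and `kept.all` are illustrative only). It also shows one caveat raised in the comments: when nothing matches, `which()` returns an empty vector, so the `-which()` pattern listed under Bad Behaviour selects nothing instead of everything.

```r
state <- "OH"

my.data <- data.frame(
  county = c("cc", "OH", "bb"),
  y1990  = c(NA_real_, NA_real_, NA_real_),
  y2000  = c(2, 10, 1),
  stringsAsFactors = FALSE
)

# Positive row numbers of the rows to keep (here rows 1 and 3).
keep <- which(my.data$county != state)
sums.which <- colSums(my.data[keep, -1], na.rm = TRUE)

# Caveat: no element of x exceeds 1, so which() returns integer(0).
x <- c(0, 0)
dropped.all <- x[-which(x > 1)]   # -integer(0) is still empty: selects nothing
kept.all    <- x[!(x > 1)]        # logical negation keeps both elements
```

So `which()` on the positive condition (`!=`) behaves as intended here, whereas negating the result of `which()` is the form to be wary of.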

Brandon Bertelsen
  • Thank you for the answer. Please consider adding the recommended which statement to obtain the desired result with the original example. I can probably post one soon, but it might not be optimal. – Mark Miller Apr 02 '13 at 23:07
  • That's what we're trying to get across with these answers. One creates a vector of numbers (rows) the other produces a logical vector. The right one depends on the situation. You wouldn't be able to say `!which(cond)` because you're mixing two different types of variables. Just like you shouldn't be saying `-(x > 1)` – Brandon Bertelsen Apr 02 '13 at 23:13
  • +1 I really like this answer. (Also, please feel free to roll back the edits I just made if you don't want 'em.) – Josh O'Brien Apr 02 '13 at 23:19
  • Let's warn against using `x[-which(x > 1)]` (i.e. `-which`) as it is a recipe for disaster. See what happens if `x <- c(0, 0)` for example. So I would recommend quite the opposite: never use `which`. – flodel Apr 02 '13 at 23:52
  • Looks like I'm complaining about my own mistake here. Good catch @flodel – Brandon Bertelsen Apr 03 '13 at 00:03