35

This might be an easy question, but I still need some help using R.

I have a data.frame (main_data), let's say:

NAMES   AGE     LOC
Jyo     23      Hyd
Abid    27      Kar
Ras     24      Pun
Poo     25      Goa
Sus     28      Kar

I wish to remove a few rows based on a list of names. Let's say I have another list or table as follows:

NAMES_list
Jyo
Ras
Poo

Based on this list, if any of the names match those in my "main_data" table above, I would like to remove the whole row containing them, so the result should be as follows:

NAMES   AGE     LOC
Abid    27      Kar
Sus     28      Kar

Can anyone help me achieve this using R? Thanks in advance. :)

Matt Dowle
Letin
  • I had this same task, but had names in format "Last, First". For those of you with similar formatting, you may find that removing spaces from names may be necessary to get code in answers below to work. `gsub(" ","",x)` did the trick for me. – Pake Sep 04 '18 at 18:30

4 Answers

62

Use %in%:

main_data2 <- main_data[ ! main_data$NAMES %in% NAMES_list, ]
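The question does not say whether NAMES_list is a plain character vector or a one-column table. If it is a data frame, the column probably needs to be pulled out first, since %in% compares against the values it is handed. A minimal sketch, assuming the column is also called NAMES_list (the names names_df and to_drop are made up for illustration):

```r
# Rebuild the question's data (values copied from the post)
main_data <- data.frame(
  NAMES = c("Jyo", "Abid", "Ras", "Poo", "Sus"),
  AGE   = c(23, 27, 24, 25, 28),
  LOC   = c("Hyd", "Kar", "Pun", "Goa", "Kar"),
  stringsAsFactors = FALSE
)

# Assumption: the names to drop live in a one-column data frame
names_df <- data.frame(NAMES_list = c("Jyo", "Ras", "Poo"),
                       stringsAsFactors = FALSE)

# Extract the column as a character vector before using %in%
to_drop <- names_df$NAMES_list

# Keep only the rows whose NAMES value is not in to_drop
main_data2 <- main_data[!main_data$NAMES %in% to_drop, ]
main_data2
```

If NAMES_list is already a character vector, the extraction step can be skipped and the one-liner above works as is.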
January
13

If, by chance, you actually have a data.table (as opposed to a data.frame), and your data.table has a key, you can use the not-join idiom:

library(data.table)
dat <- as.data.table(read.table(text="
NAMES   AGE     LOC
Jyo     23      Hyd
Abid    27      Kar
Ras     24      Pun
Poo     25      Goa
Sus     28      Kar", 
stringsAsFactors=FALSE, header=TRUE))

setkey(dat, NAMES)

to.remove <- c("Jyo","Ras","Poo")
dat[-dat[to.remove, which=TRUE]]
#   NAMES AGE LOC
#1:  Abid  27 Kar
#2:   Sus  28 Kar

Of course, the other two answers would also work on a data.table, but this should be more efficient.


Edit

As of data.table version 1.8.3, the "!" prefix can be used for "not-joins" (see NEWS).

dat[!to.remove]
#   NAMES AGE LOC
#1:  Abid  27 Kar
#2:   Sus  28 Kar
Community
GSee
  • +1 We really need proper not joins working don't we: `dat[-to.remove]`. It's actually quite easy to implement internally but I just haven't got to it yet :( It's [FR#1384](https://r-forge.r-project.org/tracker/index.php?func=detail&aid=1384&group_id=240&atid=978). – Matt Dowle Oct 22 '12 at 14:29
9

Replicate your data:

dat <- read.table(text="
NAMES   AGE     LOC
Jyo     23      Hyd
Abid    27      Kar
Ras     24      Pun
Poo     25      Goa
Sus     28      Kar", 
stringsAsFactors=FALSE, header=TRUE)

remove <- c("Jyo", "Ras", "Poo")

Simple subsetting:

dat[!dat$NAMES %in% remove, ]
  NAMES AGE LOC
2  Abid  27 Kar
5   Sus  28 Kar

Here's how it works: Use a combination of ! negation and %in% to return a logical vector that indicates the rows to keep:

!dat$NAMES %in% remove
[1] FALSE  TRUE FALSE FALSE  TRUE

I remember being surprised by this construct the first time I saw it. Why is it that !dat$NAMES returns anything useful? Well, of course the insight is that the infix operator %in% gets evaluated first, so the ! is simply a logical NOT operator.
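The precedence point can be checked directly. The sketch below reuses the names from this answer as a plain vector x (a stand-in for dat$NAMES) and compares the implicit form with an explicitly parenthesised one; negate_first is a name made up for illustration:

```r
x <- c("Jyo", "Abid", "Ras", "Poo", "Sus")
remove <- c("Jyo", "Ras", "Poo")

# %in% binds tighter than !, so these two expressions are the same
keep_implicit <- !x %in% remove
keep_explicit <- !(x %in% remove)
identical(keep_implicit, keep_explicit)

# Forcing ! to apply first fails, because ! is not defined for character vectors
negate_first <- tryCatch((!x) %in% remove, error = function(e) "error")
negate_first
```

Writing the parentheses explicitly, as in !(x %in% remove), costs nothing and may be easier to read for anyone unsure of the precedence rules.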

Andrie
  • ...and it gets evaluated first because `%any%` operators have higher [precedence](http://stat.ethz.ch/R-manual/R-patched/library/base/html/Syntax.html) than `!`. – January Oct 22 '12 at 13:48
1

You can also use match, provided the values in main_data$NAMES are unique:

NAMES_list <- c("Jyo","Ras","Poo")
main_data <- main_data[-match(NAMES_list,main_data$NAMES),]
main_data
  NAMES AGE LOC
2  Abid  27 Kar
5   Sus  28 Kar

This removes the rows whose main_data$NAMES value exactly matches an entry in NAMES_list.
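One caveat with the match approach: a name in NAMES_list that is absent from main_data produces an NA index, and negative indexing with NA fails with an error. An empty index is also a trap, because main_data[-integer(0), ] returns zero rows rather than all of them. A hedged sketch that guards against both cases, using a hypothetical extra name "Zed" that is not in the data:

```r
# Rebuild the question's data (values copied from the post)
main_data <- data.frame(
  NAMES = c("Jyo", "Abid", "Ras", "Poo", "Sus"),
  AGE   = c(23, 27, 24, 25, 28),
  LOC   = c("Hyd", "Kar", "Pun", "Goa", "Kar"),
  stringsAsFactors = FALSE
)

# Hypothetical list containing one name ("Zed") that is not in main_data
NAMES_list <- c("Jyo", "Ras", "Poo", "Zed")

idx <- match(NAMES_list, main_data$NAMES)  # positions of matches, NA where not found
idx <- idx[!is.na(idx)]                    # drop the names that were not found

# Only apply negative indexing when there is something to remove
result <- if (length(idx) > 0) main_data[-idx, ] else main_data
result
```

The %in% approach from the other answers avoids both problems, since it always yields a logical vector as long as the data.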

user1021713