Deleting specific rows with a pattern[start and end indicators] from a dataframe

Question

I want to remove all rows between "5. Demand Disputed" and "Total Demand Disputed" from their respective columns. I have tried

 grepl
 gsub

but I have not been able to achieve the desired output. Kindly guide.

What do you mean by remove? Please try to clarify your problem and come up with a [reproducible example](https://stackoverflow.com/q/5963269/3250126) — loki, Aug 03 '17 at 12:14

score 2 · Answer 1 · answered Aug 03 '17 at 12:05


Use grep to create an index vector between the two lines.

x[-c(grep("5. Demand Disputed", x$V1):grep("Total Demand Disputed", x$V1)), ]

Explanation

grep "returns a vector of the indices of the elements of x that yielded a match" (see ?grep).

So you can build an integer vector spanning the two rows that match the two strings with the : operator, and negate it to drop those rows.
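A minimal sketch of this approach on made-up data. The data frame `x`, its column names, and every row label other than the two markers are assumptions, since the question did not include data. `fixed = TRUE` is added here so that the `.` in the pattern is matched literally rather than as a regex wildcard, and the length check guards against a marker that is missing or matched more than once:

```r
# Hypothetical data: the original question did not share any.
x <- data.frame(
  V1 = c("4. Demand Raised", "5. Demand Disputed", "Item A", "Item B",
         "Total Demand Disputed (Rs.)", "6. Other"),
  V2 = c(10, NA, 3, 4, 7, 1),
  stringsAsFactors = FALSE
)

# Row positions of the start and end markers.
start <- grep("5. Demand Disputed", x$V1, fixed = TRUE)
end   <- grep("Total Demand Disputed", x$V1, fixed = TRUE)

# Only drop rows when each marker is found exactly once and in order.
if (length(start) == 1 && length(end) == 1 && start <= end) {
  result <- x[-(start:end), , drop = FALSE]
} else {
  result <- x
}
print(result)
```

Without the guard, `start:end` would fail or silently use the wrong range when a marker is absent or appears several times.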

loki


`Error in x$V1 : $ operator is invalid for atomic vectors` – PritamJ Aug 03 '17 at 12:10
I cannot help you with that error, since you didn't add data. Please share a little of the data, so we have a reproducible example to work on. – loki Aug 03 '17 at 12:11
Thank you for your inputs, I got the solution. You were right about creating index vector. – PritamJ Aug 03 '17 at 12:20
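For readers hitting the same error: it usually means the object is not a data frame. The sketch below assumes the data was held as a character matrix (the thread does not say what `x` actually was), and shows two ways around it:

```r
# Hypothetical data held as a character matrix rather than a data frame.
m <- matrix(c("5. Demand Disputed", "Item A", "Total Demand Disputed"),
            ncol = 1, dimnames = list(NULL, "V1"))

# m$V1 fails on a matrix; tryCatch captures the message instead of stopping.
msg <- tryCatch(m$V1, error = function(e) conditionMessage(e))
print(msg)

# Either index the column by name, or convert to a data frame first.
col_by_name <- m[, "V1"]
x <- as.data.frame(m, stringsAsFactors = FALSE)
print(identical(x$V1, unname(col_by_name)))
```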

score 2 · Accepted Answer · answered Aug 03 '17 at 12:09


Using a toy example...

df <- data.frame(a=LETTERS[1:10],b=LETTERS[3:12],stringsAsFactors = FALSE)
limits <- c("E","H")

sapply(df,function(x){
  del.min <- grep(limits[1],x)
  del.max <- grep(limits[2],x)
  x[del.min:del.max] <- ""
  return(x)})

      a   b  
 [1,] "A" "C"
 [2,] "B" "D"
 [3,] "C" "" 
 [4,] "D" "" 
 [5,] ""  "" 
 [6,] ""  "" 
 [7,] ""  "I"
 [8,] ""  "J"
 [9,] "I" "K"
[10,] "J" "L"
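Note that `sapply` returns a character matrix here rather than a data frame. A possible variant, assuming you want to keep the data frame structure, assigns the result of `lapply` back with `df[] <-`. The helper name `blank_between` is made up for illustration, and it leaves a column untouched when either marker is missing:

```r
df <- data.frame(a = LETTERS[1:10], b = LETTERS[3:12], stringsAsFactors = FALSE)
limits <- c("E", "H")

# Hypothetical helper: blank out everything from the first marker to the second.
blank_between <- function(x, limits) {
  del.min <- grep(limits[1], x)[1]  # NA if the marker is not found
  del.max <- grep(limits[2], x)[1]
  # Leave the column unchanged if either marker is missing.
  if (is.na(del.min) || is.na(del.max)) return(x)
  x[del.min:del.max] <- ""
  x
}

# df[] <- keeps df as a data frame instead of collapsing it to a matrix.
df[] <- lapply(df, blank_between, limits = limits)
print(df)
```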

Andrew Gustar



Thank You! I was wondering how to set index or put a indicator to remove it. Got my ans. – PritamJ Aug 03 '17 at 12:19
It seems that `grep` is not necessary. `which` should work fine – Sotos Aug 03 '17 at 12:33
@Sotos - I used `grep` because in the image there seems to be some extra text after `Total Demand Disputed`, so I think it is a more robust solution than `which`. – Andrew Gustar Aug 03 '17 at 12:38

I meant on your data set. But I agree about robustness – Sotos Aug 03 '17 at 12:41
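The difference discussed in these comments can be sketched as follows. The vector `v` and its trailing "(in Rs.)" text are assumptions standing in for the extra text visible in the question's image: `which` with `==` needs an exact match, while `grep` matches a substring.

```r
# Hypothetical column where the end marker carries trailing text.
v <- c("5. Demand Disputed", "Item A", "Total Demand Disputed (in Rs.)")

# Exact equality: finds nothing because of the trailing text.
exact <- which(v == "Total Demand Disputed")

# Substring match: finds the row despite the trailing text.
partial <- grep("Total Demand Disputed", v, fixed = TRUE)

print(exact)
print(partial)
```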
