64

I have a data frame and I want to remove the last N rows from it. To remove 5 rows, I currently use the following command, which in my opinion is rather convoluted:

df <- df[-seq(nrow(df), nrow(df) - 4), ]

How would you accomplish this task? Is there a convenient function that I can use in R?

In Unix, I would use:

tac file | sed '1,5d' | tac 
zx8754
Alby

4 Answers

102

head with a negative n argument is convenient for this:

df <- data.frame( a = 1:10 )
head(df,-5)
#  a
#1 1
#2 2
#3 3
#4 4
#5 5

P.S. Your seq() example may be written slightly less(?) awkwardly using the named arguments by and length.out (which can be abbreviated to len through partial argument matching), like this: -seq(nrow(df), by = -1, len = 5).

Simon O'Hanlon
  • There's an edge case! `head(df, -0) == head(df,0) != df` – peer Nov 23 '18 at 13:41
  • @peer sorry, I don't think I understand your comment. Can you illustrate the edge case more fully? – Simon O'Hanlon Nov 23 '18 at 15:41
  • I'm switching from `df[0:(nrow(df)-n),]` to `head`. In my case the user moves a slider to indicate `n` last rows are to be removed. But there's a catch! When the user sets `n=0` we would expect no rows to be removed. But with `head(df, -n)` all rows are removed because negative zero is resolved to positive zero -> take the first 0 rows. So I want to warn others who set `n` dynamically and allow `n=0`: You'll need `if (n > 0) df=head(df, -n)` – peer Nov 23 '18 at 17:33
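
The `n = 0` edge case described in the comments can be sidestepped by passing head the number of rows to keep rather than a negated count. A minimal sketch; the helper name drop_last_rows is hypothetical and not part of base R:

```r
# Hypothetical helper: drop the last n rows of a data frame.
# Passing the number of rows to keep (never a negated n) avoids the
# head(df, -0) pitfall, where -0 is treated as 0 and every row is dropped.
drop_last_rows <- function(df, n) {
  stopifnot(is.numeric(n), length(n) == 1, n >= 0)
  keep <- max(nrow(df) - n, 0)  # clamp at 0 when n exceeds nrow(df)
  head(df, keep)
}

df <- data.frame(a = 1:10)
nrow(drop_last_rows(df, 5))   # 5 rows remain
nrow(drop_last_rows(df, 0))   # all 10 rows remain
nrow(drop_last_rows(df, 20))  # 0 rows remain
```

Because head on a data frame always returns a data frame, this also works for single-column inputs.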
27

This one takes one more line, but is far more readable:

n <- dim(df)[1]        # number of rows
df <- df[1:(n - 5), ]  # keep everything except the last 5 rows

Of course, you can do it in one line by putting the dim call directly into the re-assignment statement. I assume this is part of a reproducible script, so you can retrace your steps. Otherwise, I strongly recommend saving the result to a different variable (e.g., df2) and removing the redundant copy only after you're sure you got what you wanted.
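
Two pitfalls of indexing with 1:(n-5) may be worth guarding against. When n equals 5, 1:0 evaluates to c(1, 0) and the first row is still returned; and on a one-column data frame, df[rows, ] collapses to a plain vector. A sketch of a more defensive variant, assuming a hypothetical variable k for the number of rows to remove and following the advice of saving to df2:

```r
df <- data.frame(a = 1:10)
k <- 5  # hypothetical: number of trailing rows to remove

n <- nrow(df)
# seq_len() yields an empty index when nothing is left to keep,
# unlike 1:(n - k), which counts down to 0 and still selects row 1.
rows <- seq_len(max(n - k, 0))
# drop = FALSE keeps the result a data frame even with a single column.
df2 <- df[rows, , drop = FALSE]

is.data.frame(df2)            # TRUE
is.data.frame(df[1:(n - k), ]) # FALSE: a one-column result is simplified to a vector
```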

Assaf
23

Adding a dplyr answer for completeness:

library(dplyr)
# tibble() replaces the deprecated data_frame()
test_df <- tibble(a = c(1,2,3,4,5,6,7,8,9,10), 
                  b = c("a","b","c","d","e","f","g","h","i","j"))
slice(test_df, 1:(n()-5))

## A tibble: 5 x 2
#      a b    
#  <dbl> <chr>
#1     1 a    
#2     2 b    
#3     3 c    
#4     4 d    
#5     5 e    
Oscar
20

Another dplyr answer which is even more readable:

df %>% filter(row_number() <= n()-5)
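
As a possible alternative, recent dplyr releases document that slice_head() subtracts a negative n from the row count. A short sketch, assuming an installed dplyr version with that behavior; the example data is made up for illustration:

```r
library(dplyr)

# Illustrative data, not from the original question.
df <- tibble(a = 1:10, b = letters[1:10])

# A negative n is subtracted from the number of rows,
# so this keeps the first 10 - 5 rows.
kept <- slice_head(df, n = -5)
nrow(kept)
```

As with head, n = 0 needs separate handling here, since a negated zero is still zero.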
Edgar