
I have followed the example in "Remove last N rows in data frame with the arbitrary number of rows", but it only deletes the last 50 rows of the whole data frame rather than the last 50 rows of every study site within it. I have a really big data set with multiple study sites; each study site has multiple depths, and each depth has a nutrient concentration.

I just want to delete the last 50 depth rows for each station.

For example, station 1 has 250 depths, station 2 has 1000 depths and station 3 has 150 depths, and I want to keep all the other data unchanged.

This seems to remove only the last 50 rows of the data frame as a whole rather than the last 50 rows of every station:

 df <- df[-seq(nrow(df), nrow(df) - 49), ]  # removes the last 50 rows overall, not per station

How can I add a grouping variable (the study site) so that the last 50 rows are removed within each station?

L55

2 Answers


A potential base R solution would be:

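set.seed(1)  # make the random example data reproducible
d <- data.frame(station = rep(paste("station", 1:3), c(250, 1000, 150)),
                depth = rnorm(250 + 1000 + 150, 100, 10))

# Position of each row within its station (1, 2, ..., n);
# assumes rows are grouped by station in the order tapply() returns the groups
d$grp_counter <- do.call("c", lapply(tapply(d$depth, d$station, length), seq_len))
# Number of rows in the station each row belongs to
d$grp_length <- rep(tapply(d$depth, d$station, length), tapply(d$depth, d$station, length))
# Keep every row except the last 50 of each station
d <- d[d$grp_counter <= (d$grp_length - 50), ]
d

# Or, without the auxiliary columns: subset(d, select = -c(grp_counter, grp_length))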
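The counter above relies on the rows already being grouped by station, in the same (sorted) order in which tapply() returns the groups. A minimal sketch that does not depend on that, assuming the same example layout (station and depth columns; n_drop, pos, size and d_trimmed are illustrative names), numbers the rows within each station with ave(). "Last" still means last in the current row order, so sort by depth within each station first if needed.

```r
set.seed(1)  # example data: 250, 1000 and 150 rows per station
d <- data.frame(station = rep(paste("station", 1:3), c(250, 1000, 150)),
                depth = rnorm(250 + 1000 + 150, 100, 10))

n_drop <- 50  # number of trailing rows to remove from each station

# Position of each row within its station (1, 2, ..., n), in the current row order
pos <- ave(seq_len(nrow(d)), d$station, FUN = seq_along)
# Size of the station each row belongs to
size <- ave(seq_len(nrow(d)), d$station, FUN = length)

# Keep everything except the last n_drop rows of each station;
# a station with n_drop rows or fewer is dropped entirely
d_trimmed <- d[pos <= size - n_drop, ]
table(d_trimmed$station)  # station 1: 200, station 2: 950, station 3: 100
```

Because ave() works group by group wherever the rows are, the result is the same whether or not the stations are stored contiguously.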
r.user.05apr

We can use the slice() function from the dplyr package:

library(dplyr)
df2 <- df %>% group_by(Col1) %>% slice(seq_len(max(n() - 4, 0)))  # all but the last 4 rows of each Col1 group

It first groups by the category column (Col1) and then, provided the rows are already in the right order within each group, removes the last n rows (here 4) of each category.
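Assuming dplyr 1.0.0 or later, slice_head() accepts a negative n, which is subtracted from each group's size; a group with fewer rows than that simply comes back empty instead of raising an index error. A sketch on the same example data as the base R answer (station and depth are the example's column names, not necessarily the asker's):

```r
library(dplyr)

set.seed(1)  # example data: 250, 1000 and 150 rows per station
d <- data.frame(station = rep(paste("station", 1:3), c(250, 1000, 150)),
                depth = rnorm(250 + 1000 + 150, 100, 10))

d2 <- d %>%
  group_by(station) %>%
  slice_head(n = -50) %>%  # keep all but the last 50 rows of each station
  ungroup()

count(d2, station)  # 200, 950 and 100 rows remain
```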

Harshal Gajare
  • I get an error that says Error: Indices must be either all positive or all negative, not a mix of both. Found 1 positive indices and 9 negative indices – L55 Jun 16 '20 at 13:06
  • Can you share sample data by providing the output of dput(head(df, 50))? If I use the data from @r.user.05apr, I get the correct result. – Harshal Gajare Jun 16 '20 at 13:11
  • Yes, I get the correct answer with r.user.05apr's code, but I get the error with df2<-df %>% group_by(Col1) %>% slice(1:(n()-4)) – L55 Jun 16 '20 at 13:19
  • You can use this code: d2<-d %>% group_by(station) %>% slice(1:(n()-100)) – Harshal Gajare Jun 16 '20 at 13:20
  • I still get the same error: Error: Indices must be either all positive or all negative, not a mix of both. Found 1 positive indices and 16 negative indices – L55 Jun 16 '20 at 13:51
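The error in this thread comes from the 1:(n() - k) index itself (with k = 4, 50 or 100). When a group has fewer than k rows, n() - k is negative and `:` counts down through 0, so slice() receives one positive index mixed with negative ones and refuses it. When a group has exactly k rows, 1:0 silently keeps the first row instead of dropping the group. A base R illustration, where n_drop, unsafe_idx and safe_idx are hypothetical names standing in for k and the index expressions:

```r
n_drop <- 50  # rows to remove per group (the "k" above)

# Index vector the original answer passes to slice() for a group of n rows
unsafe_idx <- function(n) 1:(n - n_drop)
# Safer version: seq_len() of a non-negative length never counts down
safe_idx <- function(n) seq_len(max(n - n_drop, 0))

unsafe_idx(41)  # 1 0 -1 -2 -3 -4 -5 -6 -7 -8 -9: mixed signs, slice() errors
unsafe_idx(50)  # 1 0: keeps row 1 instead of dropping the whole group
safe_idx(41)    # integer(0): the group is dropped entirely
safe_idx(250)   # 1 2 ... 200
```

So any station with fewer rows than the number being removed triggers the error, which would explain why the same code works on the larger example stations.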