Drop all rows besides the largest number per observation in R

Question

I am trying to merge two datasets for my senior thesis on corporate political activity. One contains all of the data I have on each company and is built from several previously merged datasets. The other contains the year, the company's ticker, and a variable called "dirnbr". "dirnbr" is supposed to tell me how many people were on the board in a given year, except that it is not stored as a single count per year.

Instead, there are several entries per year, one for each person on the board, numbered from 1 up to the total number of board members. That total is the only number I really care about. I just want my dataset to show the ticker, the year, and the total number of people on the board in that year. That would let me merge the two datasets with `inner_join()` and then calculate what percentage of a board of directors in a given year had previously been involved in politics (I have that information in my larger dataset).

In other words, I would like to drop every observation except the one with the largest "dirnbr" for each year and ticker. Is there a way to do this, or to achieve the same result another way?

Any help is very much appreciated.

Please don't post data as images. Take a look at how to make a [great reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for ways of showing data. The gold standard for providing data is using `dput(head(NameOfYourData))`, *editing* your question and putting the `structure()` output into the question. — Martin Gal, Oct 03 '21 at 15:02

Martin Gal · Answer 1 · 2021-10-03T15:07:54.257


You could use

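```r
library(dplyr)

# Within each ticker-year, keep the row whose dirnbr equals the group maximum,
# i.e. the row that carries the total board size
df %>%
  group_by(ticker, year) %>%
  filter(dirnbr == max(dirnbr))
```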

or

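```r
# slice_max() keeps the row with the largest dirnbr in each ticker-year group
# (ties are kept by default, since with_ties = TRUE)
df %>%
  group_by(ticker, year) %>%
  slice_max(dirnbr)
```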
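If only the count is needed, another option is to collapse each ticker-year to a single row holding the maximum `dirnbr`. The sketch below does this in base R with `aggregate()` (in dplyr, `summarise(board_size = max(dirnbr))` after `group_by()` would do the same), then joins it to the company data with `merge()`, whose default `all = FALSE` behaves like an inner join. The toy data, the names `directors`, `companies` and `board_size`, and the column `former_politicians` are all hypothetical. The sketch assumes the larger dataset can be reduced to one row per ticker-year with a count of board members who were formerly in politics.

```r
# Hypothetical toy data: one row per director, dirnbr runs from 1 to board size
directors <- data.frame(
  ticker = c("AAA", "AAA", "AAA", "AAA", "AAA", "BBB", "BBB"),
  year   = c(2019, 2019, 2019, 2020, 2020, 2019, 2019),
  dirnbr = c(1, 2, 3, 1, 2, 1, 2)
)

# Collapse to one row per ticker-year, keeping the largest dirnbr (the board size)
board_size <- aggregate(dirnbr ~ ticker + year, data = directors, FUN = max)
names(board_size)[names(board_size) == "dirnbr"] <- "board_size"

# Hypothetical company-level data: former_politicians is an assumed count column
companies <- data.frame(
  ticker = c("AAA", "AAA", "BBB", "CCC"),
  year   = c(2019, 2020, 2019, 2019),
  former_politicians = c(1, 1, 0, 2)
)

# merge() with the default all = FALSE keeps only ticker-years present in both
merged <- merge(companies, board_size, by = c("ticker", "year"))

# Percentage of each board made up of former politicians
merged$pct_political <- 100 * merged$former_politicians / merged$board_size
print(merged)
```

Here ticker "CCC" has no director rows, so the inner join drops it.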

edited Oct 03 '21 at 15:07

answered Oct 03 '21 at 15:01

Martin Gal
