I am trying to merge two datasets for my senior thesis on corporate political activity. One holds all of the data I have on each company and is made up of several previously merged datasets. The other holds the year, the company's ticker, and a variable called "dirnbr". "dirnbr" is supposed to show how many people were on the board in a given year, except that it currently looks like this:

[Image: how it is currently in my dataset]

Basically, there are several entries per year, one for each person on the board, numbered from 1 to the total number on the board (which is the only number I really care about). I just want the dataset to show the ticker, the year, and the total number of people on the board in that year. That would let me merge the two datasets with inner_join and then see what percentage of a board's directors in a given year were formerly involved in politics (I have that information in my larger dataset).

In other words, I would like to drop every observation except the largest "dirnbr" entry per year and ticker. Is there a way to do this, or to achieve the same result another way?

Please let me know; any help is much appreciated.

  • Please don't post data as images. Take a look at how to make a [great reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for ways of showing data. The gold standard for providing data is using `dput(head(NameOfYourData))`, *editing* your question and putting the `structure()` output into the question. – Martin Gal Oct 03 '21 at 15:02

1 Answer

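You could use

    library(dplyr)

    # Keep only the row with the largest dirnbr per ticker and year
    df %>%
      group_by(ticker, year) %>%
      filter(dirnbr == max(dirnbr)) %>%
      ungroup()

or

    # Same result with slice_max(), which keeps the top row per group
    df %>%
      group_by(ticker, year) %>%
      slice_max(dirnbr) %>%
      ungroup()

Since dirnbr runs from 1 to the board size within each ticker and year, both versions should leave exactly one row per group. The final ungroup() drops the grouping so later steps work on the whole data frame.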
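
If the goal is just one board-size number per ticker and year, followed by the merge described in the question, a `summarise()` step may be a more direct route. The following is only a sketch on made-up toy data: the data frames `board` and `companies`, the column `n_political`, and all values are assumptions for illustration, not taken from the actual datasets.

```r
library(dplyr)

# Toy stand-in for the board data: one row per director (values are made up)
board <- tibble(
  ticker = c("AAA", "AAA", "AAA", "AAA", "AAA", "BBB", "BBB"),
  year   = c(2010, 2010, 2010, 2011, 2011, 2010, 2010),
  dirnbr = c(1, 2, 3, 1, 2, 1, 2)
)

# Hypothetical company-level data; n_political is an assumed column holding
# the number of directors formerly involved in politics
companies <- tibble(
  ticker      = c("AAA", "AAA", "BBB"),
  year        = c(2010, 2011, 2010),
  n_political = c(1, 1, 0)
)

# Collapse to one row per ticker and year, keeping only the largest dirnbr
board_size <- board %>%
  group_by(ticker, year) %>%
  summarise(board_size = max(dirnbr), .groups = "drop")

# Merge on ticker and year, then compute the share of political directors
merged <- companies %>%
  inner_join(board_size, by = c("ticker", "year")) %>%
  mutate(share_political = n_political / board_size)
```

Unlike `filter()` or `slice_max()`, `summarise()` returns only the grouping columns and the new column, so any other columns in the board data are dropped before the join.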
Martin Gal