Find the second highest value in data frame in R

Question

Hello for below data frame in R, may I know the simplest command (without using any additional library like deplyr) how to find the second highest salary and store the name of the employee in a variable named 2nd_high_employee?

EmployeeID  EmployeeName    Department      Salary   
----------- --------------- --------------- ---------
1           T Cook          Finance         40000.00
2           D Michael       Finance         25000.00
3           A Smith         Finance         25000.00
4           D Adams         Finance         15000.00
5           M Williams      IT              80000.00
6           D Jones         IT              40000.00
7           J Miller        IT              50000.00
8           L Lewis         IT              50000.00
9           A Anderson      Back-Office     25000.00
10          S Martin        Back-Office     15000.00
11          J Garcia        Back-Office     15000.00
12          T Clerk         Back-Office     10000.00

Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. — Sotos, Jul 03 '18 at 06:58

Lennyy · Answer 1 · 2018-07-03T07:43:29.623

Next time you could consider to post a sample of your data using head(dput(x)), to ease SO members to read in your data.

df <- read.table(text = "
EmployeeID  EmployeeName    Department      Salary   

1           T Cook          Finance         40000.00
2           D Michael       Finance         25000.00
3           A Smith         Finance         25000.00
4           D Adams         Finance         15000.00
5           M Williams      IT              80000.00
6           D Jones         IT              40000.00
7           J Miller        IT              50000.00
8           L Lewis         IT              50000.00
9           A Anderson      Back-Office     25000.00
10          S Martin        Back-Office     15000.00
11          J Garcia        Back-Office     15000.00
12          T Clerk         Back-Office     10000.00", header = T)


second_high_employee <- tail(sort(df$Salary),2)[1]
second_high_employee 
[1] 50000

BTW, it is not possible to start an object name with a number. You could check: ?make.names

Also, for for each department you could do:

aggregate(Salary ~ Department, df, function(x) {tail(sort(x), 2)[1]})
   Department Salary
1 Back-Office  15000
2     Finance  25000
3          IT  50000

In case there had been 2 top salaries of 80000 and you had wanted to find the second highest of 50000 again, you could have wrapped x or df$Salaray inside tail(sort(unique()), 2)[1]

Thanks a lot for the response, it helps, however considering there is only 1 employee with 50K as a salary, how would I print his name considering I use your command to fetch the second highest salary please? — dipincoolwater, Jul 03 '18 at 08:13
You could add `$EmployeeName` to this if you only want to print the name — Lennyy, Jul 03 '18 at 08:15
Gives me the warning for NA values: Warning message: In is.na(x) : is.na() applied to non-(list or vector) of type 'NULL' [1] emp_id emp_name salary start_date <0 rows> (or 0-length row.names) — dipincoolwater, Jul 03 '18 at 09:05
What does `df[df$Salary %in% tail(sort(df$Salary),2)[1],]` give? — Lennyy, Jul 03 '18 at 09:14

Onyambu · Answer 2 · 2018-07-03T09:19:42.297

1

Using Base R: Finding the 2nd highest salary:

if you need the subset without taking into consideration the department:

subset(dat,sort(z<-rank(Salary),T)[2]==z)
  EmployeeID EmployeeName Department Salary
7          J       Miller         IT  50000
8          L        Lewis         IT  50000

if taking into consideration the department:

unsplit(by(dat,dat$Department,function(x)subset(x,(y<-rank(Salary))==sort(y,T)[2])),rep(1:3,each=2))   

   EmployeeID EmployeeName  Department Salary
10          S       Martin Back-Office  15000
11          J       Garcia Back-Office  15000
2           D      Michael     Finance  25000
3           A        Smith     Finance  25000
7           J       Miller          IT  50000
8           L        Lewis          IT  50000

Just for the employee name:

as.character(subset(dat,sort(z<-rank(Salary),T)[2]==z)[,2])
[1] "Miller" "Lewis"

edited Jul 03 '18 at 09:19

answered Jul 03 '18 at 07:39

Onyambu

67,392
3
24
53

subset(dat,sort(z<-rank(Salary),T)[2]==z) gives the complete row, I just want employee name as a final output..is there a way? – dipincoolwater Jul 03 '18 at 09:11
Thanks a lot..Cheers!! – dipincoolwater Jul 03 '18 at 09:27
lets say I wanna add the salaries of employees in IT department only..what woudl be the query? – dipincoolwater Jul 03 '18 at 16:14

Find the second highest value in data frame in R

2 Answers2