R, which row value contains the most same column values

Question

Hello, I have a data set like this:

Age  Sallary  
24   >50k  
17   <=50k  
31   >50k  
24   >50k

I need to find the age that has the most >50k salary values.

Just do `aggregate(Age~Sallary, df1, FUN = max)` or it can be `table(df1)` — akrun, Mar 28 '17 at 15:39
Hello, I am a beginner with R. I think I can't do that because I am using a data.frame – LucasPG, Mar 28 '17 at 15:45

lmo · Accepted Answer · 2017-03-28T16:18:08.667

Going with akrun's table comment:

names(which.max(table(df)[, ">50k"]))
[1] "24"

`table` calculates the cross-tab of these two columns. `[, ">50k"]` subsets to the column of salaries you are looking for, then `which.max` returns the position of the first element of this column that holds the maximum count. Finally, since each of these functions returns a named vector, we can extract the age with `names`.
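To make the chain easier to follow, here is a sketch that splits it into intermediate steps. It rebuilds the four example rows with `data.frame()`; the names `tab`, `high`, `idx` and `top_age` are only illustrative.

```r
# Example data mirroring the four rows in the question
df <- data.frame(
  Age = c(24L, 17L, 31L, 24L),
  Sallary = factor(c(">50k", "<=50k", ">50k", ">50k"))
)

tab <- table(df)        # cross-tab: one row per age, one column per salary level
high <- tab[, ">50k"]   # named vector of ">50k" counts, one entry per age
idx <- which.max(high)  # position of the first maximum, carrying the age as its name
top_age <- names(idx)   # the age as a character string

top_age
# [1] "24"
```

Note that the result is a character string; wrap it in `as.integer()` if you need a number.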

With a data.frame with additional columns, you could replace table(df) with table(df$Age, df$Sallary) to select these variables from the data.frame.

So

names(which.max(table(df$Age, df$Sallary)[, ">50k"]))
[1] "24"

also works for the example dataset.
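As a rough illustration of the multi-column case, the sketch below adds a made-up `Hours` column (an assumption, not part of the question's data) and cross-tabulates only the two columns of interest. Since `which.max` reports only the first maximum, the last lines show one way to list every age that ties for the highest count.

```r
# Hypothetical data frame with an extra column that should be ignored
df_wide <- data.frame(
  Age = c(24L, 17L, 31L, 24L),
  Sallary = c(">50k", "<=50k", ">50k", ">50k"),
  Hours = c(40L, 20L, 45L, 38L)  # assumed extra column, for illustration only
)

# Cross-tabulate only Age and Sallary
tab <- table(df_wide$Age, df_wide$Sallary)
top_age <- names(which.max(tab[, ">50k"]))

# All ages sharing the maximum count, in case of ties
counts <- tab[, ">50k"]
all_top <- names(counts)[counts == max(counts)]
```

With this data both `top_age` and `all_top` are "24".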

data

df <- 
structure(list(Age = c(24L, 17L, 31L, 24L), Sallary = structure(c(2L, 
1L, 2L, 2L), .Label = c("<=50k", ">50k"), class = "factor")), .Names = c("Age", 
"Sallary"), class = "data.frame", row.names = c(NA, -4L))

edited Mar 28 '17 at 16:18

answered Mar 28 '17 at 15:53

I have the data loaded from a csv file; it contains 15 columns. I think I should refer to my variables as df$name etc. Unfortunately, your example didn't solve my problem – LucasPG Mar 28 '17 at 16:00
Unfortunately, without a representative dataset, there's not much to be done there. Please take a look at these tips on how to produce a [minimum, complete, and verifiable example](http://stackoverflow.com/help/mcve), as well as this post on [creating a great example in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – lmo Mar 28 '17 at 16:04
Maybe this will help: when I do table(dane$name,dane$salary == ">50K") it gives me a table of all ages with boolean salary values. I need to pick out the one that contains the most TRUE values. – LucasPG Mar 28 '17 at 16:12
Thank you for your help, now I am getting a "subscript out of bounds" error – LucasPG Mar 28 '17 at 16:23
Oh, I solved it: the letter "K" needs to be uppercase :). Thank you again for your time. – LucasPG Mar 28 '17 at 16:28
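For anyone hitting the same error, here is a minimal sketch of how the mismatch can be spotted. The data frame `dane` and its column names `age` and `salary` are assumptions made for illustration; the real file's columns are not shown in this thread. Column labels in a table are matched exactly, including case, so printing `colnames()` before subsetting shows which spelling to use.

```r
# Hypothetical data where the salary labels use an uppercase "K"
dane <- data.frame(
  age = c(24L, 17L, 31L, 24L),
  salary = c(">50K", "<=50K", ">50K", ">50K")
)

tab <- table(dane$age, dane$salary)
colnames(tab)  # inspect the exact labels before subsetting

# Subsetting with the wrong case fails; capture the message instead of stopping
msg <- tryCatch(tab[, ">50k"], error = function(e) conditionMessage(e))

# Using a label that really exists works
label <- ">50K"
stopifnot(label %in% colnames(tab))
top_age <- names(which.max(tab[, label]))
```

Here `msg` holds the error text from the failed lookup, and `top_age` is "24".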