Hello, I have a data set like this:
Age Sallary
24 >50k
17 <=50k
31 >50k
24 >50k
I need to find the age that has the most >50k salaries.
Going with akrun's table comment,
names(which.max(table(df)[, ">50k"]))
[1] "24"
table calculates the cross-tab of these two columns. [, ">50k"] subsets to the column of salaries you are looking for, then which.max returns the position of the first element of this column that contains the maximum count. Finally, since each of these steps returns a named vector, we can extract the age with names.
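To make the chain easier to follow, here is a minimal sketch that runs each step separately on the example data. The intermediate names tab, high, best and tied are only illustrative, and the data frame is rebuilt with data.frame() rather than the structure() call from the post. The last line is an optional variation, not part of the one-liner: because which.max only reports the first maximum, it lists every age tied for the top count.

```r
# Example data from the post, rebuilt with data.frame() for readability
df <- data.frame(
  Age = c(24L, 17L, 31L, 24L),
  Sallary = factor(c(">50k", "<=50k", ">50k", ">50k"),
                   levels = c("<=50k", ">50k"))
)

# Step 1: cross-tab, one row per age and one column per salary level
tab <- table(df$Age, df$Sallary)

# Step 2: keep only the ">50k" column; the result is a vector named by age
high <- tab[, ">50k"]

# Step 3: which.max gives the position of the first maximum, names() gives its age
best <- names(which.max(high))

# Optional variation: every age that shares the maximum count
tied <- names(high)[high == max(high)]

print(best)
print(tied)
```

Note that names() returns a character value, so wrap it in as.integer() if you need the age as a number.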
With a data.frame that has additional columns, you could replace table(df) with table(df$Age, df$Sallary) to select these variables from the data.frame, so
names(which.max(table(df$Age, df$Sallary)[, ">50k"]))
[1] "24"
also works for the example dataset.
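As a rough illustration of why the explicit column selection matters, the sketch below adds a made-up third column (Hours is hypothetical and not part of the original data). With three columns, table() on the whole data frame builds a three-way table, so the two-index subset [, ">50k"] no longer applies, while selecting the two columns by name still gives the same answer.

```r
# Example data plus a hypothetical extra column (Hours is invented for illustration)
df2 <- data.frame(
  Age = c(24L, 17L, 31L, 24L),
  Sallary = factor(c(">50k", "<=50k", ">50k", ">50k"),
                   levels = c("<=50k", ">50k")),
  Hours = c(40L, 20L, 45L, 38L)
)

# table() on all three columns produces a three-dimensional table
n_dims <- length(dim(table(df2)))

# Selecting the two columns of interest keeps the cross-tab two-dimensional
res <- names(which.max(table(df2$Age, df2$Sallary)[, ">50k"]))

print(n_dims)
print(res)
```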
data
df <-
structure(list(Age = c(24L, 17L, 31L, 24L), Sallary = structure(c(2L,
1L, 2L, 2L), .Label = c("<=50k", ">50k"), class = "factor")), .Names = c("Age",
"Sallary"), class = "data.frame", row.names = c(NA, -4L))