I'm very new to R and struggling with using it for basic data analysis.
If I load a table, how can I find the Top 10 values for every column, along with each value's frequency & count of appearance? In addition, I'd like to also find out the frequency of blanks.
Using "Forbes2000" from the "HSAUR" package...
data("Forbes2000", package = "HSAUR")
head(Forbes2000)
The data contains 8 columns, some of which ("rank", "name", "sales", etc.) are unique per row. However, some columns ("country", "category") are not unique.
So, for each column, I'd like to find the top 10 unique values along with their counts and % frequency. In addition, if the column contains at least one blank/NULL, I'd like an additional row showing the same info for the blanks. If every value in a column is unique, the results should still be limited to 10 rows.
So, something like... (numbers below made up)
country               percentage  rank
United States         85.35%      1
United Kingdom        6.31%       2
Canada                3.12%       3
category              percentage  rank
Banking               55.28%      1
Conglomerates         20.75%      2
Insurance             12.23%      3
NULL                  3.32%       4
Oil & gas operations  2.11%       5
...(etc)...
sales                 percentage  rank
1234.56               0.05%       1
987.65                0.05%       1
986.32                0.05%       1
822.12                0.05%       1
...(etc)...
I've looked around StackOverflow for a while and found a few ranking questions, but they were either 2D in nature (How to return 5 topmost values from vector in R?) or for a single column (How to find the top N values by group or within category (groupwise) in an R data.frame). I'm looking for a solution that is 3D in nature, as appending
names(Forbes2000)
doesn't seem to work to loop through all the columns.
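To make the goal concrete, here is a minimal sketch of the kind of result I'm after, not a definitive solution. It assumes a hypothetical helper `top_values()` (my own name, not from any package) that summarises one column, and applies it to every column with `lapply()`. It uses a small made-up data frame `toy` rather than Forbes2000 so that it runs without installing HSAUR, and it treats both `NA` and empty strings as "blank", which is an assumption on my part.

```r
# Hypothetical helper (not from any package): summarise the top n values of one column.
top_values <- function(x, n = 10) {
  x <- as.character(x)
  # Assumption: NA and empty/whitespace-only strings both count as "blank".
  is_blank <- is.na(x) | trimws(x) == ""

  # Count the non-blank values and keep the n most frequent.
  counts <- head(sort(table(x[!is_blank]), decreasing = TRUE), n)
  out <- data.frame(
    value = as.character(names(counts)),
    count = as.integer(counts),
    stringsAsFactors = FALSE
  )

  # Add one extra row for blanks, only if the column has any.
  if (any(is_blank)) {
    out <- rbind(
      out,
      data.frame(value = "<blank>", count = sum(is_blank), stringsAsFactors = FALSE)
    )
  }

  # Percentage is relative to all rows; tied counts share the same rank.
  out$percentage <- round(100 * out$count / length(x), 2)
  out$rank <- rank(-out$count, ties.method = "min")
  out <- out[order(out$rank), ]
  rownames(out) <- NULL
  out
}

# Made-up data standing in for a real table.
toy <- data.frame(
  country  = c("US", "US", "UK", "US", "Canada", "UK"),
  category = c("Banking", NA, "Banking", "", "Insurance", "Banking"),
  sales    = c(10.5, 20.1, 30.2, 40.3, 50.4, 60.5),
  stringsAsFactors = FALSE
)

# A data frame is a list of columns, so lapply() visits each column in turn.
result <- lapply(toy, top_values, n = 10)
result$category
```

Presumably the same call would work as `lapply(Forbes2000, top_values)`, since the result is a named list with one summary data frame per column, but I have not verified that against the real data.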