I have a data frame with a few character (string) columns and a date column.
One of the columns is a list of cities, and I'd like to know which cities show up the most in my dataset. I used table(dataframe$city), but it gave me a list of every city (including cities that show up just once or twice).
How do I filter the results of table() to show just the cities in the top quartile, based on the number of times they appear in the data?
Here's some example input:
id price city
1 $0.8 los angeles
2 $0.8 new york
3 $0.5 new york
4 $0.6 new york
5 $0.9 los angeles
6 $0.1 houston
7 $0.7 chicago
8 $0.8 new york
9 $0.7 new york
10 $0.0 new york
11 $0.5 new york
12 $0.1 new york
13 $0.9 new york
14 $0.3 los angeles
15 $0.9 los angeles
16 $0.9 los angeles
17 $0.8 los angeles
18 $0.5 miami
19 $0.9 boston
20 $1.0 newton
21 $0.2 san mateo
22 $0.3 milbrae
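For reproducibility, the example input above could be rebuilt in R roughly like this. The name `dataframe` matches the call I'm using; keeping `price` as a character string (because of the "$" prefix) and the helper name `city_counts` are assumptions made for this sketch.

```r
# Rebuild the example input. Assumption: price is stored as character
# because of the "$" prefix; the real column type may differ.
dataframe <- data.frame(
  id = 1:22,
  price = c("$0.8", "$0.8", "$0.5", "$0.6", "$0.9", "$0.1", "$0.7", "$0.8",
            "$0.7", "$0.0", "$0.5", "$0.1", "$0.9", "$0.3", "$0.9", "$0.9",
            "$0.8", "$0.5", "$0.9", "$1.0", "$0.2", "$0.3"),
  city = c("los angeles", "new york", "new york", "new york", "los angeles",
           "houston", "chicago", "new york", "new york", "new york",
           "new york", "new york", "new york", "los angeles", "los angeles",
           "los angeles", "los angeles", "miami", "boston", "newton",
           "san mateo", "milbrae"),
  stringsAsFactors = FALSE
)

# One count per distinct city (city_counts is a hypothetical helper name)
city_counts <- table(dataframe$city)
```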
When I do table(dataframe$city), I get a list of every city and a count of how many times it appears. What if I just want a list of the cities that appear more often than average (like new york and los angeles)?
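To make the goal concrete, here is a rough sketch of the kind of filtering I have in mind. It is not necessarily the right approach: the `city` vector is a stand-in rebuilt from the example input, the names `counts`, `n`, `above_average`, and `top_quartile` are hypothetical, and using `mean()` or `quantile()` as the cutoff is my assumption about what "more than average" and "top quartile" should mean.

```r
# Stand-in for dataframe$city, rebuilt from the example input
# (hypothetical helper vector, not the real data)
city <- rep(
  c("new york", "los angeles", "houston", "chicago", "miami",
    "boston", "newton", "san mateo", "milbrae"),
  times = c(9, 6, 1, 1, 1, 1, 1, 1, 1)
)

counts <- table(city)     # occurrences per city
n <- as.vector(counts)    # plain integer counts, used to compute the cutoffs

# Cities that appear more often than the average city does
above_average <- sort(counts[n > mean(n)], decreasing = TRUE)

# Cities whose count is strictly above the 75th percentile of all counts
cutoff <- quantile(n, 0.75, names = FALSE)
top_quartile <- sort(counts[n > cutoff], decreasing = TRUE)

names(above_average)
```

The strict `>` is deliberate in this sketch. When most cities appear only once, the 75th percentile can itself be 1, and `>=` would then keep every city.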