This should be easy, but I can't find a straightforward way to achieve it. My dataset looks like the following:

                DisplayName Nationality Gender Startyear
1           Alfred H. Barr, Jr.    American   Male      1929
2                  Paul Cézanne      French   Male      1929
3                  Paul Gauguin      French   Male      1929
4              Vincent van Gogh       Dutch   Male      1929
5         Georges-Pierre Seurat      French   Male      1929
6            Charles Burchfield    American   Male      1929
7                Charles Demuth    American   Male      1929
8             Preston Dickinson    American   Male      1929
9              Lyonel Feininger    American   Male      1929
10 George Overbury ("Pop") Hart    American   Male      1929
...

I want to group by DisplayName and Gender, and get the count for each name (names are repeated several times in the list, with different year information).

The following two commands give me the same output, but the result is not sorted by the count column "n". Any ideas on how to achieve this?

artists <- data %>%
  filter(!is.na(Gender) & Gender != "NULL") %>%
  group_by(DisplayName, Gender) %>%
  tally(sort = T) %>%
  arrange(desc(n))


artists <- data %>%
  filter(!is.na(Gender) & Gender != "NULL") %>%
  count(DisplayName, Gender, sort = T)


                 DisplayName Gender     n
                       (chr)  (chr) (int)
1              A. F. Sherman   Male     1
2             A. G. Fronzoni   Male     2
3         A. Lawrence Kocher   Male     3
4            A. M. Cassandre   Male    21
5             A. R. De Ycaza Female     1
6  A.R. Penck (Ralf Winkler)   Male    20
7              Aaron Siskind   Male    25
8         Abigail Perlmutter Female     1
9            Abraham Rattner   Male     5
10         Abraham Walkowitz   Male    17
..                       ...    ...   ...
masta-g3

1 Answer

Your data is grouped by two variables, so after tally() the data frame is still grouped by DisplayName. As a result, arrange(desc(n)) does sort, but only within each DisplayName group. To sort the whole data frame by column n, ungroup before sorting. Try this:

artists <- data %>%
  filter(!is.na(Gender) & Gender != "NULL") %>%
  group_by(DisplayName, Gender) %>%
  tally(sort = TRUE) %>%
  ungroup() %>%
  arrange(desc(n))
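
To try this without the original dataset, here is a minimal sketch on a small made-up data frame. The rows are invented for illustration and are not from the real data; only the column names DisplayName and Gender match the question. It also checks that the result is no longer grouped after ungroup().

```r
library(dplyr)

# Hypothetical stand-in for the real dataset (rows are made up).
data <- data.frame(
  DisplayName = c("Aaron Siskind", "Aaron Siskind", "Aaron Siskind",
                  "A. G. Fronzoni", "A. G. Fronzoni",
                  "A. F. Sherman", "Unknown"),
  Gender = c("Male", "Male", "Male", "Male", "Male", "Male", NA),
  stringsAsFactors = FALSE
)

artists <- data %>%
  filter(!is.na(Gender) & Gender != "NULL") %>%  # drop missing genders
  group_by(DisplayName, Gender) %>%
  tally() %>%            # result is still grouped by DisplayName
  ungroup() %>%          # remove the remaining grouping
  arrange(desc(n))       # sort the whole data frame by count

print(artists)
is_grouped_df(artists)   # FALSE once ungroup() has been applied
```

If the sort still looks wrong on your data, checking is_grouped_df() right after tally() is a quick way to see whether a leftover grouping is involved.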
cderv