I am trying to arrange the 'Smoking status' categories in alphabetical order. This should be done with the tidyverse only.

This is what I have tried:

smoking_gender_disch_piv_count_ren <- smoking_gender_disch_piv_count %>%
  dplyr::rename('Smoking Status' = smoking_status) %>%
  dplyr::arrange('Smoking status')
smoking_gender_disch_piv_count_ren

As one can see, I do not get Current smoker first, then Ex smoker, and so on. I thought the arrange function in dplyr would do the trick, but it does not.

This is the data I have:

structure(list(smoking_status = structure(1:5, .Label = c("Ex smoker", 
"Current smoker", "Never smoked", "Unknown", "Non smoker - smoking history unknown"
), class = "factor"), Female = c(24.0601503759398, 9.02255639097744, 
35.3383458646617, 6.01503759398496, 25.5639097744361), Male = c(34.9753694581281, 
13.7931034482759, 23.6453201970443, 1.97044334975369, 25.615763546798
), NSTEMI = c(31.9078947368421, 12.5, 28.2894736842105, 3.28947368421053, 
24.0131578947368), STEMI = c(18.75, 6.25, 28.125, 6.25, 40.625
)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
GaB
  • Have you tried re-ordering the factor levels, or coercing to character? – Bill O'Brien Jul 22 '21 at 16:53
  • Re-ordering? How ? – GaB Jul 22 '21 at 16:53
  • In `arrange`, use backticks around Smoking Status rather than single quotes. On a standard American keyboard, the backtick is the key in the upper left corner. With single quotes, R treats it as a character string, rather than as a column name in your data. – eipi10 Jul 22 '21 at 16:54
  • @eipi10 did try both - "Smoking status" and `Smoking Status` in arrange function and does not work – GaB Jul 22 '21 at 17:05
  • @GaB Please use `dput()` to provide a reproducible version of your `smoking_gender_disch_piv_count` dataset. If I had to guess, your problem is that `Smoking Status` is a `factor` (not a `character` string), where `"Current smoker"` comes before `"Ex smoker"`. Maybe try dplyr::arrange(as.character(`Smoking Status`))? – Greg Jul 22 '21 at 17:12
  • @Greg, you're right! Did it. Tried passing as.character and it did not work. – GaB Jul 22 '21 at 17:21
  • @Greg, funny enough, I passed dplyr::arrange(as.character(smoking_status)) and it worked! Oddly enough the white spaces. Probably this is what eipi10 referred to? – GaB Jul 22 '21 at 17:25
  • @Greg please post it as an answer and then explain the reason? Weird!! – GaB Jul 22 '21 at 17:25
  • @GaB You misspelled `Smoking Status` as `Smoking status` – Greg Jul 22 '21 at 17:26
  • Just try arrange(`Smoking Status`) and it should work the problem in your example is that you had status not Status (S capital). – Nareman Darwish Jul 22 '21 at 17:26
  • I have actually corrected it, and it does not solve the issue. I believe "Smoking status" does not do the trick, but rather "smoking_status". Could it be the white space in between? Still, someone should post this as an answer? – GaB Jul 22 '21 at 17:32

1 Answer

Aside from misspelling 'Smoking Status' as 'Smoking status', you ran into two other problems.

Variable Names vs. Strings

We use single (') or double quotes (") to designate strings: 'my string' or "my string". However, to designate (unusual) variable names (symbols) with spaces in them, we use backticks (`): `my variable`. Since it's a pain to type those backticks, we typically use underscores (_) rather than spaces in variable names.

When (re)naming columns, character strings are as good as symbols. That is

  # ... %>%
  dplyr::rename('Smoking Status' = smoking_status) # %>% ...
  #             |--------------|
  #             character string

is equivalent to

  # ... %>%
  dplyr::rename(`Smoking Status` = smoking_status) # %>% ...
  #             |--------------|
  #                  symbol
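
A quick way to check this equivalence is sketched below. The `toy` tibble and its values are made up for illustration and are not part of the question's data.

```r
library(dplyr)

# Hypothetical toy data, only for illustration.
toy <- tibble(
  smoking_status = c("Ex smoker", "Current smoker"),
  n = c(2, 1)
)

# New column name given as a character string...
by_string <- toy %>% rename('Smoking Status' = smoking_status)

# ...and as a backticked symbol.
by_symbol <- toy %>% rename(`Smoking Status` = smoking_status)

# Both calls should yield the same column names and the same data.
identical(by_string, by_symbol)
names(by_string)
```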

However, when performing vectorized operations with mutate() or filter() or arrange(), any string will be treated as simply a scalar character value. That is

  # ... %>%
  mutate(test = 'Smoking Status') # %>% ...
  #             |--------------|
  #             character string

will not copy the `Smoking Status` column (a factor)

# A tibble: 5 x 6
  ... test                                
  ... <fct>                               
1 ... Ex smoker                           
2 ... Current smoker                      
3 ... Never smoked                        
4 ... Unknown                             
5 ... Non smoker - smoking history unknown

but rather give you a (character) column filled with the literal string 'Smoking Status':

# A tibble: 5 x 6
  ... test          
  ... <chr>         
1 ... Smoking Status
2 ... Smoking Status
3 ... Smoking Status
4 ... Smoking Status
5 ... Smoking Status

Similarly, your

  # ... %>%
  dplyr::arrange('Smoking Status')
  #                       |----|
  #      Corrected typo: 'status'.

does not sort on the `Smoking Status` column, but rather on a (temporary) column filled with the string 'Smoking Status'. Since every value in that column is the same, all rows tie, no rearranging occurs, and the rows of smoking_gender_disch_piv_count keep their original order.

Fix

To fix this particular issue, use:

  # ... %>%
  dplyr::arrange(`Smoking Status`)
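
The difference between the two spellings can be seen on a small example. This is only a sketch: the `toy` tibble is hypothetical, and its column is a plain character vector, so the factor issue does not come into play here.

```r
library(dplyr)

# Hypothetical toy data with a character (not factor) column.
toy <- tibble(
  `Smoking Status` = c("Never smoked", "Ex smoker", "Current smoker")
)

# A quoted string is a constant: every row ties, so the order is unchanged.
unsorted <- toy %>% arrange('Smoking Status')

# Backticks refer to the column itself, so the rows are sorted by its values.
sorted <- toy %>% arrange(`Smoking Status`)

unsorted$`Smoking Status`
sorted$`Smoking Status`
```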

Strings vs. Factors

Even after fixing the issue above, you'll still have a problem. Your Smoking Status column is a factor

[1] Ex smoker                            Current smoker                       Never smoked                         Unknown                              Non smoker - smoking history unknown
Levels: Ex smoker Current smoker Never smoked Unknown Non smoker - smoking history unknown

so when you sort on this column, it follows the ordering of the factor levels, which are visibly not in alphabetical order.

Fix

To sort by alphabetical order, use the character form of the `Smoking Status` column:

  # ... %>%
  dplyr::arrange(as.character(`Smoking Status`))
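
To see the two orderings side by side, here is a minimal sketch on a cut-down, hypothetical version of the column. It uses three of the question's levels, declared in the same relative order as in the `dput()` output.

```r
library(dplyr)

# Hypothetical cut-down data: levels are declared in non-alphabetical order.
toy <- tibble(
  status = factor(
    c("Ex smoker", "Current smoker", "Never smoked"),
    levels = c("Ex smoker", "Current smoker", "Never smoked")
  )
)

# Sorting on the factor follows the level order.
by_levels <- toy %>% arrange(status)

# Sorting on the character form is alphabetical.
by_alphabet <- toy %>% arrange(as.character(status))

as.character(by_levels$status)
as.character(by_alphabet$status)
```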

Solution

Given the smoking_gender_disch_piv_count dataset you reproduced

smoking_gender_disch_piv_count <-
  structure(list(smoking_status = structure(1:5, .Label = c("Ex smoker", "Current smoker", "Never smoked", "Unknown", "Non smoker - smoking history unknown"), class = "factor"),
                 Female = c(24.0601503759398, 9.02255639097744, 35.3383458646617, 6.01503759398496, 25.5639097744361),
                 Male = c(34.9753694581281, 13.7931034482759, 23.6453201970443, 1.97044334975369, 25.615763546798),
                 NSTEMI = c(31.9078947368421, 12.5, 28.2894736842105, 3.28947368421053, 24.0131578947368),
                 STEMI = c(18.75, 6.25, 28.125, 6.25, 40.625)),
            row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))

the following dplyr workflow

library(dplyr)  # provides %>%, rename() and arrange()

smoking_gender_disch_piv_count_ren <- smoking_gender_disch_piv_count %>%
  dplyr::rename(`Smoking Status` = smoking_status) %>%
  dplyr::arrange(as.character(`Smoking Status`))

will give you your desired results for smoking_gender_disch_piv_count_ren

# A tibble: 5 x 5
  `Smoking Status`                     Female  Male NSTEMI STEMI
  <fct>                                 <dbl> <dbl>  <dbl> <dbl>
1 Current smoker                         9.02 13.8   12.5   6.25
2 Ex smoker                             24.1  35.0   31.9  18.8 
3 Never smoked                          35.3  23.6   28.3  28.1 
4 Non smoker - smoking history unknown  25.6  25.6   24.0  40.6 
5 Unknown                                6.02  1.97   3.29  6.25

while still preserving the factor information in `Smoking Status`.
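
One of the comments suggests re-ordering the factor levels instead. A minimal sketch of that alternative follows, assuming the forcats package (part of the tidyverse) is installed. It is not part of the workflow above, and the `status_tbl` data is a hypothetical cut-down version of the question's column. Unlike `as.character()` inside `arrange()`, this changes the level order stored in the column, not only the row order.

```r
library(dplyr)
library(forcats)

# Hypothetical cut-down data with non-alphabetical levels.
status_tbl <- tibble(
  `Smoking Status` = factor(
    c("Ex smoker", "Current smoker", "Never smoked"),
    levels = c("Ex smoker", "Current smoker", "Never smoked")
  )
)

relevelled <- status_tbl %>%
  # fct_relevel() accepts a function that receives the current levels;
  # sort() returns them in alphabetical order.
  mutate(`Smoking Status` = fct_relevel(`Smoking Status`, sort)) %>%
  # With alphabetical levels, sorting on the factor is now alphabetical too.
  arrange(`Smoking Status`)

levels(relevelled$`Smoking Status`)
```

This may be preferable when later steps, such as plotting or tabulating, should also follow alphabetical order.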

Greg