Aside from misspelling 'Smoking Status' as 'Smoking status', you ran into two other problems.
Variable Names vs. Strings
We use single (') or double quotes (") to designate strings: 'my string' or "my string". However, to designate (unusual) variable names (symbols) with spaces in them, we use backticks (`): `my variable`. Since it's a pain to type those backticks, we typically use underscores (_) rather than spaces in variable names.
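As a quick illustration of the distinction, here is a minimal base R sketch. The names `my variable`, value_of_symbol, and plain_string are made up for this example.

```r
# A backticked name is a symbol; a quoted name is just a string.
`my variable` <- 42                # creates a variable whose name contains a space

value_of_symbol <- `my variable`   # evaluates the variable
plain_string    <- "my variable"   # a length-1 character vector, no lookup happens

print(value_of_symbol)
print(plain_string)
```

The backticked form looks up the variable, while the quoted form never touches it.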
When (re)naming columns, character strings are as good as symbols. That is,
# ... %>%
dplyr::rename('Smoking Status' = smoking_status) # %>% ...
# |--------------|
# character string
is equivalent to
# ... %>%
dplyr::rename(`Smoking Status` = smoking_status) # %>% ...
# |--------------|
# symbol
However, when performing vectorized operations with mutate() or filter() or arrange(), any string will be treated as simply a scalar character value. That is,
# ... %>%
mutate(test = 'Smoking Status') # %>% ...
# |--------------|
# character string
will not copy the `Smoking Status` column (a factor)
# A tibble: 5 x 6
... test
... <fct>
1 ... Ex smoker
2 ... Current smoker
3 ... Never smoked
4 ... Unknown
5 ... Non smoker - smoking history unknown
but rather give you a (character) column filled with the literal string 'Smoking Status':
# A tibble: 5 x 6
... test
... <chr>
1 ... Smoking Status
2 ... Smoking Status
3 ... Smoking Status
4 ... Smoking Status
5 ... Smoking Status
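To see the difference concretely, here is a minimal sketch on a made-up data frame, assuming dplyr is installed. The toy data frame and the from_string / from_symbol names are hypothetical and are not the dataset from the question.

```r
library(dplyr)

# Hypothetical toy data with a factor column whose name contains a space.
toy <- data.frame(
  `Smoking Status` = factor(c("Ex smoker", "Current smoker", "Never smoked")),
  check.names = FALSE  # keep the space in the column name
)

# A quoted string is recycled as a literal character value.
from_string <- toy %>% mutate(test = 'Smoking Status')

# A backticked symbol refers to the column itself.
from_symbol <- toy %>% mutate(test = `Smoking Status`)

class(from_string$test)  # "character"
class(from_symbol$test)  # "factor"
```

Only the backticked version carries the column's values and its factor class into test.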
Similarly, your
# ... %>%
dplyr::arrange('Smoking Status')
# |----|
# Corrected typo: 'status'.
does not sort on the `Smoking Status` column, but rather on a (temporary) column filled with the string 'Smoking Status'. Since everything in that column is the same, no rearranging occurs at all, and the smoking_gender_disch_piv_count dataset remains unchanged.
Fix
To fix this particular issue, use:
# ... %>%
dplyr::arrange(`Smoking Status`)
Strings vs. Factors
Even after fixing the issue above, you'll still have a problem. Your `Smoking Status` column is a factor
[1] Ex smoker Current smoker Never smoked Unknown Non smoker - smoking history unknown
Levels: Ex smoker Current smoker Never smoked Unknown Non smoker - smoking history unknown
so when you sort on this column, the result follows the order of the factor levels, which are visibly not in alphabetical order.
Fix
To sort by alphabetical order, use the character form of the `Smoking Status` column:
# ... %>%
dplyr::arrange(as.character(`Smoking Status`))
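The effect of the as.character() wrapper can be checked on a small made-up example, assuming dplyr is installed. The toy_status data frame and its status column are hypothetical; the levels are deliberately given in non-alphabetical order, mirroring the factor above.

```r
library(dplyr)

# Hypothetical toy data: factor levels are NOT in alphabetical order.
lvls <- c("Ex smoker", "Current smoker", "Never smoked", "Unknown")
toy_status <- data.frame(status = factor(lvls, levels = lvls))

# Sorting on the factor follows the level order, so nothing moves here.
by_levels <- toy_status %>% arrange(status)

# Sorting on the character form gives alphabetical order.
by_alpha <- toy_status %>% arrange(as.character(status))

as.character(by_levels$status)  # "Ex smoker" "Current smoker" "Never smoked" "Unknown"
as.character(by_alpha$status)   # "Current smoker" "Ex smoker" "Never smoked" "Unknown"
is.factor(by_alpha$status)      # TRUE: the column itself is still a factor
```

Because as.character() is applied only inside arrange(), the column keeps its factor class and levels; only the row order changes.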
Solution
Given the smoking_gender_disch_piv_count dataset you reproduced
smoking_gender_disch_piv_count <-
structure(list(smoking_status = structure(1:5, .Label = c("Ex smoker", "Current smoker", "Never smoked", "Unknown", "Non smoker - smoking history unknown"), class = "factor"),
Female = c(24.0601503759398, 9.02255639097744, 35.3383458646617, 6.01503759398496, 25.5639097744361),
Male = c(34.9753694581281, 13.7931034482759, 23.6453201970443, 1.97044334975369, 25.615763546798),
NSTEMI = c(31.9078947368421, 12.5, 28.2894736842105, 3.28947368421053, 24.0131578947368),
STEMI = c(18.75, 6.25, 28.125, 6.25, 40.625)),
row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"))
the following dplyr workflow
smoking_gender_disch_piv_count_ren <- smoking_gender_disch_piv_count %>%
dplyr::rename(`Smoking Status` = smoking_status) %>%
dplyr::arrange(as.character(`Smoking Status`))
will give you your desired results for smoking_gender_disch_piv_count_ren
# A tibble: 5 x 5
  `Smoking Status`                     Female  Male NSTEMI STEMI
  <fct>                                 <dbl> <dbl>  <dbl> <dbl>
1 Current smoker                         9.02 13.8   12.5   6.25
2 Ex smoker                             24.1  35.0   31.9  18.8
3 Never smoked                          35.3  23.6   28.3  28.1
4 Non smoker - smoking history unknown  25.6  25.6   24.0  40.6
5 Unknown                                6.02  1.97   3.29  6.25
while still preserving the factor information in `Smoking Status`.