
I am facing a weird behaviour in R when trying to apply a map to a dataframe.

I have a dataframe named data that has a column "month" with the string name of the months such as "jan", "feb", ..., "dec".

I would like to convert these strings to the corresponding month number, so for example "jun" becomes 6 as June is the 6th month of the year.

Following the advice of this post, I wrote the following mapping:

months = 1:12
names(months) = c("jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec")

Here are the first few entries of data before the mapping:

> data$month[1:20]
 [1] mar oct oct mar mar aug aug aug sep sep sep sep aug sep sep sep mar oct mar apr
Levels: apr aug dec feb jan jul jun mar may nov oct sep

However, when I apply the map operation to data, something seems to go wrong:

> months[data$month[1:20]]
aug nov nov aug aug feb feb feb dec dec dec dec feb dec dec dec aug nov aug jan 
  8  11  11   8   8   2   2   2  12  12  12  12   2  12  12  12   8  11   8   1 

What I expected to obtain was something that started with 3 10 10 3 and not 8 11 11 8, since March is the 3rd month and October is the 10th month.

Am I missing something?

Thanks in advance for any help! :D

aurorca
  • Use `match`. Here is an example: `set.seed(42); x <- sample(month.abb); x; match(x, month.abb)`. Note that there is a built-in constant called `month.abb`. Also, `data$month` is a factor and `mar` is the 8th level, `oct` the 11th. Make sure to use `stringsAsFactors = FALSE` when you create your dataframe. – markus Jan 04 '20 at 18:56
  • Possible dupe of [Is there an R function for finding the index of an element in a vector?](https://stackoverflow.com/questions/5577727/is-there-an-r-function-for-finding-the-index-of-an-element-in-a-vector) – markus Jan 04 '20 at 19:02
  • markus, using `stringsAsFactors = FALSE` in `read.delim()` when creating the dataframe worked, thanks a lot! – aurorca Jan 04 '20 at 19:17

2 Answers


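The issue arises because data$month is a factor whose levels are sorted alphabetically. Indexing a vector with a factor uses the factor's underlying integer codes rather than its labels, so "mar" (the 8th level) selects months[8]. You can avoid this by converting the factor to character first, as follows: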

# Creating the dataframe
data <-
  data.frame(
    month = c("mar" , "oct" , "oct" , "mar" , "mar" , "aug" , "aug" , 
              "aug" , "sep" , "sep" , "sep" , "sep" , "aug" , "sep" , 
              "sep" , "sep" , "mar" , "oct" , "mar" , "apr"),
    stringsAsFactors = TRUE # The question's output shows that month is a factor
  )

# Creating the named vector of month numbers
months = 1:12
names(months) = c("jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec")

months[as.character(data$month[1:20])] # Getting month number after conversion to character

# mar oct oct mar mar aug aug aug sep sep sep sep aug sep sep sep mar oct mar apr 
# 3  10  10   3   3   8   8   8   9   9   9   9   8   9   9   9   3  10   3   4 
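
To see why the original lookup misfires, here is a minimal sketch. It assumes the column is a factor with the twelve alphabetically sorted levels shown in the question; the vector x is a shortened, hypothetical stand-in for data$month.

```r
# Named lookup vector, as in the question
months <- 1:12
names(months) <- c("jan", "feb", "mar", "apr", "may", "jun",
                   "jul", "aug", "sep", "oct", "nov", "dec")

# Stand-in for data$month (assumption: factor with alphabetically sorted levels)
x <- factor(c("mar", "oct", "aug", "sep", "apr"),
            levels = sort(tolower(month.abb)))

levels(x)      # apr aug dec feb jan jul jun mar may nov oct sep
as.integer(x)  # integer codes: position of each value within levels(x)

# Indexing by a factor is positional, so these two lookups are the same
by_factor <- months[x]
by_code   <- months[as.integer(x)]

# Indexing by character looks the names up, which is what was intended
by_name <- months[as.character(x)]
```

Here by_factor reproduces the surprising 8 11 2 12 1 pattern from the question, while by_name gives 3 10 8 9 4.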

A simpler way is to use match(), which returns the position of each month name in the built-in month.abb vector, so there is no need to create a lookup vector:

# Creating the dataframe
data <-
  data.frame(
    month = c("mar" , "oct" , "oct" , "mar" , "mar" , "aug" , "aug" , 
              "aug" , "sep" , "sep" , "sep" , "sep" , "aug" , "sep" , 
              "sep" , "sep" , "mar" , "oct" , "mar" , "apr"),
    stringsAsFactors = TRUE # The question's output shows that month is a factor
  )

# str_to_title is used to convert first character to upper case mar -> Mar
# Then match is used to get month number from its name
match(stringr::str_to_title(data$month), month.abb)

# mar oct oct mar mar aug aug aug sep sep sep sep aug sep sep sep mar oct mar apr 
# 3  10  10   3   3   8   8   8   9   9   9   9   8   9   9   9   3  10   3   4 
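
If adding a stringr dependency is not desirable, one possible base-R variant is to lower-case month.abb instead of title-casing the data. This is only a sketch: x is a small hypothetical stand-in for data$month, and setNames() is needed only when a named result is wanted, since match() by itself returns a plain integer vector.

```r
# Stand-in for data$month (assumption: a factor, as in the question)
x <- factor(c("mar", "oct", "aug", "sep", "apr"))

# Compare the labels (not the integer codes) against lower-cased month.abb
num <- match(as.character(x), tolower(month.abb))

# Optional: attach the month names to the result
named <- setNames(num, as.character(x))
```

For this input, num is 3 10 8 9 4.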
Nareman Darwish
  • Nice answer. Please note that your second option using `match()` doesn't return the month names as part of the output array. It just returns the month numbers. In order to get the named array output you show, we should use the output of match() as indices of the months array (i.e. `months[ match(...) ]`) – mastropi Jan 05 '20 at 18:49

You don't need to define months. There is a built-in month.abb which allows you to do the whole thing with this one-liner, whether or not you forget stringsAsFactors = F:

as.numeric(factor(as.character(data$month), levels = tolower(month.abb)))
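
As a rough illustration of why this works for both column types, the sketch below wraps the one-liner in a helper and feeds it a character vector and a factor. The name to_month_number and the sample values are made up for this example.

```r
# Hypothetical helper wrapping the one-liner above
to_month_number <- function(x) {
  # as.character() discards any existing factor levels; the new factor is
  # built with levels in calendar order, so its codes are the month numbers
  as.numeric(factor(as.character(x), levels = tolower(month.abb)))
}

chr <- c("mar", "oct", "apr")
fct <- factor(chr)  # levels sorted alphabetically: apr mar oct

from_chr <- to_month_number(chr)
from_fct <- to_month_number(fct)

# Values that are not in the levels become NA
unknown <- to_month_number("sept")
```

Both from_chr and from_fct come out as 3 10 4, and unknown is NA because "sept" is not one of the lower-cased abbreviations.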
Allan Cameron