I'm getting NA values when I try to replace month numbers with month names using the code below:

total_trips_v2$month <- ordered(total_trips_v2$month, levels=c("Jul","Aug","Sep","Oct", "Nov","Dec","Jan", "Feb", "Mar","Apr","May","Jun"))

I'm working with a big data set where the month column has the character data type and the months are numbered as '06', '07' and so on, starting with 06.

I'm not quite sure what the ordered function in my code really does. I saw it somewhere and used it. I tried to look up code to replace specific values in rows, but it looked very confusing. Can anyone help me out with this?

Sarthak Dev
  • Can you edit the question with the output of `dput(head(total_trips_v2$month, 20))`? – Rui Barradas Aug 08 '21 at 15:46
  • @ruibarradas Can you tell me what exactly that will do? I'm just seeing "07" printed 20 times. – Sarthak Dev Aug 08 '21 at 17:16
  • That means that the first 20 months are all `"07"`. Anyway, you already have an answer that works, so it's no longer important. See my comment to the answer, please. – Rui Barradas Aug 08 '21 at 17:25

2 Answers

4

Working with data types can be confusing at times, but understanding them helps you achieve what you want. Make sure you understand how to convert from one type to another!

There are some "helpers" built into R for working with months and month names.

Below we have a "character" vector in our data frame, i.e. df$month.
The helper vectors in R are month.name (full month names) and month.abb (abbreviated month names).

You can index a vector by position to get its n-th element. Thus, month.abb[6] returns "Jun". We use this by coercing the month column to "numeric" and using the resulting numbers to look up the abbreviated names.

# simulating some data
df <- data.frame(month = c("06","06","07","09","01","02"))

# test index month name
month.abb[6]

# check what happens to our column vector - for this we coerce the 06,07, etc. to numbers!
month.abb[as.numeric(df$month)]

# now assign the result
df$month_abb <- month.abb[as.numeric(df$month)]

This yields:

df
  month month_abb
1    06       Jun
2    06       Jun
3    07       Jul
4    09       Sep
5    01       Jan
6    02       Feb
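
As for why the original call returned NA: `ordered()` builds an ordered factor, and any value that is not listed in `levels` becomes NA. Since the column holds strings like "07" rather than "Jul", nothing matches. The following is a minimal sketch of that behaviour and of one possible fix, assuming a small made-up vector `month_chr` in place of the real column and the July-to-June order from the question:

```r
# Assumed sample data, mirroring the question's character month column
month_chr <- c("06", "07", "12", "01")

# Values not listed in `levels` become NA, so every element here is NA
bad <- ordered(month_chr, levels = c("Jul", "Aug", "Sep", "Oct", "Nov", "Dec",
                                     "Jan", "Feb", "Mar", "Apr", "May", "Jun"))

# Recode to abbreviations first, then order them Jul -> Jun as in the question
fiscal_levels <- month.abb[c(7:12, 1:6)]
good <- ordered(month.abb[as.numeric(month_chr)], levels = fiscal_levels)
```

With the recoding done first, `good` keeps the names and sorts with "Jul" as the lowest level, which is probably what the original code was aiming for.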
Ray
1

The lubridate package can also help you extract certain components of datetime objects, such as month number or name.

Here, I have made some sample dates:

library(dplyr)      # tibble(), mutate(), %>%
library(lubridate)  # ymd(), month()

my_dates <- tibble(
  date = c('2021-01-01', '2021-02-01', '2021-03-01')
)

my_dates

# # A tibble: 3 x 1
# date      
# <chr>     
# 2021-01-01
# 2021-02-01
# 2021-03-01

The first thing we need to do is convert these character-formatted values to date-formatted values. We use lubridate::ymd() to do this:

my_dates_formatted <- my_dates %>% 
  mutate(
    date = ymd(date)
  )

my_dates_formatted

# # A tibble: 3 x 1
# date      
# <date>    
# 2021-01-01
# 2021-02-01
# 2021-03-01

Note that the format printed under the column name (date) has changed from <chr> to <date>.

Now that the dates are in <date> format, we can pull out different components using lubridate::month(). See ?month for more details.

my_dates_formatted %>% 
  mutate(
    month_num = month(date), 
    month_name_abb = month(date, label = TRUE), 
    month_name_full = month(date, label = TRUE, abbr = FALSE)
  )

# # A tibble: 3 x 4
# date       month_num month_name_abb month_name_full
# <date>         <dbl> <ord>          <ord>          
# 2021-01-01         1 Jan            January        
# 2021-02-01         2 Feb            February       
# 2021-03-01         3 Mar            March 

See my answer to your other question here. In general, when working with dates in R, it is good to leave them in the default YYYY-MM-DD format, because this makes calculations and manipulations more straightforward. The month names shown above are useful for labels, for example when making figures and labelling data points or axes.
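
To illustrate that last point, here is a rough base R sketch (no packages needed) with made-up dates: the values stay as Date objects for arithmetic, and month labels are derived only when needed. The variable names are hypothetical.

```r
# Hypothetical sample dates kept in the default YYYY-MM-DD format
dates <- as.Date(c("2021-06-15", "2021-07-01", "2022-01-20"))

# Date arithmetic works directly on Date objects
days_between <- as.numeric(diff(dates))  # days between consecutive dates

# Derive labels only when needed; month.abb is a built-in English constant
month_num <- as.integer(format(dates, "%m"))
month_lab <- month.abb[month_num]
```

Because `month.abb` is a fixed constant, the labels do not depend on the session locale, whereas `format(dates, "%b")` would.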

hugh-allan