Creating a month id column

Question

I have a data frame that currently looks like this:

 Month      Park             
   <date>     <chr>            
  2019-04-01 Arbour Lake East   
  2019-07-01 Arbour Lake East             
  2019-07-01 Arbour Lake East                      
  2019-09-01 Arbour Lake East                         
  2019-09-01 Arbour Lake East                       
  2019-10-01 Arbour Lake East                       
  2020-01-01 Arbour Lake East                        
  2020-01-01 Arbour Lake East                       
  2020-02-01 Arbour Lake East                       
  2020-02-01 Arbour Lake East                    
  2020-03-01 Arbour Lake East              
  2020-04-01 Arbour Lake East                 
  2020-05-01 Arbour Lake East            
  2020-11-01 Arbour Lake East        
  2020-12-01 Arbour Lake East                      
  2021-04-01 Arbour Lake East               
  2019-09-01 Arbour Lake West                
  2019-09-01 Arbour Lake West             
  2019-10-01 Arbour Lake West                
  2020-05-01 Arbour Lake West

I would like to create a new column, Month id, where 1 is the first month found in a specific park, and 2 is the second month in that same park (independently on whether or not these months are actually consecutive. For example, 1 could be September because it's the first month in Nosehill gravel pit Park, and 2 would be November because it's the second month in Nosehill gravel pit Park). Some id (1, 2, 3, ...) would be the same within different parks as they only represent the first month in a park. Months that are exactly the same (month/year) within the same park would also receive the same id.

Here is what I would like that column to look like:

 Month    Month_id  Park                        
  2019-04-01  01 Arbour Lake East   
  2019-07-01  02 Arbour Lake East            
  2019-07-01  02 Arbour Lake East                      
  2019-09-01  03 Arbour Lake East                         
  2019-09-01  03 Arbour Lake East                       
  2019-10-01  04 Arbour Lake East                       
  2020-01-01  05 Arbour Lake East                        
  2020-01-01  05 Arbour Lake East                       
  2020-02-01  06 Arbour Lake East                       
  2020-02-01  06 Arbour Lake East                    
  2020-03-01  07 Arbour Lake East              
  2020-04-01  08 Arbour Lake East                 
  2020-05-01  09 Arbour Lake East            
  2020-11-01  10 Arbour Lake East         
  2020-12-01  11 Arbour Lake East                      
  2021-04-01  12 Arbour Lake East               
  2019-09-01  01 Arbour Lake West                
  2019-09-01  01 Arbour Lake West             
  2019-10-01  02 Arbour Lake West                
  2020-05-01  03 Arbour Lake West

I really don't know how to do this, so any lead would be really appreciated!

More info:

> dput(Data.frame[1:4])
structure(list(Month = structure(c(18383, 18383, 18414, 18414, 
18444, 18718, 18322, 18687, 18687, 18293, 18293, 18383, 18444, 
18475, 18506, 18536, 18567, 18567, 18628, 18748, 18748, 18779, 
18809, 18078, 18078, 18109, 18109, 18628, 18628, 18444, 18444, 
18475), class = "Date"), Park = c("Aspen Heights", "Aspen Heights", 
"Aspen Heights", "Aspen Heights", "Aspen Heights", "Aspen Heights", 
"Auburn Bay", "Auburn Bay", "Auburn Bay", "Bayview", "Bayview", 
"Bayview", "Bayview", "Bayview", "Bayview", "Bayview", "Bayview", 
"Bayview", "Bayview", "Bayview", "Bayview", "Bayview", "Bayview", 
"Cranston", "Cranston", "Cranston", "Cranston", "Cranston", "Cranston", 
"Currie Barracks", "Currie Barracks", "Currie Barracks"), Aggr_Code = c("1", 
"2", "1", "2", "1", "1", "1", "1", "2", "1", "2", "1", "1", "1", 
"1", "1", "1", "2", "1", "1", "2", "1", "1", "1", "2", "1", "2", 
"1", "2", "1", "2", "1"), AC_events_per_month = c(4, 1, 4, 1, 
2, 1, 1, 2, 1, 1, 1, 1, 3, 2, 4, 2, 6, 2, 3, 1, 1, 1, 1, 8, 4, 
2, 1, 3, 3, 2, 1, 1)), row.names = c(NA, -32L), groups = structure(list(
    Month = structure(c(18078, 18109, 18293, 18322, 18383, 18383, 
    18414, 18444, 18444, 18444, 18475, 18475, 18506, 18536, 18567, 
    18628, 18628, 18687, 18718, 18748, 18779, 18809), class = "Date"), 
    Park = c("Cranston", "Cranston", "Bayview", "Auburn Bay", 
    "Aspen Heights", "Bayview", "Aspen Heights", "Aspen Heights", 
    "Bayview", "Currie Barracks", "Bayview", "Currie Barracks", 
    "Bayview", "Bayview", "Bayview", "Bayview", "Cranston", "Auburn Bay", 
    "Aspen Heights", "Bayview", "Bayview", "Bayview"), .rows = structure(list(
        24:25, 26:27, 10:11, 7L, 1:2, 12L, 3:4, 5L, 13L, 30:31, 
        14L, 32L, 15L, 16L, 17:18, 19L, 28:29, 8:9, 6L, 20:21, 
        22L, 23L), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -22L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

Please provide a [reproducible minimal example](https://stackoverflow.com/q/5963269/8107362). Especially, provide some ready-to-use sample data, e.g. with `dput()` and use the [reprex-package](https://reprex.tidyverse.org/). — mnist, Sep 13 '21 at 16:09

score 0 · Answer 1 · answered Sep 13 '21 at 16:42

Here is a solution using the datastep() function from the libr package.

First, create your sample data:

# Create data
df <- read.table(header = TRUE, text = '
 Month      Park             
  2019-04-01 "Arbour Lake East"   
  2019-07-01 "Arbour Lake East"             
  2019-07-01 "Arbour Lake East"                      
  2019-09-01 "Arbour Lake East"                         
  2019-09-01 "Arbour Lake East"                       
  2019-10-01 "Arbour Lake East"                       
  2020-01-01 "Arbour Lake East"                        
  2020-01-01 "Arbour Lake East"                       
  2020-02-01 "Arbour Lake East"                       
  2020-02-01 "Arbour Lake East"                    
  2020-03-01 "Arbour Lake East"              
  2020-04-01 "Arbour Lake East"                 
  2020-05-01 "Arbour Lake East"            
  2020-11-01 "Arbour Lake East"        
  2020-12-01 "Arbour Lake East"                      
  2021-04-01 "Arbour Lake East"               
  2019-09-01 "Arbour Lake West"                
  2019-09-01 "Arbour Lake West"             
  2019-10-01 "Arbour Lake West"                
  2020-05-01 "Arbour Lake West"')
 
df$Month <- as.Date(df$Month)

Second, generate the ID column. The datastep will loop through the dataframe row by row. The by parameter sets a by group on Month and Park. Then you can look at the changing Park value to reset the ID for each Park using the data[n. -1, "Park"] construct.

library(libr)

# Perform datastep to calculate id
df2 <- datastep(df, by = c("Month", "Park"),
                retain = list(Month_id = 0),
                keep = c("Month", "Month_id", "Park"),
                {
                  if (n. > 1) {
                    if (Park != data[n. - 1, "Park"])
                      Month_id <- 0
                  }
                  
                  if (first.) {
                  
                    Month_id <- Month_id + 1
                    
                  }
                })

# Add leading zero to id
df2$Month_id <- sprintf("%02d", df2$Month_id)

Here are the results:

df2
#         Month Month_id             Park
# 1  2019-04-01       01 Arbour Lake East
# 2  2019-07-01       02 Arbour Lake East
# 3  2019-07-01       02 Arbour Lake East
# 4  2019-09-01       03 Arbour Lake East
# 5  2019-09-01       03 Arbour Lake East
# 6  2019-10-01       04 Arbour Lake East
# 7  2020-01-01       05 Arbour Lake East
# 8  2020-01-01       05 Arbour Lake East
# 9  2020-02-01       06 Arbour Lake East
# 10 2020-02-01       06 Arbour Lake East
# 11 2020-03-01       07 Arbour Lake East
# 12 2020-04-01       08 Arbour Lake East
# 13 2020-05-01       09 Arbour Lake East
# 14 2020-11-01       10 Arbour Lake East
# 15 2020-12-01       11 Arbour Lake East
# 16 2021-04-01       12 Arbour Lake East
# 17 2019-09-01       01 Arbour Lake West
# 18 2019-09-01       01 Arbour Lake West
# 19 2019-10-01       02 Arbour Lake West
# 20 2020-05-01       03 Arbour Lake West

You could also do this with dplyr. But I'll leave that for someone else to answer.

score 0 · Accepted Answer · answered Sep 13 '21 at 17:39

0

library(tidyverse)
df %>%
  group_by(Park) %>%
  mutate(ID = sprintf("%02d",as.integer(factor(Month))))

In Base R you will do:

transform(df, ID = ave(as.character(Month), Park,FUN = ordered))

answered Sep 13 '21 at 17:39

Onyambu

67,392
3
24
53

The tidyverse option did exactly what I wanted it to, thank you so much! – Gab_Laj Sep 13 '21 at 20:48

Creating a month id column

2 Answers2