R adding distinct id column from multiple columns

Question

I currently have a dataframe (df) that looks like:

> j
  policyNumber driverName vehicleName
1            1      jason        blue
2            1       josh         red
3            1      jason       green
4            2      jason      orange
5            2       kyle      orange
6            3      chris        pink
7            3       ally      purple

I would like to add distinct driver and vehicle ID's that depend on the policyNumber and driverName or vehicleName, respectively. My biggest issue is that I can not get a function that recognizes that the policy number has changed (so to reset the sequence to 1), and that the same entries may not be consecutive (like 'jason' in policyNumber 1).

I would like to have a dataframe that comes out to:

> j
  policyNumber driverName vehicleName driverNumber vehicleNumber
1            1      jason        blue            1             1
2            1       josh         red            2             2
3            1      jason       green            1             3
4            2      jason      orange            1             1
5            2       kyle      orange            2             1
6            3      chris        pink            1             1
7            3       ally      purple            2             2

I think you have to update `driverNumber` and `vehicleNumber` in a loop where you go through unique values of j: `unique(j$policyNumber)`. — Heikki, Oct 27 '17 at 18:45
@MikeH. The post you provided does not question how to create indices from multiple columns. It also does not address the problem of resetting the sequential order. — jadki, Oct 27 '17 at 20:36
From my understanding you want one index from one column, you just want it two times. The answer in the linked post is almost exactly the same as the one accepted below. The only missing part is the grouping which there are plenty of other answers on how to do by group operations in R — Mike H., Oct 27 '17 at 21:00

BENY · Accepted Answer · 2017-10-27T18:55:40.483

In Base R

dt$driverNumber = ave(dt$driverName,dt$policyNumber,FUN = function(x) as.numeric(as.factor(x)))
dt$vehicleNumber = ave(dt$vehicleName,dt$policyNumber,FUN = function(x) as.numeric(as.factor(x)))
dt
  policyNumber driverName vehicleName driverNumber vehicleNumber
1            1      jason        blue            1             1
2            1       josh         red            2             3
3            1      jason       green            1             2
4            2      jason      orange            1             1
5            2       kyle      orange            2             1
6            3      chris        pink            2             1
7            3       ally      purple            1             2

score 1 · Answer 2 · answered Oct 27 '17 at 18:49

Same as Wen's, but with dplyr. I also specify the levels to keep the order as how they appear in the data instead of alphabetical order.

library(dplyr)
j %>% group_by(policyNumber) %>%
  mutate(driverNumber = as.numeric(factor(driverName, levels = unique(driverName))), 
         vehicleNumber = as.numeric(factor(vehicleName, levels = unique(vehicleName))))

# # A tibble: 7 x 5
# # Groups:   policyNumber [3]
#   policyNumber driverName vehicleName driverNumber vehicleNumber
#          <int>     <fctr>      <fctr>        <dbl>         <dbl>
# 1            1      jason        blue            1             1
# 2            1       josh         red            2             2
# 3            1      jason       green            1             3
# 4            2      jason      orange            1             1
# 5            2       kyle      orange            2             1
# 6            3      chris        pink            1             1
# 7            3       ally      purple            2             2

R adding distinct id column from multiple columns

2 Answers2