Assign identity code based on factor name

Question

I would like to assign an identity to each data point based on its `NAME` factor, so that data points with the same name get the same identity number or ID tag. I have a large amount of data, so this can be a random identity code; it just needs to group those with the same name under one ID. That way I can make the names anonymous but still keep the data points grouped together.
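Since the code may be random, one way to stop the ID from revealing anything about the names (such as their alphabetical order) is to shuffle the group numbers before assigning them. This is a minimal base-R sketch that assumes a plain character vector of names. The helper `anon_id()` and the toy vector `nm` are hypothetical and are not part of any package.

```r
# Hypothetical helper: give every distinct name a random integer ID.
# Rows that share a name always receive the same ID.
anon_id <- function(names, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)   # optional, for a reproducible mapping
  distinct <- unique(names)
  # Shuffle 1..k so the ID carries no information about name order
  shuffled <- sample(seq_along(distinct))
  # match() finds each name's position in `distinct`; use it to pick that name's ID
  shuffled[match(names, distinct)]
}

nm <- c("Aur", "Aur", "Cos", "Aur", "Cos")
anon_id(nm, seed = 1)
```

To reverse the anonymisation later, keep the name-to-ID mapping in a separate, private table.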

For example, in the dummy data below "Aur" could be A and "Cos" could be B. Further names would get C, D, ..., then A1, B1, ..., A2, and so on.

I think it would be some combination of group_by(NAME) and mutate(), but I am not sure.

Here is some dummy data:

df <- structure(list(`Local Time` = structure(c(1559388960, 
1559389200, 1559394840, 1559397180, 1559397900, 1559398380, 
1559398560, 1559398680, 1559398740, 1559398800, 1559399160, 
1559399280, 1559399400, 1559399580, 1559399640, 1559399820, 
1559399940, 1559400120, 1559400240, 1559400780, 1559400840, 
1559400960, 1559401080, 1559401260, 1559401380, 1559383560, 
1559389200, 1559389440, 1559395080, 1559395320, 1559397180, 
1559397900, 1559398200, 1559398440, 1559398680, 1559398920, 
1559399220, 1559399520, 1559399820, 1559400120, 1559400360, 
1559400660, 1559400960, 1559401200, 1559401500, 1559401740, 
1559402040, 1559402280, 1559402580, 1559402880
), class = c("POSIXct", "POSIXt"), tzone = ""), COG = c(315, 
352.6, 265.6, 214.9, 240.8, 245.5, 240.3, 250.5, 262.4, 269.8, 
281.1, 262.9, 253.1, 247.7, 255.5, 249.4, 263.2, 268.6, 279.6, 
274.3, 254.6, 246.6, 253.7, 242.3, 163.5, 90, 88, 89, 93, 96, 
95, 97, 97, 98, 98, 95, 93, 94, 92, 91, 91, 91, 91, 90, 90, 92, 
89, 89, 89, 88), NAME = c("Aur", "Aur", "Aur", "Aur", "Aur", 
"Aur", "Aur", "Aur", "Aur", "Aur", "Aur", "Aur", "Aur", "Aur", 
"Aur", "Aur", "Aur", "Aur", "Aur", "Aur", "Aur", "Aur", "Aur", 
"Aur", "Aur", "Cos", "Cos", "Cos", "Cos", "Cos", "Cos", "Cos", 
"Cos", "Cos", "Cos", "Cos", "Cos", "Cos", "Cos", "Cos", "Cos", 
"Cos", "Cos", "Cos", "Cos", "Cos", "Cos", "Cos", "Cos", "Cos"
 )), row.names = c(NA, -50L), class = c("tbl_df", "tbl", 
"data.frame"))

Would it be sufficient to use the numeric representation of a `factor`, i.e. `as.integer(factor(df$NAME))`? (comment by thelatemail, Jul 31 '19 at 00:20)
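Building on the comment's idea, the integer codes of a factor can be mapped to the A, B, ..., A1, B1, ... labels described in the question. This is a base-R sketch. `letter_code()` and the toy vector `nm` are hypothetical. Note that factor() sorts its levels alphabetically, so "Aur" becomes 1 and "Cos" becomes 2.

```r
# Integer code per name: factor() sorts the levels, so "Aur" -> 1, "Cos" -> 2
nm <- c("Aur", "Aur", "Cos", "Cos")
num_id <- as.integer(factor(nm))

# Hypothetical helper: 1..26 -> "A".."Z", 27..52 -> "A1".."Z1", 53..78 -> "A2".."Z2", ...
letter_code <- function(i) {
  letter <- LETTERS[(i - 1) %% 26 + 1]  # cycle through the 26 capital letters
  suffix <- (i - 1) %/% 26              # how many full passes through the alphabet
  ifelse(suffix == 0, letter, paste0(letter, suffix))
}

letter_code(num_id)            # returns "A" "A" "B" "B"
letter_code(c(1, 26, 27, 53))  # returns "A" "Z" "A1" "A2"
```

On the question's data this would be `df$id <- letter_code(as.integer(factor(df$NAME)))`.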

score 2 · Accepted Answer · answered Jul 30 '19 at 22:47


You can use dplyr::group_indices(), which gives every row the number of its NAME group.

library(dplyr)

# group_indices() numbers the NAME groups in sorted order: "Aur" = 1, "Cos" = 2
df <- df %>%
  mutate(id = group_indices(., NAME))
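In dplyr 1.0.0 and later, passing grouping columns to group_indices() like this is deprecated in favour of grouping first. The following sketch produces the same numbering with cur_group_id(). It assumes dplyr >= 1.0.0 and uses a small stand-in tibble, `toy`, in place of the question's `df`.

```r
library(dplyr)

# Small stand-in for the question's df (only the NAME column matters here)
toy <- tibble(NAME = c("Cos", "Aur", "Cos", "Aur"), COG = c(90, 315, 88, 352.6))

toy_id <- toy %>%
  group_by(NAME) %>%
  mutate(id = cur_group_id()) %>%  # one integer per group; groups are sorted, so "Aur" = 1
  ungroup()

toy_id$id
```

On the question's data, `df %>% group_by(NAME) %>% mutate(id = cur_group_id()) %>% ungroup()` should give 1 for "Aur" and 2 for "Cos".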


neilfws


score 1 · Answer 2 · answered Jul 30 '19 at 22:43

Can the IDs be numbers? If so, a named lookup vector works as well.

library(dplyr)

unique_name <- unique(df$NAME)

# Named lookup vector: 1 for the first name seen, 2 for the next, ...
id_mapping <- setNames(seq_along(unique_name), unique_name)

df %>%
    mutate(id = id_mapping[NAME])

# A tibble: 50 x 4
   `Local Time`          COG NAME     id
   <dttm>              <dbl> <chr> <int>
 1 2019-06-01 04:36:00  315  Aur       1
 2 2019-06-01 04:40:00  353. Aur       1
 3 2019-06-01 06:14:00  266. Aur       1
 4 2019-06-01 06:53:00  215. Aur       1
 5 2019-06-01 07:05:00  241. Aur       1
 6 2019-06-01 07:13:00  246. Aur       1
 7 2019-06-01 07:16:00  240. Aur       1
 8 2019-06-01 07:18:00  250. Aur       1
 9 2019-06-01 07:19:00  262. Aur       1
10 2019-06-01 07:20:00  270. Aur       1
# ... with 40 more rows
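The lookup vector numbers names in order of first appearance. Base R's match() gives the same result without building the lookup vector, because it returns each name's position within the vector of unique names. Here is a minimal sketch on a stand-in vector `nm`:

```r
nm <- c("Cos", "Aur", "Cos", "Aur")

# Position of each name within unique(nm): 1 for the first name seen, 2 for the next, ...
id <- match(nm, unique(nm))
id
```

On the question's data this would be `df$id <- match(df$NAME, unique(df$NAME))`. Unlike `factor()`, this approach does not sort the names alphabetically.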

score 0 · Answer 3 · answered Jul 31 '19 at 00:44


An option with data.table is .GRP, the group counter, used together with a `by` grouping:

library(data.table)
# setDT() converts df to a data.table by reference; .GRP is the counter of each NAME group
setDT(df)[, id := .GRP, by = .(NAME)][]


akrun

