I would like to assign an ID to each data point based on the `NAME` column, so that rows with the same name share the same ID number or tag. I have a large amount of data, so the ID can be a random code; it just needs to group rows with the same name under a single ID. That way I can make the names anonymous while still keeping the data points grouped together.

For example, in the dummy data below "Aur" could be A, "Cos" could be B, and so on through C, D, ..., then A1, B1, ..., A2, etc.

I think it would be something like `group_by(NAME)` followed by `mutate()`, but I am not sure.

Here is some dummy data:

df <- structure(list(`Local Time` = structure(c(1559388960, 
1559389200, 1559394840, 1559397180, 1559397900, 1559398380, 
1559398560, 1559398680, 1559398740, 1559398800, 1559399160, 
1559399280, 1559399400, 1559399580, 1559399640, 1559399820, 
1559399940, 1559400120, 1559400240, 1559400780, 1559400840, 
1559400960, 1559401080, 1559401260, 1559401380, 1559383560, 
1559389200, 1559389440, 1559395080, 1559395320, 1559397180, 
1559397900, 1559398200, 1559398440, 1559398680, 1559398920, 
1559399220, 1559399520, 1559399820, 1559400120, 1559400360, 
1559400660, 1559400960, 1559401200, 1559401500, 1559401740, 
1559402040, 1559402280, 1559402580, 1559402880
), class = c("POSIXct", "POSIXt"), tzone = ""), COG = c(315, 
352.6, 265.6, 214.9, 240.8, 245.5, 240.3, 250.5, 262.4, 269.8, 
281.1, 262.9, 253.1, 247.7, 255.5, 249.4, 263.2, 268.6, 279.6, 
274.3, 254.6, 246.6, 253.7, 242.3, 163.5, 90, 88, 89, 93, 96, 
95, 97, 97, 98, 98, 95, 93, 94, 92, 91, 91, 91, 91, 90, 90, 92, 
89, 89, 89, 88), NAME = c("Aur", "Aur", "Aur", "Aur", "Aur", 
"Aur", "Aur", "Aur", "Aur", "Aur", "Aur", "Aur", "Aur", "Aur", 
"Aur", "Aur", "Aur", "Aur", "Aur", "Aur", "Aur", "Aur", "Aur", 
"Aur", "Aur", "Cos", "Cos", "Cos", "Cos", "Cos", "Cos", "Cos", 
"Cos", "Cos", "Cos", "Cos", "Cos", "Cos", "Cos", "Cos", "Cos", 
"Cos", "Cos", "Cos", "Cos", "Cos", "Cos", "Cos", "Cos", "Cos"
 )), row.names = c(NA, -50L), class = c("tbl_df", "tbl", 
"data.frame"))
Lmm
  • Would the integer codes of a `factor` be sufficient, e.g. `as.integer(factor(df$NAME))`? – thelatemail Jul 31 '19 at 00:20
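
The comment's suggestion can be sketched in base R as below. Because factor levels are sorted, the resulting numbers follow the alphabetical order of the names; since the question asks for anonymous codes, the sketch also shows one possible variant that shuffles the unique names first. The small `df`, the `anon_id` column, and the use of `set.seed()` are assumptions made for illustration, not part of the original thread.

```r
# Small stand-in for the question's data (only the NAME column matters here)
df <- data.frame(NAME = c("Aur", "Aur", "Cos", "Cos", "Aur"),
                 stringsAsFactors = FALSE)

# Deterministic IDs: factor levels are sorted, so "Aur" -> 1 and "Cos" -> 2
df$id <- as.integer(factor(df$NAME))

# Shuffled IDs: each name's position in a randomly ordered lookup vector
set.seed(1)  # assumption: only here to make the shuffle reproducible
shuffled_names <- sample(unique(df$NAME))
df$anon_id <- match(df$NAME, shuffled_names)
```

Either way, rows with the same `NAME` always receive the same ID; only the shuffled variant avoids leaking the alphabetical order of the names.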

3 Answers


You can use dplyr::group_indices().

library(dplyr)

df <- df %>%
  mutate(id = group_indices(., NAME))
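
In dplyr 1.0.0 and later, calling `group_indices()` with grouping columns inside `mutate()` is deprecated in favour of `cur_group_id()`. A minimal sketch of the same idea with the newer function, assuming dplyr >= 1.0.0 is installed and using a small stand-in tibble rather than the full data from the question:

```r
library(dplyr)

# Small stand-in for the question's data
df <- tibble(NAME = c("Aur", "Aur", "Cos", "Cos", "Aur"))

df <- df %>%
  group_by(NAME) %>%             # one group per distinct name
  mutate(id = cur_group_id()) %>% # integer index of the current group
  ungroup()                      # drop the grouping so later steps are unaffected
```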
neilfws

If numeric IDs are acceptable, a named lookup vector also works: build a mapping from each unique name to an integer, then index it by `NAME`.

library(dplyr)

unique_name <- unique(df$NAME)

# Named integer vector: names are the original values, elements are the IDs
id_mapping <- setNames(seq_along(unique_name), unique_name)

df %>%
    mutate(id = unname(id_mapping[NAME]))

# A tibble: 50 x 4
   `Local Time`          COG NAME     id
   <dttm>              <dbl> <chr> <int>
 1 2019-06-01 04:36:00  315  Aur       1
 2 2019-06-01 04:40:00  353. Aur       1
 3 2019-06-01 06:14:00  266. Aur       1
 4 2019-06-01 06:53:00  215. Aur       1
 5 2019-06-01 07:05:00  241. Aur       1
 6 2019-06-01 07:13:00  246. Aur       1
 7 2019-06-01 07:16:00  240. Aur       1
 8 2019-06-01 07:18:00  250. Aur       1
 9 2019-06-01 07:19:00  262. Aur       1
10 2019-06-01 07:20:00  270. Aur       1
# ... with 40 more rows
yusuzech

Another option is data.table's `.GRP`, a special symbol that holds the index of the current group:

library(data.table)
setDT(df)[, id := .GRP, by = NAME][]
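
If letter codes like those described in the question (A, B, ..., Z, then A1, B1, ...) are preferred over plain integers, the group index can be converted afterwards. The following is only a sketch: `letter_code()` is a hypothetical helper written for this illustration, and the small `dt` stands in for the real data.

```r
library(data.table)

# Small stand-in for the question's data
dt <- data.table(NAME = c("Aur", "Aur", "Cos", "Cos", "Aur"))

# Hypothetical helper: 1 -> "A", ..., 26 -> "Z", 27 -> "A1", 28 -> "B1", ...
letter_code <- function(i) {
  cycle <- (i - 1L) %/% 26L  # how many times the alphabet has been used up
  paste0(LETTERS[(i - 1L) %% 26L + 1L], ifelse(cycle == 0L, "", cycle))
}

dt[, id := .GRP, by = NAME]    # integer group index, in order of first appearance
dt[, code := letter_code(id)]  # anonymous letter code derived from the index
```

Dropping the `NAME` and `id` columns afterwards leaves only the anonymous `code` to identify each group.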
akrun