I have a dataframe that contains 3 columns. One of the columns (items) includes long strings (some include special characters such as semicolon ; for simplicity it is written here as Tx) that might repeat. I would like to find all the unique values of the specific column. For each unique value to create a nickname in the following format: OV1, OV2,... OVn. I would like to keep this table and to call it 'transformation table'. In the next step I would like to go to the source table and to recode each value with the value in transformation table.
Here is an example for the source table:
items sequenceID eventID SIZE
1: T1 41595370 1 1
2: T2 41595371 1 1
3: T3 41595282 1 1
4: T3 41595282 2 1
5: T4 41595373 1 1
6: T5 41595368 1 1
7: T1 41595379 1 1
The result table would be:
items sequenceID eventID SIZE
1: OV1 41595370 1 1
2: OV2 41595371 1 1
3: OV3 41595282 1 1
4: OV3 41595282 2 1
5: OV4 41595373 1 1
6: OV5 41595368 1 1
7: OV1 41595379 1 1
The transformation table would be:
1: T1 OV1
2: T2 OV2
3: T3 OV3
4: T4 OV4
5: T5 OV5
The source data is in data6
(dataframe).
I used the following orders:
u1 <- unique(data6$items)
u1 <- data.frame(u1)
ov <- c(paste("ov",1:nrow(u1),sep=""))
I would now to replace all the unique items column that resides in u1
with OV
values and to create this as transformation table. Afterwards to replace all the values in data6$items
.
Thanks a lot for your help.