I have the following dataframe in R
library(dplyr)
library(tidyr)### IMPORTED LIBRARIES
DF<-data.frame("Index"=c(1,2,3,4,5,6,7,8,9,10))
DF$CI=c("A1", "A2", "A3", "A4", 'A1', "A6", "A7", "A8", "A9", "A9")
The dataframe consists of two columns, an Index column with serial numbers from one to 10 and a second column CI. the values in CI columns may be unique or duplicated. The dataframe appears as follows
Index CI
1 1 A1
2 2 A2
3 3 A3
4 4 A4
5 5 A1
6 6 A6
7 7 A7
8 8 A8
9 9 A9
10 10 A9
I would like to add a column EI that indicates the correct Index value in case a value in CI is duplicated. The expected output is as follows
Index CI EI
1 1 A1 1
2 2 A2 2
3 3 A3 3
4 4 A4 4
5 5 A1 1
6 6 A6 6
7 7 A7 7
8 8 A8 8
9 9 A9 9
10 10 A9 9
The column EI should show the corresponding index value that corresponds to the entry in CI column. In case of a duplication, the value returned should be the minimal index value. I have tried the following code using DPLYR
DF%>%mutate(EI=case_when(CI==unique(CI)~min(Index)))
I am getting the following output.
Index CI EI
1 1 A1 1
2 2 A2 1
3 3 A3 1
4 4 A4 1
5 5 A1 NA
6 6 A6 NA
7 7 A7 NA
8 8 A8 NA
9 9 A9 NA
10 10 A9 NA
I have tried this code as well but am not getting the desired output
DF%>%mutate(EI=min(Index))
Output
Index CI EI
1 1 A1 1
2 2 A2 1
3 3 A3 1
4 4 A4 1
5 5 A1 1
6 6 A6 1
7 7 A7 1
8 8 A8 1
9 9 A9 1
10 10 A9 1
I request a little help here. Am using DPLYR as the dataset is large( i have given a toy dataset) and a loop is taking very long.