Add minimum value as a new column using mutate

Question

I have the following dataframe in R

library(dplyr)
library(tidyr)### IMPORTED LIBRARIES
DF<-data.frame("Index"=c(1,2,3,4,5,6,7,8,9,10))
DF$CI=c("A1", "A2", "A3", "A4", 'A1', "A6", "A7", "A8", "A9", "A9")

The dataframe consists of two columns, an Index column with serial numbers from one to 10 and a second column CI. the values in CI columns may be unique or duplicated. The dataframe appears as follows

   Index CI
1      1 A1
2      2 A2
3      3 A3
4      4 A4
5      5 A1
6      6 A6
7      7 A7
8      8 A8
9      9 A9
10    10 A9

I would like to add a column EI that indicates the correct Index value in case a value in CI is duplicated. The expected output is as follows

    Index CI EI
 1      1 A1  1
 2      2 A2  2
 3      3 A3  3
 4      4 A4  4
 5      5 A1  1
 6      6 A6  6
 7      7 A7  7
 8      8 A8  8
 9      9 A9  9
 10    10 A9  9

The column EI should show the corresponding index value that corresponds to the entry in CI column. In case of a duplication, the value returned should be the minimal index value. I have tried the following code using DPLYR

  DF%>%mutate(EI=case_when(CI==unique(CI)~min(Index)))

I am getting the following output.

    Index CI EI
 1      1 A1  1
 2      2 A2  1
 3      3 A3  1
 4      4 A4  1
 5      5 A1 NA
 6      6 A6 NA
 7      7 A7 NA
 8      8 A8 NA
 9      9 A9 NA
 10    10 A9 NA

I have tried this code as well but am not getting the desired output

    DF%>%mutate(EI=min(Index))

Output

      Index CI EI
   1      1 A1  1
   2      2 A2  1
   3      3 A3  1
   4      4 A4  1
   5      5 A1  1
   6      6 A6  1
   7      7 A7  1
   8      8 A8  1
   9      9 A9  1
   10    10 A9  1

I request a little help here. Am using DPLYR as the dataset is large( i have given a toy dataset) and a loop is taking very long.

`DF %>% mutate(EI = group_indices(., CI))` Or Perhaps you are looking for `DF %>% group_by(CI) %>% mutate(EI = min(Index)) ` — Ronak Shah, Aug 02 '19 at 06:09
@RonakShah Also `EI = first(Index)`, instead of `min(Index)`. — Rui Barradas, Aug 02 '19 at 06:16
@ Rui Barradas how does first (Index) differ from min(Index) — Thilaga raghavan, Aug 02 '19 at 06:17

score 2 · Accepted Answer · answered Aug 02 '19 at 06:13

2

You want the minimum value of Index, within matching values of CI. That's what group_by() is for:

DF %>%
    group_by(CI) %>%
    mutate(EI = min(Index))

answered Aug 02 '19 at 06:13

Marius

58,213
16
107
105

Alright. That explains it. Once grouped, the CI==CI is redundant. Thank – Thilaga raghavan Aug 02 '19 at 06:16

Add minimum value as a new column using mutate

1 Answers1