
I have a data frame of almost 200,000 observations. The important columns look like this (note: the Min Value column is not filled by default):

Key        Value      Min Value
1xA         1           1
2xA         2           2
3xB         3           2 
1xB         1           1
1xA         5           1 
2xB         2           2
3xB         2           2
2xA         4           2

How can I achieve this, i.e. fill the Min Value column, in R? I tried looping over the data frame and, for each observation, taking the minimum of a subset built for its key. That worked fine, but it takes far too long to iterate over 200,000 observations.

My thoughts: is there any way to filter with which() for every observation in one go, so that I can find the minimum then and there?
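For reference, the slow per-row approach was roughly along these lines (a minimal sketch, assuming the data frame is called df; not the exact code):

# Hypothetical reconstruction of the slow row-by-row approach described above
df$Min.Value <- NA
for (i in seq_len(nrow(df))) {
  idx <- which(df$Key == df$Key[i])      # all rows sharing this row's key
  df$Min.Value[i] <- min(df$Value[idx])  # minimum Value within that subset
}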

d1r3w0lF

2 Answers


Using dplyr, this is not too complicated:

df
#    Key Value
# 1 1xA     1
# 2 2xA     2
# 3 3xB     3
# 4 1xB     1
# 5 1xA     5
# 6 2xB     2
# 7 3xB     2
# 8 2xA     4

df %>% 
    dplyr::group_by(Key) %>% 
    dplyr::mutate(Min.Value = min(Value))
# A tibble: 8 x 3
# Groups:   Key [5]
#   Key   Value Min.Value
#   <fct> <int>     <dbl>
# 1 1xA       1         1
# 2 2xA       2         2
# 3 3xB       3         2
# 4 1xB       1         1
# 5 1xA       5         1
# 6 2xB       2         2
# 7 3xB       2         2
# 8 2xA       4         2
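Note that the result is still grouped by Key. If you want a plain, ungrouped data frame back (for example, to overwrite the original), one way, sketched under the same setup, is to pipe into dplyr::ungroup():

df <- df %>% 
    dplyr::group_by(Key) %>% 
    dplyr::mutate(Min.Value = min(Value)) %>% 
    dplyr::ungroup()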
C. Braun

With base R functions only:

merge(df, aggregate(list(Min.Value=df$Value), by=list(Key=df$Key), FUN=min), by="Key")

The aggregate() function takes the Value column (the first argument), groups it by the values supplied via the by argument (which must be a list of vectors of the same length as the first argument), and applies the FUN function (here min) to each group. The first argument is wrapped in a list just to rename the resulting column (if you pass plain df$Value instead, the resulting column will be named x).

The outer merge() joins rows of the two supplied data frames that have the same value in the Key column (much like a database join operation).
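To make the intermediate step concrete, the aggregate() call alone produces a small per-key lookup table on the sample data (a sketch of the expected output):

aggregate(list(Min.Value=df$Value), by=list(Key=df$Key), FUN=min)
#   Key Min.Value
# 1 1xA         1
# 2 1xB         1
# 3 2xA         2
# 4 2xB         2
# 5 3xB         2

merge() then joins this lookup table back onto every row of df by Key.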

Jaromír Adamec
  • The basic problem with using merge is that it works in an iterative way, hence it is quite slow; a 200,000+ row data set will take me ages. – d1r3w0lF Jun 01 '18 at 19:09