0

This is my df.

index   firmcode    year    indcode     ROA
  0      a         2006      03         0.1
  1      b         2006      03         0.2
  2      c         2006      03         0.4
  3      d         2006      03         0.7   
  4      e         2006      07         0.3
  5      f         2006      07         0.8
  6      g         2006      07         1.1
  7      h         2006      07         2.1

I would like it to be as follows. This is ROA matching the nearest company(within the same year and same indcode, except for the same company)

index   firmcode    year    indcode     ROA   diff_min_firmcode
  0      a         2006      03         0.1         b  
  1      b         2006      03         0.2         a
  2      c         2006      03         0.4         b          
  3      d         2006      03         0.7         c
  4      e         2006      07         0.3         f 
  5      f         2006      07         0.8         g 
  6      g         2006      07         1.1         f
  7      h         2006      07         2.1         g 

How can I get the df['diff_min_firmcode'] column?

Wimpel
  • 26,031
  • 1
  • 20
  • 37

3 Answers3

2

Here is another approach, using full_join with the same data.frame. After filtering out rows to avoid matching with its own ROA, you can determine the minimum difference between ROA and others within the same year and indcode group.

library(dplyr)

full_join(df, df, by = c("year", "indcode")) %>%
  filter(index.x != index.y) %>%
  group_by(firmcode.x, year, indcode) %>%
  mutate(diff_min_firmcode = firmcode.y[which.min(abs(ROA.x - ROA.y))]) %>%
  distinct(firmcode.x, .keep_all = TRUE)

Output

  index.x firmcode.x  year indcode ROA.x index.y firmcode.y ROA.y diff_min_firmcode
    <int> <chr>      <int>   <int> <dbl>   <int> <chr>      <dbl> <chr>            
1       0 a           2006       3   0.1       1 b            0.2 b                
2       1 b           2006       3   0.2       0 a            0.1 a                
3       2 c           2006       3   0.4       0 a            0.1 b                
4       3 d           2006       3   0.7       0 a            0.1 c                
5       4 e           2006       7   0.3       5 f            0.8 f                
6       5 f           2006       7   0.8       4 e            0.3 g                
7       6 g           2006       7   1.1       4 e            0.3 f                
8       7 h           2006       7   2.1       4 e            0.3 g
Ben
  • 28,684
  • 5
  • 23
  • 45
1

What happens when there is a tie ?

You can try this answer -

fun <- function(a, b) {
  apply(abs(outer(a, a, `-`)), 1, function(x) b[x == min(x[x != 0])[1]])  
}

If you have multiple years and want to match only within each specific years you can do -

library(dplyr)

df %>%
  group_by(year, indcode) %>%
  mutate(diff_min_firmcode = fun(ROA, firmcode)) %>%
  ungroup

#  index firmcode  year indcode   ROA diff_min_firmcode
#  <int> <chr>    <int>   <int> <dbl> <chr>            
#1     0 a         2006       3   0.1 b                
#2     1 b         2006       3   0.2 a                
#3     2 c         2006       3   0.4 b                
#4     3 d         2006       3   0.7 c                
#5     4 e         2006       7   0.3 f                
#6     5 f         2006       7   0.8 g                
#7     6 g         2006       7   1.1 f                
#8     7 h         2006       7   2.1 g                           
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
0

Here is another approach you could use for this question:

library(dplyr)
library(tidyr)
library(purrr)

df %>%
  group_by(year, indcode) %>%
  mutate(output = map(ROA, ~ abs(ROA - .x))) %>%
  unnest_wider(col = output) %>%
  rowwise() %>%
  mutate(inds = which(c_across(contains("...")) == 
                        min(c_across(contains("..."))[c_across(contains("...")) != 0])[1])) %>%
  select(!contains("...")) %>%
  group_by(year, indcode) %>%
  mutate(diff_min = map_chr(inds, ~ firmcode[.x]))

# A tibble: 8 x 7
# Groups:   year, indcode [2]
  index firmcode  year indcode   ROA  inds diff_min
  <int> <chr>    <int>   <int> <dbl> <int> <chr>   
1     0 a         2006       3   0.1     2 b       
2     1 b         2006       3   0.2     1 a       
3     2 c         2006       3   0.4     2 b       
4     3 d         2006       3   0.7     3 c       
5     4 e         2006       7   0.3     2 f       
6     5 f         2006       7   0.8     3 g       
7     6 g         2006       7   1.1     2 f       
8     7 h         2006       7   2.1     3 g 
Anoushiravan R
  • 21,622
  • 3
  • 18
  • 41