Take the unique count and sum for each unique value in R

Question

Case1: Input

ST_DATE ND_DATE LO_NO ACTV_CODE ACTV_AMT AB_NO FEATURE_CODE L_NU
7/27/16 7/27/16 265   O         15       1     INTEREST     855
7/27/16 7/27/16 265   O         14       1     INTEREST     855

Expected output

ST_DATE ND_DATE LO_NO ACTV_CODE ACTV_AMT AB_NO FEATURE_INTEREST L_NU
7/27/16 7/27/16 265   O         29       1     2                855

Case 2: Input (my code works for case 2 but throws an error for case 1)

ST_DATE ND_DATE LO_NO ACTV_CODE ACTV_AMT AB_NO FEATURE_CODE L_NU
7/27/16 7/27/16 265   O         15       1     INTEREST     855
7/27/16 7/27/16 265   O         14       1     INSTALLMENT  855

Expected output

ST_DATE ND_DATE LO_NO ACTV_CODE ACTV_AMT AB_NO INTEREST INSTALLMENT L_NU
7/27/16 7/27/16 265   O         29       1     1        1           855


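library(dplyr)
library(tidyr)
install_cntdup_less1 <- install_BAN %>%
  group_by(AB_NO, LO_NO, L_NU) %>%
  # total amount per group, and 1 if the group has any non-missing FEATURE_CODE
  mutate(ACTV_AMT = sum(ACTV_AMT),
         ftr = sum(unique(!is.na(FEATURE_CODE)))) %>%
  # this is the step that throws the error in case 1
  spread(FEATURE_CODE, ftr, fill = 0) %>%
  # keep the earliest start and the latest end date (dates are m/d/yy)
  slice(which.min(as.Date(ST_DATE, '%m/%d/%y'))) %>%
  slice(which.max(as.Date(ND_DATE, '%m/%d/%y')))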

I get the following error:

Error: Duplicate identifiers for rows (29424, 29425, 29426), (7415, 7416), (30120, 30121)
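The error comes from the data rather than from `spread()` itself. In case 1, `mutate()` overwrites `ACTV_AMT` with the group total, so the two input rows become exact copies (same identifier columns, same key `INTEREST`) and `spread()` gets two values for one cell. In case 2 the keys differ, so every cell gets exactly one value. A minimal reproduction, assuming the case 1 table above as input; the `distinct()` step is a suggested workaround, not part of the original code:

```r
library(dplyr)
library(tidyr)

# Case 1 from the question, typed in by hand
case1 <- data.frame(
  ST_DATE = "7/27/16", ND_DATE = "7/27/16", LO_NO = 265L,
  ACTV_CODE = "O", ACTV_AMT = c(15L, 14L), AB_NO = 1L,
  FEATURE_CODE = "INTEREST", L_NU = 855L,
  stringsAsFactors = FALSE
)

# Same mutate() as in the question: both rows now carry ACTV_AMT = 29
step <- case1 %>%
  group_by(AB_NO, LO_NO, L_NU) %>%
  mutate(ACTV_AMT = sum(ACTV_AMT),
         ftr = sum(unique(!is.na(FEATURE_CODE)))) %>%
  ungroup()

nrow(distinct(step))  # 1: the two rows are identical

# spread() on the duplicated rows stops with a duplicate-identifier error
res <- try(spread(step, FEATURE_CODE, ftr, fill = 0), silent = TRUE)
inherits(res, "try-error")  # TRUE

# Removing exact duplicates first leaves one value per cell
fixed <- step %>%
  distinct() %>%
  spread(FEATURE_CODE, ftr, fill = 0)
```

This only removes the error; `ftr` is still 1 in `fixed`, not the 2 shown in the expected output.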

I tried introducing a unique ID, as suggested in the linked question "Spread with duplicate identifiers (using tidyverse and %>%)", but it messes up my output:

mutate(ind = row_number()) %>%
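Adding a row number does make `spread()` succeed, but only because every row is now unique: the result keeps one row per input record. Those rows still have to be collapsed afterwards, which is the step the accepted answer below adds. A sketch on the case 1 data; the `count_FEATURE` column name is illustrative:

```r
library(dplyr)
library(tidyr)

# Case 1 from the question, typed in by hand
case1 <- data.frame(
  ST_DATE = "7/27/16", ND_DATE = "7/27/16", LO_NO = 265L,
  ACTV_CODE = "O", ACTV_AMT = c(15L, 14L), AB_NO = 1L,
  FEATURE_CODE = "INTEREST", L_NU = 855L,
  stringsAsFactors = FALSE
)

# With a row id, spread() succeeds but still returns one row per record
wide <- case1 %>%
  group_by(AB_NO, LO_NO, L_NU) %>%
  mutate(ACTV_AMT = sum(ACTV_AMT)) %>%
  ungroup() %>%
  mutate(ind = row_number(), count_FEATURE = 1) %>%
  spread(FEATURE_CODE, count_FEATURE, fill = 0)

nrow(wide)  # 2

# Dropping the id and summing per customer collapses them into one row
collapsed <- wide %>%
  select(-ind) %>%
  group_by(ST_DATE, ND_DATE, LO_NO, ACTV_CODE, ACTV_AMT, AB_NO, L_NU) %>%
  summarise(INTEREST = sum(INTEREST)) %>%
  ungroup()
```

`collapsed` has `ACTV_AMT` 29 and `INTEREST` 2, matching the case 1 expected output.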

I don't understand what to do. Can anyone help me resolve the error? It may look like a duplicate question, but it is not.

In your output you have 8 column names and 9 values. Is FEATURE_INTEREST 2 or 1? — AntoniosK, Nov 05 '17 at 11:28
It doesn't seem like a difficult process, but you need to provide more information about the logic behind it. What will be the unique identifier in your new dataset? Why is FEATURE_INTEREST 2 when you have 1 distinct value in FEATURE_CODE? Maybe provide some more data so we can see different values for different combinations? — AntoniosK, Nov 05 '17 at 12:41
@AntoniosK What if I had 2 different FEATURE_CODE values; can they be converted into separate feature columns? This is the earlier question I asked: https://stackoverflow.com/questions/47115780/spread-and-merge-row-records-in-r-for-the-same-customer — vinay karagod, Nov 05 '17 at 12:57

AntoniosK · Accepted Answer · 2017-11-05T14:57:34.370


This will help:

library(dplyr)
library(tidyr)

# example data
dt = read.table(text = "
                ST_DATE ND_DATE LO_NO   ACTV_CODE   ACTV_AMT    AB_NO   FEATURE_CODE    L_NU    
                7/27/16 7/27/16 265       O          15          1      INTEREST        855          
                7/27/16 7/27/16 265       OO          14          1      INTEREST        855
                7/27/16 7/27/16 265       O          15          1      OTHER        855          
                7/27/16 7/27/16 265       OO          14          1      OTHER        855 
                ", header = TRUE, stringsAsFactors = FALSE)

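dt %>%
  # per-customer total amount and first/last date strings, repeated on every row
  group_by(AB_NO, LO_NO, L_NU) %>%
  mutate(ACTV_AMT = sum(ACTV_AMT),
         ST_DATE = min(ST_DATE),
         ND_DATE = max(ND_DATE)) %>%
  ungroup() %>%
  # a row id makes every row unique, so spread() never sees duplicate keys
  mutate(id = row_number(),
         FEATURE_CODE = paste0("FEATURE_", FEATURE_CODE),
         ACTV_CODE = paste0("ACTV_", ACTV_CODE),
         count_FEATURE = 1,
         count_ACTV = 1) %>%
  # one indicator column per code: 1 where present, NA elsewhere
  spread(FEATURE_CODE, count_FEATURE) %>%
  spread(ACTV_CODE, count_ACTV) %>%
  # drop the id and collapse back to one row per customer, summing indicators
  select(-id) %>%
  group_by(ST_DATE, ND_DATE, LO_NO, ACTV_AMT, AB_NO, L_NU) %>%
  summarise_all(sum, na.rm = TRUE) %>%
  ungroup()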

# # A tibble: 1 x 10
#     ST_DATE ND_DATE LO_NO ACTV_AMT AB_NO  L_NU FEATURE_INTEREST FEATURE_OTHER ACTV_O ACTV_OO
#       <chr>   <chr> <int>    <int> <int> <int>            <dbl>         <dbl>  <dbl>   <dbl>
#   1 7/27/16 7/27/16   265       58     1   855                2             2      2       2
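One caveat on the code above: `ST_DATE` and `ND_DATE` are character strings, so `min()` and `max()` compare them alphabetically rather than chronologically. That is harmless here because all dates are equal, but once dates span months, "10/1/16" sorts before "7/27/16". A sketch that parses the dates first, assuming the m/d/yy layout shown in the question (the two-row data frame is made up for illustration):

```r
library(dplyr)

# Hypothetical rows spanning two months
dt <- data.frame(
  ST_DATE = c("7/27/16", "10/1/16"),
  ND_DATE = c("7/27/16", "10/3/16"),
  AB_NO = 1L, LO_NO = 265L, L_NU = 855L,
  ACTV_AMT = c(15L, 14L),
  stringsAsFactors = FALSE
)

min(dt$ST_DATE)  # "10/1/16": string comparison, not date comparison

# Parse to Date before taking min/max per customer
dates <- dt %>%
  mutate(ST_DATE = as.Date(ST_DATE, format = "%m/%d/%y"),
         ND_DATE = as.Date(ND_DATE, format = "%m/%d/%y")) %>%
  group_by(AB_NO, LO_NO, L_NU) %>%
  summarise(ST_DATE = min(ST_DATE),
            ND_DATE = max(ND_DATE),
            ACTV_AMT = sum(ACTV_AMT)) %>%
  ungroup()
```

`dates` keeps 2016-07-27 as the start and 2016-10-03 as the end; the columns can be formatted back to strings with `format()` if needed.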

edited Nov 05 '17 at 14:57

answered Nov 05 '17 at 12:49

AntoniosK


I want to generalize the solution. There can be different feature codes – vinay karagod Nov 05 '17 at 13:18

Thanks for the answer. I am trying to spread ACTV_CODE as well, but I am getting an error. Can you help me add multiple spread calls? – vinay karagod Nov 05 '17 at 14:28
So, you want to treat ACTV_CODE exactly like FEATURE_CODE, right? – AntoniosK Nov 05 '17 at 14:32

Just updated my answer, but had to add some more rows in order to be able to see that it works. – AntoniosK Nov 05 '17 at 15:00
Thank you so much for your patience – vinay karagod Nov 05 '17 at 15:06
Keep in mind that `spread` takes only one column as input at a time, so you have to follow the same process for any column you want to spread. – AntoniosK Nov 05 '17 at 15:07
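Because `spread()` handles one key column per call, one way to generalize to any number of code columns is to stack them into a single long column first and pivot once. A sketch using `pivot_longer()` and `pivot_wider()`, which supersede `gather()` and `spread()` in tidyr 1.0 and later; it assumes tidyr 1.1.0 or later (for a scalar `values_fill`) and that the code columns end in `_CODE`. The `codes` vector is a hypothetical parameter you would extend:

```r
library(dplyr)
library(tidyr)

# Example data from the answer above
dt <- data.frame(
  ST_DATE = "7/27/16", ND_DATE = "7/27/16", LO_NO = 265L,
  ACTV_CODE = c("O", "OO", "O", "OO"),
  ACTV_AMT = c(15L, 14L, 15L, 14L), AB_NO = 1L,
  FEATURE_CODE = c("INTEREST", "INTEREST", "OTHER", "OTHER"),
  L_NU = 855L,
  stringsAsFactors = FALSE
)

# Columns to turn into count columns (hypothetical list; add more as needed)
codes <- c("FEATURE_CODE", "ACTV_CODE")

# Stack all code columns into (column, code) pairs, build names such as
# FEATURE_INTEREST or ACTV_O, count them per customer and pivot once
counts <- dt %>%
  select(AB_NO, LO_NO, L_NU, all_of(codes)) %>%
  pivot_longer(all_of(codes), names_to = "column", values_to = "code") %>%
  mutate(key = paste0(sub("_CODE$", "", column), "_", code)) %>%
  count(AB_NO, LO_NO, L_NU, key) %>%
  pivot_wider(names_from = key, values_from = n, values_fill = 0L)

# Per-customer total amount, joined back to the counts
totals <- dt %>%
  group_by(AB_NO, LO_NO, L_NU) %>%
  summarise(ACTV_AMT = sum(ACTV_AMT)) %>%
  ungroup()

result <- left_join(totals, counts, by = c("AB_NO", "LO_NO", "L_NU"))
```

Since `count()` already returns one row per customer and key, no row id or second summarise is needed before pivoting.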
I need one more bit of help from you. Instead of just taking the count of each feature code, I want to replace it with the corresponding ACTV_AMT under that feature code. Is that possible? – vinay karagod Nov 10 '17 at 02:03
I need an example to understand how you want the output to look. – AntoniosK Nov 10 '17 at 08:29
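The thread never shows the desired output, so this is only one reading of the request: put the summed ACTV_AMT of each feature code in that code's column instead of a count. A sketch on the answer's example data; summarising first leaves one value per cell, so the same `spread()` as in the answer cannot hit duplicate identifiers:

```r
library(dplyr)
library(tidyr)

# Example data from the answer above
dt <- data.frame(
  ST_DATE = "7/27/16", ND_DATE = "7/27/16", LO_NO = 265L,
  ACTV_CODE = c("O", "OO", "O", "OO"),
  ACTV_AMT = c(15L, 14L, 15L, 14L), AB_NO = 1L,
  FEATURE_CODE = c("INTEREST", "INTEREST", "OTHER", "OTHER"),
  L_NU = 855L,
  stringsAsFactors = FALSE
)

# Sum ACTV_AMT per customer and feature code, then spread the sums
amounts <- dt %>%
  mutate(FEATURE_CODE = paste0("FEATURE_", FEATURE_CODE)) %>%
  group_by(AB_NO, LO_NO, L_NU, FEATURE_CODE) %>%
  summarise(FEATURE_AMT = sum(ACTV_AMT)) %>%
  ungroup() %>%
  spread(FEATURE_CODE, FEATURE_AMT, fill = 0)
```

Here both `FEATURE_INTEREST` and `FEATURE_OTHER` come out as 29 (15 + 14).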
