R Mutate and Subset Data on two data tables

Question

I am trying to get the ECDF for all items similar (in the whole data table) to the item number in each row, and add the ECDF column to the end of the data table (EstimatePrediction).

This works for individual items, so they can be checked one by one.

    # Set the current item number
    currentItemNumber = "XXXXX"
    # Set the estimate in days
    currentEstimate = 5
    # Get the index of the item number from the matches table
    itemNoIndex = ((matches %>% subset(Item_No == currentItemNumber))$ItemIndex[1])
    # Get all rows that share that index (excluding the item itself) and attach ACTUAL_DAYS
    matchingItems = matches %>% filter(ItemIndex == itemNoIndex) %>%
                             filter(MatchItemIndex != ItemIndex) %>%
                             merge(data.filter %>%
                             select(ITEM_NO, ACTUAL_DAYS), by = 'ITEM_NO')
    # Evaluate the ECDF of all matching items at the estimate
    ecdf(matchingItems$ACTUAL_DAYS)(currentEstimate)

I am trying to modify the R code above so that it works for the whole data.filter data table. The problem is that it only works for the first row of data.filter: the rows after the first are computed from the first row’s data, not their own.

EstimatePrediction = data.filter %>% mutate(PROBABILITY_PREDICTION = ecdf((matches%>%subset(ItemIndex == ((matches%>%subset(Item_No== ITEM_NO))$ItemIndex[1])) %>%
subset(MatchItemIndex != ItemIndex) %>%
merge(data.filter, by = 'ITEM_NO'))$ACTUAL_DAYS)(ESTIMATE_DAYS) )

I am very new to R so I am open to any suggestions. I can get the correct output by iterating through the data.filter, but it is extremely slow.
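A likely reason for the "first row only" behaviour is that `mutate()` evaluates its expression once on whole columns, so a single ECDF is built (from whatever `[1]` selects) and then applied to every `ESTIMATE_DAYS` value. The sketch below illustrates this on hypothetical toy data (`toy`, `id`, `estimate`, `actual` are made-up names, not the asker's tables) and contrasts it with `rowwise()`, which evaluates the expression once per row.

```r
library(dplyr)

# Hypothetical toy data, not the tables from the question
toy <- tibble(id = c("a", "b", "c"),
              estimate = c(1, 2, 3),
              actual = c(2, 4, 6))

# mutate() sees whole columns: one ECDF is built from all of `actual`
# and evaluated at every estimate
one_ecdf <- toy %>%
  mutate(p = ecdf(actual)(estimate))

# rowwise() evaluates the expression once per row, so each row gets its
# own ECDF, here built from the `actual` values of the other rows
per_row <- toy %>%
  rowwise() %>%
  mutate(p = ecdf(toy$actual[toy$id != id])(estimate)) %>%
  ungroup()
```

`rowwise()` gives per-row results but still builds one ECDF per row, so it may not be much faster than an explicit loop on a large table.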

Sample Data

    Matches

 MatchItemIndex ItemIndex MatchItemOrder  Item_No Count Cumulative
           <int>     <int>          <int>   <chr> <int>      <int>
1              1         1              1 CBL233J    14         14
2              2         2              1 CGW112N     4          4
3              3         3              1 CAT418D     5          5
4              4         4              1 BRH131T    29         29
5              5         5              1 CQD390A    17         17
6              6         6              1 CEE533J    11         11

    data.filter

   ITEM_NO ESTIMATE_DAYS ACTUAL_DAYS
1: CBL233J            10           6
2: CGW112N            22          12
3: CAT418D            22          18
4: BRH131T            33          16
5: CQD390A            21          15
6: CEE533J             7           2

EDIT: I am now able to get the output I need, but it is really slow:

data.filter = data.filter%>%mutate(Index = 1:n())
loopData = data.filter%>%select(ITEM_NO, ACTUAL_DAYS, ESTIMATE_DAYS, Index)
simpleV = unlist(loopData)
outputTest = 1:nrow(loopData)
ptm <- proc.time()
for(i in 1:nrow(loopData)){

  #Get Index for Item Number
  itemNoIndex = (matches%>%subset(ITEM_NO == simpleV[paste('ITEM_NO',i,sep="")]))$ItemIndex[1]
  #Find all the matches that have the same index 
  allNNItemData = matches%>%subset(ItemIndex == itemNoIndex) %>%
    subset(MatchItemIndex != ItemIndex) %>%
    merge(data.filter, by = 'ITEM_NO')

  outputTest[i] = ecdf(allNNItemData$ACTUAL_DAYS)(simpleV[paste('ESTIMATE_DAYS',i,sep="")])
} 
proc.time() - ptm

Welcome to SO! Could you please post a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example)? — csgroen, Aug 23 '17 at 18:36
Thanks for the reply! I am trying to get the ECDF for all items similar (in the whole data table) to the item number in each row, and add the ECDF column to the end of the data table (EstimatePrediction). The code above should work with the example data set added above. — JJansen27, Aug 23 '17 at 19:00
Please also explain in words what your code does. Seems like maybe you just need to join your tables, but hard to tell without explanation. It would also be really nice if you provided your sample data in a copy/pasteable way - either share code to create the sample or use `dput()` on your sample data to generate such code. — Gregor Thomas, Aug 23 '17 at 19:00
The matches table has an index for each item number and a matching index for every item that is similar to that item. I would like to get all the similar items and compute the ECDF of their actual days. – JJansen27, Aug 23 '17 at 19:02
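The join idea raised in the comments could look roughly like the sketch below. It is only a sketch under stated assumptions: the toy tables are hypothetical, `Item_No` is assumed to be the item number of the matched item (`MatchItemIndex`), and each item is assumed to have a self-match row that maps its item number to its `ItemIndex`. It also relies on the fact that `ecdf(x)(q)` equals `mean(x <= q)`, so the per-row value becomes a grouped mean.

```r
library(dplyr)

# Hypothetical toy tables; column names follow the question.
# Assumption: Item_No belongs to the matched item (MatchItemIndex).
matches_toy <- tibble(
  ItemIndex      = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  MatchItemIndex = c(1, 2, 3, 2, 1, 3, 3, 1, 2),
  Item_No        = c("A", "B", "C", "B", "A", "C", "C", "A", "B")
)
data_toy <- tibble(
  ITEM_NO       = c("A", "B", "C"),
  ESTIMATE_DAYS = c(5, 10, 9),
  ACTUAL_DAYS   = c(4, 8, 12)
)

# Item number -> index, taken from the self-match rows
item_index <- matches_toy %>%
  filter(MatchItemIndex == ItemIndex) %>%
  select(ITEM_NO = Item_No, ItemIndex)

# ACTUAL_DAYS of every similar item, keyed by the item it resembles
similar_days <- matches_toy %>%
  filter(MatchItemIndex != ItemIndex) %>%
  inner_join(select(data_toy, ITEM_NO, ACTUAL_DAYS),
             by = c("Item_No" = "ITEM_NO")) %>%
  select(ItemIndex, SIMILAR_DAYS = ACTUAL_DAYS)

# One value per original row: share of similar days <= that row's estimate
scored <- data_toy %>%
  mutate(Row = row_number()) %>%
  inner_join(item_index, by = "ITEM_NO") %>%
  inner_join(similar_days, by = "ItemIndex") %>%
  group_by(Row) %>%
  summarise(PROBABILITY_PREDICTION = mean(SIMILAR_DAYS <= ESTIMATE_DAYS))

EstimatePrediction_toy <- data_toy %>%
  mutate(Row = row_number()) %>%
  left_join(scored, by = "Row") %>%
  select(-Row)
```

The intermediate join holds one row per (record, similar record) pair, so on a large table it may need a lot of memory.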

score 0 · Answer 1 · answered Aug 23 '17 at 19:46

See if this solves it:

library(tidyverse)

#-- Declare objects
Matches <- tibble(MatchItemIndex = 1:6, ItemIndex = 1:6,
                  MatchItemOrder = rep(1,6), Item_No = c("CBL233J", "CGW112N",
                                                         "CAT418D", "BRH131T",
                                                         "CQD390A", "CEE533J"), 
                  Count = c(14,4,5,29,17,11), Cumulative = c(14,4,5,29,17,11))

data.filter <- tibble(ITEM_NO = c("CBL233J", "CGW112N",
                                  "CAT418D", "BRH131T",
                                  "CQD390A", "CEE533J"),
                      ESTIMATE_DAYS = c(10, 22, 22, 33, 21, 7),
                      ACTUAL_DAYS = c(6, 12, 18, 16, 15, 2))

#-- Get matching items by item no
matchingItems <- intersect(Matches$Item_No, data.filter$ITEM_NO)

#-- Filter data.filter to matching items (%in% tests membership; == would recycle)
df <- filter(data.filter, ITEM_NO %in% matchingItems)

#-- Do analysis (currentEstimate as set in the question)
currentEstimate <- 5
ecdf(df$ACTUAL_DAYS)(currentEstimate)

answered Aug 23 '17 at 19:46

csgroen


Thank you csgroen, but this still does not solve the problem. Basically, I would like to loop through each row, get all the items that are similar to the item in that row, compute the ECDF for that row from the similar items' actual days and the row's estimate days, and save the result for that row. – JJansen27 Aug 23 '17 at 19:57
I'm still trying to understand what exactly you need. Do you want to compare estimate days to actual days on each matching item? – csgroen Aug 23 '17 at 20:00
Sorry csgroen, I am having a hard time explaining it. Some of the items do not have enough data to create an ECDF, so I am using KNN to find items that are similar to each other. Each row needs its own ECDF output based on the actual days of all the similar items and the row's (the current item's) estimate days. – JJansen27 Aug 23 '17 at 20:04
I see. So the matches table is for 1 item? – csgroen Aug 23 '17 at 20:17
The matches table holds all Match data. There are 50 matches for each item and there are like 200K+ records in the data.filter. – JJansen27 Aug 23 '17 at 20:44
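With about 50 matches per item and 200K+ records, one option that might help is to build each item's ECDF once and reuse it for every record of that item, rather than rebuilding it per row. The base R sketch below shows the idea on hypothetical toy tables; as before, it assumes that `Item_No` is the item number of the matched item and that each item has a self-match row. The helper names (`index_of`, `days_by_item`, `ecdf_by_index`) are made up for illustration.

```r
# Hypothetical toy tables; column names follow the question
matches_toy <- data.frame(
  ItemIndex      = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  MatchItemIndex = c(1, 2, 3, 2, 1, 3, 3, 1, 2),
  Item_No        = c("A", "B", "C", "B", "A", "C", "C", "A", "B"),
  stringsAsFactors = FALSE
)
data_toy <- data.frame(
  ITEM_NO       = c("A", "A", "B", "C"),
  ESTIMATE_DAYS = c(5, 9, 10, 9),
  ACTUAL_DAYS   = c(4, 6, 8, 12),
  stringsAsFactors = FALSE
)

# Item number -> index, taken from the self-match rows (assumption)
self <- matches_toy[matches_toy$MatchItemIndex == matches_toy$ItemIndex, ]
index_of <- setNames(self$ItemIndex, self$Item_No)

# All ACTUAL_DAYS records, grouped by item number
days_by_item <- split(data_toy$ACTUAL_DAYS, data_toy$ITEM_NO)

# One ECDF per ItemIndex, built from every record of its similar items
others <- matches_toy[matches_toy$MatchItemIndex != matches_toy$ItemIndex, ]
ecdf_by_index <- lapply(
  split(others$Item_No, others$ItemIndex),
  function(item_nos) ecdf(unlist(days_by_item[item_nos], use.names = FALSE))
)

# Look up each row's ECDF by index and evaluate it at that row's estimate
row_index <- as.character(index_of[data_toy$ITEM_NO])
data_toy$PROBABILITY_PREDICTION <- mapply(
  function(idx, est) ecdf_by_index[[idx]](est),
  row_index, data_toy$ESTIMATE_DAYS,
  USE.NAMES = FALSE
)
```

Note that `ecdf()` raises an error when an item has no similar records at all, so such items would need to be filtered out or handled separately.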
