For each row of data.filter, I am trying to compute the ECDF of ACTUAL_DAYS over all items in the whole table that are similar to that row's item number, evaluate it at the row's ESTIMATE_DAYS, and add the result as a new column at the end of the data table (EstimatePrediction).
The following works for a single item, so items can be checked one at a time:
library(dplyr)
# Set the current item number
currentItemNumber = "XXXXX"
# Set the estimate (in days)
currentEstimate = 5
# Get the index of the item number from the matches table
itemNoIndex = (matches %>% subset(Item_No == currentItemNumber))$ItemIndex[1]
# Get all other matches that share this index and attach their ACTUAL_DAYS
# (the key is Item_No in matches but ITEM_NO in data.filter)
matchingItems = matches %>%
  filter(ItemIndex == itemNoIndex) %>%
  filter(MatchItemIndex != ItemIndex) %>%
  merge(data.filter %>% select(ITEM_NO, ACTUAL_DAYS),
        by.x = 'Item_No', by.y = 'ITEM_NO')
# Get the ECDF of all matching items, evaluated at the estimate
ecdf(matchingItems$ACTUAL_DAYS)(currentEstimate)
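A self-contained sketch of the single-item steps above, wrapped in a function so it can be reused per row. The helper name `ecdf_for_item()` and the toy tables are hypothetical (values invented). It assumes the sample's column names (Item_No in matches, ITEM_NO in data.filter) and, as the sample suggests, that each item's self-match row comes before any other row mentioning it, so the first ItemIndex found is the item's own group.

```r
# Hypothetical toy tables (invented values) shaped like the question's data.
# Assumption: self-matches are listed first, so an item's first row is its own group.
matches <- data.frame(
  MatchItemIndex = c(1, 2, 3, 4, 2, 3, 1, 4, 1, 2, 3),
  ItemIndex      = c(1, 2, 3, 4, 1, 1, 2, 2, 3, 3, 4),
  Item_No        = c("A", "B", "C", "D", "B", "C", "A", "D", "A", "B", "C"),
  stringsAsFactors = FALSE
)
data.filter <- data.frame(
  ITEM_NO       = c("A", "B", "C", "D"),
  ESTIMATE_DAYS = c(15, 5, 22, 7),
  ACTUAL_DAYS   = c(6, 12, 18, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: ECDF of ACTUAL_DAYS over the items matched to `item`,
# evaluated at `estimate`. Returns NA when the item has no other matches.
ecdf_for_item <- function(item, estimate, matches, data.filter) {
  # First group index listed for the item, like $ItemIndex[1] in the question
  item_index <- matches$ItemIndex[match(item, matches$Item_No)]
  if (is.na(item_index)) return(NA_real_)
  # Other items in the same group (drop the item's own self-match row)
  group <- matches[matches$ItemIndex == item_index &
                     matches$MatchItemIndex != matches$ItemIndex, ]
  # Attach ACTUAL_DAYS; the key is Item_No in matches but ITEM_NO in data.filter
  matched <- merge(group, data.filter[, c("ITEM_NO", "ACTUAL_DAYS")],
                   by.x = "Item_No", by.y = "ITEM_NO")
  if (nrow(matched) == 0) return(NA_real_)
  ecdf(matched$ACTUAL_DAYS)(estimate)
}

ecdf_for_item("A", 15, matches, data.filter)  # 0.5: of B (12) and C (18), only 12 <= 15
```

The NA guards replace the error that `ecdf()` raises on an empty vector, which matters once the function is applied to every row.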
I am trying to adapt the code above so that it works for the whole data.filter table. The problem is that it only works for the first row of data.filter: every row after the first gets a value based on the first row's matches instead of its own.
EstimatePrediction = data.filter %>%
  mutate(PROBABILITY_PREDICTION = ecdf(
    (matches %>%
       subset(ItemIndex == (matches %>% subset(Item_No == ITEM_NO))$ItemIndex[1]) %>%
       subset(MatchItemIndex != ItemIndex) %>%
       merge(data.filter, by = 'ITEM_NO'))$ACTUAL_DAYS
  )(ESTIMATE_DAYS))
I am very new to R, so I am open to any suggestions. I can get the correct output by iterating over data.filter row by row, but that is extremely slow.
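The mutate() attempt most likely fails because mutate() evaluates its expression once on entire columns, not once per row: inside subset(), ITEM_NO is the whole column, and `$ItemIndex[1]` then keeps a single group index, so one ECDF is built and every row's ESTIMATE_DAYS is evaluated against it. One way to get one ECDF per row is to wrap the single-item code in a function and call it per row. Below is a base-R sketch using mapply(); the helper `ecdf_for_item()` and the toy tables are hypothetical, and the column names follow the sample data. In dplyr, calling rowwise() before mutate() gives the same per-row evaluation.

```r
# Hypothetical toy tables (invented values); self-matches are listed first.
matches <- data.frame(
  MatchItemIndex = c(1, 2, 3, 4, 2, 3, 1, 4, 1, 2, 3),
  ItemIndex      = c(1, 2, 3, 4, 1, 1, 2, 2, 3, 3, 4),
  Item_No        = c("A", "B", "C", "D", "B", "C", "A", "D", "A", "B", "C"),
  stringsAsFactors = FALSE
)
data.filter <- data.frame(
  ITEM_NO       = c("A", "B", "C", "D"),
  ESTIMATE_DAYS = c(15, 5, 22, 7),
  ACTUAL_DAYS   = c(6, 12, 18, 2),
  stringsAsFactors = FALSE
)

# Hypothetical helper: ECDF of the matched items' ACTUAL_DAYS at `estimate`
ecdf_for_item <- function(item, estimate, matches, data.filter) {
  item_index <- matches$ItemIndex[match(item, matches$Item_No)]
  if (is.na(item_index)) return(NA_real_)
  group <- matches[matches$ItemIndex == item_index &
                     matches$MatchItemIndex != matches$ItemIndex, ]
  matched <- merge(group, data.filter[, c("ITEM_NO", "ACTUAL_DAYS")],
                   by.x = "Item_No", by.y = "ITEM_NO")
  if (nrow(matched) == 0) return(NA_real_)
  ecdf(matched$ACTUAL_DAYS)(estimate)
}

# Call the helper once per row, pairing each ITEM_NO with its own ESTIMATE_DAYS
data.filter$PROBABILITY_PREDICTION <- mapply(
  ecdf_for_item,
  data.filter$ITEM_NO,
  data.filter$ESTIMATE_DAYS,
  MoreArgs = list(matches = matches, data.filter = data.filter),
  USE.NAMES = FALSE
)
data.filter$PROBABILITY_PREDICTION  # 0.5 0.5 1.0 0.0
```

This fixes the "every row uses the first row" problem, but it is still one evaluation per row, so it is not necessarily faster than an explicit loop.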
Sample Data
Matches
  MatchItemIndex ItemIndex MatchItemOrder Item_No Count Cumulative
           <int>     <int>          <int> <chr>   <int>      <int>
1              1         1              1 CBL233J    14         14
2              2         2              1 CGW112N     4          4
3              3         3              1 CAT418D     5          5
4              4         4              1 BRH131T    29         29
5              5         5              1 CQD390A    17         17
6              6         6              1 CEE533J    11         11
data.filter
   ITEM_NO ESTIMATE_DAYS ACTUAL_DAYS
1: CBL233J            10           6
2: CGW112N            22          12
3: CAT418D            22          18
4: BRH131T            33          16
5: CQD390A            21          15
6: CEE533J             7           2
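To make the sample reproducible, here are the two excerpts above as R data frames. The `1:` row labels suggest data.filter is a data.table; a plain data.frame is used here so no packages are needed, and ESTIMATE_DAYS and ACTUAL_DAYS are assumed numeric since their types are not shown. Every row in this excerpt is a self-match (MatchItemIndex equals ItemIndex), so the `MatchItemIndex != ItemIndex` filter leaves nothing and `ecdf()` would stop with an error on the empty vector; the full matches table presumably also contains non-self rows.

```r
# The sample rows shown above, typed in as data frames
matches <- data.frame(
  MatchItemIndex = 1:6,
  ItemIndex      = 1:6,
  MatchItemOrder = rep(1L, 6),
  Item_No        = c("CBL233J", "CGW112N", "CAT418D", "BRH131T", "CQD390A", "CEE533J"),
  Count          = c(14L, 4L, 5L, 29L, 17L, 11L),
  Cumulative     = c(14L, 4L, 5L, 29L, 17L, 11L),
  stringsAsFactors = FALSE
)
data.filter <- data.frame(
  ITEM_NO       = c("CBL233J", "CGW112N", "CAT418D", "BRH131T", "CQD390A", "CEE533J"),
  ESTIMATE_DAYS = c(10, 22, 22, 33, 21, 7),
  ACTUAL_DAYS   = c(6, 12, 18, 16, 15, 2),
  stringsAsFactors = FALSE
)

# Number of non-self matches in this excerpt (all six rows are self-matches)
sum(matches$MatchItemIndex != matches$ItemIndex)  # 0
```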
EDIT: I can now get the output I need, but it is very slow:
# Preallocate one result per row of data.filter
outputTest = numeric(nrow(data.filter))
ptm <- proc.time()
for (i in seq_len(nrow(data.filter))) {
  # Get the index for this row's item number
  itemNoIndex = (matches %>% subset(Item_No == data.filter$ITEM_NO[i]))$ItemIndex[1]
  # Find all the other matches that have the same index
  allNNItemData = matches %>%
    subset(ItemIndex == itemNoIndex) %>%
    subset(MatchItemIndex != ItemIndex) %>%
    merge(data.filter, by.x = 'Item_No', by.y = 'ITEM_NO')
  # Evaluate the ECDF of the matches' ACTUAL_DAYS at this row's estimate
  outputTest[i] = ecdf(allNNItemData$ACTUAL_DAYS)(data.filter$ESTIMATE_DAYS[i])
}
proc.time() - ptm
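Because `ecdf(x)(t)` is the proportion of values in `x` at or below `t` (the same as `mean(x <= t)` when `x` has no NAs), the per-row ECDF objects are not strictly needed: the loop can be replaced by joins plus one grouped mean. Below is a base-R sketch under the same assumptions as the code above (a row's group is the first ItemIndex listed for its item; Item_No in matches and ITEM_NO in data.filter, as in the sample). The toy tables are invented for illustration.

```r
# Hypothetical toy tables (invented values); self-matches are listed first.
matches <- data.frame(
  MatchItemIndex = c(1, 2, 3, 4, 2, 3, 1, 4, 1, 2, 3),
  ItemIndex      = c(1, 2, 3, 4, 1, 1, 2, 2, 3, 3, 4),
  Item_No        = c("A", "B", "C", "D", "B", "C", "A", "D", "A", "B", "C"),
  stringsAsFactors = FALSE
)
data.filter <- data.frame(
  ITEM_NO       = c("A", "B", "C", "D"),
  ESTIMATE_DAYS = c(15, 5, 22, 7),
  ACTUAL_DAYS   = c(6, 12, 18, 2),
  stringsAsFactors = FALSE
)

# 1. Group index of each row's item (first occurrence, like $ItemIndex[1])
rows <- data.frame(
  Row           = seq_len(nrow(data.filter)),
  ItemIndex     = matches$ItemIndex[match(data.filter$ITEM_NO, matches$Item_No)],
  ESTIMATE_DAYS = data.filter$ESTIMATE_DAYS
)

# 2. Every non-self match, with the matched item's ACTUAL_DAYS attached
others <- matches[matches$MatchItemIndex != matches$ItemIndex, c("ItemIndex", "Item_No")]
others <- merge(others, data.filter[, c("ITEM_NO", "ACTUAL_DAYS")],
                by.x = "Item_No", by.y = "ITEM_NO")

# 3. One row per (data.filter row, matched item) pair
pairs <- merge(rows, others, by = "ItemIndex")

# 4. ecdf(x)(t) == mean(x <= t), so average the comparison within each row
prob <- tapply(pairs$ACTUAL_DAYS <= pairs$ESTIMATE_DAYS, pairs$Row, mean)

# Rows without any match are missing from `prob` and come back as NA
data.filter$PROBABILITY_PREDICTION <- as.vector(prob[as.character(rows$Row)])
data.filter$PROBABILITY_PREDICTION  # 0.5 0.5 1.0 0.0
```

The intermediate `pairs` table has one row per (row, matched item) pair, so memory grows with the total number of matches; in exchange, all the work happens in a few vectorized calls instead of one subset and merge per row.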