0

Im new to this place and I'm not super experienced with R but I need it at work and I really hope you can support me

So i have a huge data set but i will explain the issue using small sample

I have already grouped my data set to achieve a layout which i want

enter image description here

So basically i have multiple EXCPosOutlet and EXCPPMonth names and i need to remove lowest values per EXCPosOutlet per EXCMonth which sum up to 10% of total for that individual group.

So lets say that total of AvaragePrice for a sampleName for Month 612 is 1000$. i need to remove all rows with lowest values of AveragePrice which sum up to 100$

If removing is messy, even creating extra column (mutate) using ifelse for example which would just tell me if it falls under my criteria, that would be totally enough

I have tried all ntile, quntile fucntions but im not geeting what i need.

Thank you so much in advance LEt me know if I should provide more details

Martin Gal
  • 16,640
  • 5
  • 21
  • 39
  • Please make a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) or [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) with a sample input (not posted in an image) and your expected output. This is needed to create, test and verify possible solutions. – Martin Gal Mar 28 '22 at 16:08

1 Answers1

0

One possibility is to use the dplyr package and, for legibility, the pipe operator %>%. There's other ways towards the same result, but you might want to give it a try:

library(dplyr)

## generate example data:
data.frame(
    EXCPosOutlet = gl(3,12),
    AveragePrice = runif(36) * 100
) %>% 
    ## sort dataframe by outlet and (increasing) price:
    arrange(EXCPosOutlet, AveragePrice) %>%
    ## group by outlet:
    group_by(EXCPosOutlet) %>%
    ## calculate cumulative price:
    mutate(cumAveragePrice = cumsum(AveragePrice)) %>%
    ## keep rows which, per outlet, total less than the treshold of $100:
    filter(cumAveragePrice <= 100)

  • Thanks a lot for your feedback. at the beggining i thought it will also be more straight forward and i also used dplyr with many combinations but i failed everywhere. Using your example, the sum of avarage price for excoutlet 1 = approx 542. what i need to remove/select/higlight (whatever) are lowest values which will sum up to 10% of this value which in this case is approx 54. so the values which i need to get back are 2.22, 8.62, 11.26 and 22.82. i hope there i a way :) – Pawel Stasiuk Mar 25 '22 at 20:41
  • Mind that my made-up example does *not* include the month as a grouping variable. Did you make sure to extend the grouping to group_by(EXCPosOutlet, EXCMonth)? –  Mar 25 '22 at 20:56
  • yes i did but im afriad the outcome is different to want i need to achieve :( – Pawel Stasiuk Mar 25 '22 at 21:07
  • Can you paste a sufficient portion of your data here, using dput(your_data)? Sufficient meaning that the total in at least one outlet-month-group exceeds the treshold (so that the more expensive goods have to be dropped)? –  Mar 25 '22 at 21:14
  • Yes, I will do it tomorrow, its a sensitive data so i will need to change few things. Thank you for your help – Pawel Stasiuk Mar 27 '22 at 14:37
  • 1
    Thanks one more time for your help, you were right, cumsum was the right thing to do, i just tried millions times different otpions and i got it :) so to get lowest 10% i did something like that and im using group 1 to identify them: check <- data%>% group_by(EXCPosOutlet,EXCPPMonth)%>%arrange(AveragePrice)%>% mutate(new <- as.numeric(cut(cumsum(AveragePrice), breaks=10))) – Pawel Stasiuk Mar 28 '22 at 11:43