I'm fairly new to posting questions on Stack Overflow. Usually I can find whatever I need, but in this case I can't find a proper solution, although I'm pretty sure it shouldn't be too hard.
I have large datasets to which I'd like to apply some functions. I can make it work with loops, but since I'm trying to improve the efficiency and speed of the script, I'm trying a different approach.
Therefore I'd like to use something like dplyr to get me going.
Let's dive into it.
In the table below I have rearranged a small part of the data. Column key_1 is my key, combining invoice number, date and amount. Rows 1 and 2 have the same key, which is great, but unfortunately row 3 is not recognized as belonging to the same invoice, although it should be. I used to loop through the invoice numbers and match them with partial string matching using grepl, but that is quite an intensive job when you have over a million rows. The goal is to mutate the column saldi so that it correctly sums the column amount, meaning the new value for rows 1 to 3 would be 1000.
ID | key_1 | original_invoice_number | invoice_number | amount | saldi |
---|---|---|---|---|---|
1 | R176000002022-06-021000 | RLZ17600000 | RLZ17600000 | -1000 | -1000 |
2 | R176000002022-06-021000 | RLZ17600000 | RLZ17600000 | 1000 | 2000 |
3 | R17600000a12022-06-021000 | RLZ17600000a1 | RLZ17600000a1 | 1000 | 2000 |
4 | TEST10012022-06-022000 | TEST1001 | TEST1001 | 5000 | 5000 |
5 | TEST10012022-06-022000 | TEST1001a | TEST1001 | 5000 | 5000 |
6 | TEST10012022-06-022000 | TEST1001b | TEST1001 | -5000 | 5000 |
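For anyone who wants to reproduce this, here is a minimal base R sketch that rebuilds the sample table and derives a shared base invoice number with one vectorised `sub()` call instead of looping with `grepl()`. It assumes (this is not confirmed by the data description) that a related invoice number is the base number followed by a lowercase letter and optional digits, such as `a1`, `a` or `b`; `base_invoice()` is a hypothetical helper name.

```r
# Rebuild the sample table from the question (saldi holds the current,
# incorrect values).
data <- data.frame(
  ID = 1:6,
  key_1 = c("R176000002022-06-021000", "R176000002022-06-021000",
            "R17600000a12022-06-021000", "TEST10012022-06-022000",
            "TEST10012022-06-022000", "TEST10012022-06-022000"),
  original_invoice_number = c("RLZ17600000", "RLZ17600000", "RLZ17600000a1",
                              "TEST1001", "TEST1001a", "TEST1001b"),
  invoice_number = c("RLZ17600000", "RLZ17600000", "RLZ17600000a1",
                     "TEST1001", "TEST1001", "TEST1001"),
  amount = c(-1000, 1000, 1000, 5000, 5000, -5000),
  saldi = c(-1000, 2000, 2000, 5000, 5000, 5000),
  stringsAsFactors = FALSE
)

# Hypothetical helper. Assumption: a related invoice number is the base
# number followed by a lowercase letter and optional digits ("a1", "a", "b").
# The regex keeps the last digit of the base number (\\1) and drops the suffix;
# numbers without such a suffix are returned unchanged.
base_invoice <- function(x) sub("([0-9])[[:lower:]][0-9]*$", "\\1", x)

# One vectorised call over the whole column instead of a grepl() loop per row.
data$relation <- base_invoice(data$invoice_number)
print(data[, c("ID", "invoice_number", "relation")])
```

If the real invoice numbers use other suffix patterns, only the regular expression in `base_invoice()` would need to change.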
I wrote a function that does exactly what I want when I pass it literal values. But when I call it in a dplyr pipeline after group_by(), I can't seem to pass the column values as parameters properly.
```r
my_function <- function(data, group_invoicenumber, group_amount){
  my_regex <- paste0("\\b", {{group_invoicenumber}})
  temp_df <- data %>%
    filter(str_detect(invoice_number, regex(my_regex, ignore_case = TRUE)) & amount == {{group_amount}}) %>%
    mutate(saldi = sum(amount))
  checked_df <- rbind(checked_df, temp_df)
}

checked_df <- data[0,]
data %>%
  group_by(key_1) %>%
  my_function(group_invoicenumber = invoice_number,
              group_amount = amount)
```
Every time I run this, R reports that the objects invoice_number and amount can't be found. Does anyone know how to pass these parameters through properly?
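A likely cause, sketched below assuming a reasonably recent dplyr is installed: `{{ }}` is only understood inside tidy-evaluation verbs such as `group_by()`, `filter()` and `mutate()`. Inside `paste0()` it is just a pair of ordinary braces, so R looks up `invoice_number` as a normal variable and fails. The pipe also calls `my_function()` only once, with the whole grouped data frame (`group_by()` does not split it into separate calls), and the `rbind()` into `checked_df` only creates a local copy inside the function. The hypothetical `sum_per_group()` below forwards the column arguments into the verbs and returns its result instead.

```r
library(dplyr)

# Hypothetical helper: {{ }} forwards the caller's bare column names into
# dplyr verbs, where data masking resolves them as columns.
sum_per_group <- function(data, group_col, value_col) {
  data %>%
    group_by({{ group_col }}) %>%
    mutate(saldi = sum({{ value_col }})) %>%
    ungroup()  # return the result rather than rbind()-ing into a global object
}

# Writing paste0("\\b", {{ group_col }}) in the body instead would fail with
# "object not found": outside a dplyr verb, {{ }} is just a pair of braces.

example <- data.frame(
  relation = c("RLZ17600000", "RLZ17600000", "TEST1001"),
  amount = c(-1000, 1000, 5000),
  stringsAsFactors = FALSE
)

result <- sum_per_group(example, relation, amount)
print(result)
```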
Besides that, I'm quite new to coding, so there might be a much easier way to fix this.
Thanks in advance anyway!
Edit: in addition to my first post, here is the output I'd like to get.
ID | key_1 | original_invoice_number | invoice_number | amount | saldi | relation |
---|---|---|---|---|---|---|
1 | R176000002022-06-021000 | RLZ17600000 | RLZ17600000 | -1000 | 1000 | RLZ17600000 |
2 | R176000002022-06-021000 | RLZ17600000 | RLZ17600000 | 1000 | 1000 | RLZ17600000 |
3 | R17600000a12022-06-021000 | RLZ17600000a1 | RLZ17600000a1 | 1000 | 1000 | RLZ17600000 |
4 | TEST10012022-06-022000 | TEST1001 | TEST1001 | 5000 | 5000 | TEST1001 |
5 | TEST10012022-06-022000 | TEST1001a | TEST1001 | 5000 | 5000 | TEST1001 |
6 | TEST10012022-06-022000 | TEST1001b | TEST1001 | -5000 | 5000 | TEST1001 |