Sample data:
Bilagstoptekst <- c("A", "A", "A", "A", "A","B","B","C","C","C","C","C","C","C")
AKT <- c("80","80","80","25","25","25","25","80","80","80","80","80","80","80")
IA <- c("HUVE", "HUVE", "HUBO", "BILÅ", "BILÅ", "BILÅ","BILÅ", "HUBO","HUBO","HUBO","HUBO","HUBO","HUBO","HUBO")
Belob <- c(100,100,50,75,40,60,400,100,100,100,100,100,333,333)
FPT8 <- data.frame(Bilagstoptekst, AKT, IA, Belob)
> FPT8
Bilagstoptekst AKT IA Belob
A 80 HUVE 100
A 80 HUVE 100
A 80 HUBO 50
A 25 BILÅ 75
A 25 BILÅ 40
B 25 BILÅ 60
B 25 BILÅ 400
C 80 HUBO 100
C 80 HUBO 100
C 80 HUBO 100
C 80 HUBO 100
C 80 HUBO 100
C 80 HUBO 333
C 80 HUBO 333
Bilagstoptekst <- c("A", "A", "A", "A", "B", "C", "C")
AKT <- c("80", "80", "25", "25", "25", "80", "80")
IA <- c("HUVE", "HUBO", "BILÅ", "BILÅ", "BILÅ", "HUBO", "HUBO")
RegKonto <- c(4,5,7,1,6,3,9)
Psteksnr <- c(1,6,8,2,5,7,9)
Belob_sum <- c(200,50,75,40,460,500,666)
G69 <- data.frame(Bilagstoptekst, AKT, IA, RegKonto, Psteksnr, Belob_sum)
> G69
Bilagstoptekst AKT IA RegKonto Psteksnr Belob_sum
A 80 HUVE 4 1 200
A 80 HUBO 5 6 50
A 25 BILÅ 7 8 75
A 25 BILÅ 1 2 40
B 25 BILÅ 6 5 460
C 80 HUBO 3 7 500
C 80 HUBO 9 9 666
Now, my real dataset is very large.
What I want to do is to merge the RegKonto and Psteksnr from G69 to FPT8.
I have three key columns, that should match each other in the two dataframes:
Bilagstoptekst, AKT, IA.
But I can't just to a left_join using those since there's another rule. The FPT8$Belob should match the G69$Belob_sum. And sometimes it does match (fx in my example data row 3). Sometimes I can find the match by adding all the FPT8$Belob together and match that number (combined with my 3 key columns) with the G69$Belob_sum (fx in row 1 and 2).
But sometimes it's random which rows to add together to find the right match (well in reality it's not random, but it sure feels like it!). Like the last rows with bilagstoptekst == C.
What I'm asking is, if there's a way to add up different combinations and use those for merging.
Expected output:
> FPT8
Bilagstoptekst AKT IA Belob RegKonto Psteksnr
A 80 HUVE 100 4 1
A 80 HUVE 100 4 1
A 80 HUBO 50 5 6
A 25 BILÅ 75 7 8
A 25 BILÅ 40 1 2
B 25 BILÅ 60 6 5
B 25 BILÅ 400 6 5
C 80 HUBO 100 3 7
C 80 HUBO 100 3 7
C 80 HUBO 100 3 7
C 80 HUBO 100 3 7
C 80 HUBO 100 3 7
C 80 HUBO 333 9 9
C 80 HUBO 333 9 9
What I've already tried:
I've spreaded out - for each row of keys - what different values of FPT8$Belob there is.
dt <- as.data.table(FPT8)
dt[, idx := rowid(Bilagstoptekst, AKT, IA)] # creates the timevar
out <- dcast(dt,
Bilagstoptekst + AKT + IA~ paste0("Belob", idx),
value.var = "Belob")
And then I made different combinations of the sums of the FPT8$Belob which I spreaded out:
# Adding together two different FPT8$Belob - all combinations
output <- as.data.frame(combn(ncol(out[,-c(1:3)]), m=2, FUN =function(x) rowSums(out[,-c(1:3)][x])))
names(output) <- paste0("sum_", combn(names(out[,-c(1:3)]), 2, FUN = paste, collapse="_"))
After this I merged forth and back, and I really don't wanna go in to this part, because it was a mess when I had more than 4 different FPT8$Belob per key (the 3 columns). So I definitely need a more smooth way to do this.
Hope someboy can help me.
EDIT: How the rows pair up and a bit more explanation
So my FPT8 data is a bunch of payments (Belob means amount of money). G69 data is the bills. I need to find the right match, but my problem is that sometimes people choose to split their bill up in smaller payments. Therefore the FPT8 data is larger than the G69 data.
Let me explain..
I have 4 key columns to match upon: Bilagstoptekst, AKT, IA and Belob. The 3 first should always find an exact match in the FPT8-data. Sometimes Belob do match up with the Belob_sum in G69 (row by row), sometimes we need a sum combination of the FPT8 Belob rows within the same in Bilagstoptekst, AKT and IA to match with the Belob_sum in G69. Let me try to show it with my sample data below.
FPT8:
Based on my 3 key columns **Bilagstoptekst*, AKT and IA, the first two rows are "the same" (i.e. the same bill payed over two times). I've added and ID column as the first column, which I don't have in my real data. This is just for explanation. So these two rows I call ID=1.
Row number 3 (ID = 2) doesn't pair up with other rows in my sample FPT8 data, since there isn't any other with the combination of key columns (i.e. the person payed the whole bill in once - this one will be easy to match with the G69 bill information).
In the bottum all the Bilagstoptekst==C have the same combination of the three key columns (C, 80 and HUBO). That's the same bill. But these are not the same bill. In this case I can find two matches in the G69 data. How do I know which one is the right one? I look at the FPT8$Belob and the G69$Belob_sum columns.
G69:
So if I'd do this manually I'd try to find different combinations of sums in the FPT8$Belob that matches the G69$Belob_sum together with the other 3 key columns. Fx I can see that the two last rows adds up to 666 in Belob which matches the last row in G69. The other Bilagstoptekst==C, AKT=80 and IA=HUBO matches the second last row in G69 since 100*5=500.
Desired output:
I've added some colors, so I hope it's easier to understand now.