
How can I left join two DataFrames by multiple conditions and dynamic probabilities?

DF A: (image not available)

DF B: (image not available)

I would like to left join 3 times between tables A and B and the conditions should be dynamic. It's a bit complex so I'll try to explain with an example:

In table A, column A should contain one number from table B, chosen from all the numbers in table B that are less than or equal to the value in the amount column of table A. For instance, 120 (table A) should get one of the following from table B: 120, 110, 100, 90, 80. The number should be selected by probability: I want to be able to define probabilities for those numbers (for example, 120: 20%, 110: 5%, 100: 50%, 90: 10%, 80: 15%).

Column B should contain one number from table B, chosen from all the numbers in table B that are greater than the value in the amount column of table A and not equal to 999. For instance, 120 (table A) should get one of the following from table B: 130, 140, 150, 160, 170. The number should be selected by probability: I want to be able to define probabilities for those numbers (for example, 130: 20%, 140: 5%, 150: 50%, 160: 10%, 170: 15%).

Column C in table A should always get 999.

Hopefully I managed to explain myself. Thanks in advance.

ABC
  • I can't comprehend your logic and am not sure you are defining a merge/join as in your logic B gets from B sometimes. Possibly you are talking about this... https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.merge_asof.html – Rob Raymond Jun 30 '21 at 09:28
  • So, just to make sure: when you choose the number in table B, the given probabilities are the probabilities of choosing each number, right? What should be in which column (column A, B)? – CutePoison Jun 30 '21 at 09:30
  • Correct @CutePoison – ABC Jun 30 '21 at 09:34
  • Should that number be in the entire A or B column or just one of them? I.e., do you pick an element based on the probability, place it in row 1 (col A), then pick another one (based on the other logic) and place that in row 1 (col B), and then move on to row 2 and do the same? – CutePoison Jun 30 '21 at 09:41
  • The number in column A should be different from the number in column B - based on the logic I've described. – ABC Jun 30 '21 at 09:50
  • Yes, but is it run one time? Do you pick a number, fill up *entire* column A (all rows with that one number, e.g 120) and then *entire* column B (based on the logic), or do you pick a number for each row? – CutePoison Jun 30 '21 at 10:02
  • Pick a number for each row – ABC Jun 30 '21 at 10:04
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/234362/discussion-between-abc-and-cutepoison). – ABC Jun 30 '21 at 10:05

1 Answer


It's a bit difficult to understand exactly what you want, but I'll provide some tips:

A way to select based on the distribution (probability):

  1. Create an array with the edges of the cumulative-probability intervals, e.g. prob_in = [0, 0.2, 0.25, 0.75, 0.85, 1] (20% + 5% + 50% + 10% + 15%), and an array with the numbers corresponding to those probabilities:
    num_pick = [120, 110, 100, 90, 80]
  2. Draw a number u from the uniform distribution on [0, 1] (say you get u = 0.35).
  3. Check which interval u falls in. Here it is the third interval, [0.25, 0.75], so you pick the third element of num_pick (which is 100).
  4. Do what you want with that number.

This might be a lot to do; luckily, you can use the answer from this SO question:
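import numpy as np
candidates = [120, 110, 100, 90, 80]
probabilities = [0.20, 0.05, 0.50, 0.10, 0.15]
# Draw one candidate according to the given probabilities
draw = np.random.choice(candidates, size=1, p=probabilities)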

Remember to decide whether to use replace=True or replace=False, i.e. whether the same item can be chosen multiple times.
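
If you do want to follow the four manual steps instead, they can be sketched roughly as below. This is only a minimal sketch: `pick_from_uniform` is a hypothetical helper name, and I'm using `np.cumsum` plus `np.searchsorted` to find the interval rather than looping over the edges by hand.

```python
import numpy as np

num_pick = np.array([120, 110, 100, 90, 80])
probs = np.array([0.20, 0.05, 0.50, 0.10, 0.15])

# Upper edges of the cumulative intervals, roughly [0.2, 0.25, 0.75, 0.85, 1.0]
upper_edges = np.cumsum(probs)


def pick_from_uniform(u):
    """Return the number whose cumulative interval contains u (0 <= u < 1)."""
    idx = np.searchsorted(upper_edges, u, side="right")
    # Guard against floating-point rounding in the last edge.
    idx = min(idx, len(num_pick) - 1)
    return num_pick[idx]


# The worked example from the steps: u = 0.35 lies in [0.25, 0.75]
print(pick_from_uniform(0.35))  # 100

# A real draw would use a uniform random number instead of a fixed u
rng = np.random.default_rng()
print(pick_from_uniform(rng.random()))
```

For a single draw this does the same job as `choice` with `p=...`; it is mostly useful if you want to see or control the uniform number yourself.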

CutePoison
  • Thanks for your answer but the issue is that the 2nd array you mentioned (num_pick) should be dynamic. For 120 the array should be as you said num_pick = [120,110, 100, 90, 80]. But for 140 for example the array should be [140,130,120,110,100] – ABC Jun 30 '21 at 09:52
  • This was an answer for logic A (how to pick a number). Do the probabilities for picking the number (logic A) change as well? – CutePoison Jun 30 '21 at 10:05
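
For the dynamic case raised in the comments, one possible sketch is to rebuild the candidate list for every row and then draw from it. Everything concrete here is an assumption, since the original tables are not available: the frames `df_a` and `df_b`, the column names `amount` and `number`, and the helpers `pick`, `pick_lower` and `pick_upper` are all hypothetical. It also assumes the probabilities are positional (the nearest candidate gets 20%, the next one 5%, and so on), which the thread never confirms.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for tables A and B
df_a = pd.DataFrame({"amount": [120, 140]})
df_b = pd.DataFrame({"number": list(range(80, 200, 10)) + [999]})

# Assumption: probabilities apply by position, nearest candidate first
probs = [0.20, 0.05, 0.50, 0.10, 0.15]
rng = np.random.default_rng(0)
values = np.sort(df_b["number"].to_numpy())


def pick(candidates):
    """Draw one candidate; renormalise if there are fewer candidates than probabilities."""
    if len(candidates) == 0:
        return np.nan
    p = np.asarray(probs[: len(candidates)], dtype=float)
    return rng.choice(candidates, p=p / p.sum())


def pick_lower(amount):
    # Values <= amount, nearest first, at most len(probs) of them
    return pick(values[values <= amount][::-1][: len(probs)])


def pick_upper(amount):
    # Values > amount and not 999, nearest first, at most len(probs) of them
    mask = (values > amount) & (values != 999)
    return pick(values[mask][: len(probs)])


df_a["A"] = df_a["amount"].apply(pick_lower)
df_a["B"] = df_a["amount"].apply(pick_upper)
df_a["C"] = 999  # column C is always 999
print(df_a)
```

With these made-up tables, the row with amount 140 draws column A from [140, 130, 120, 110, 100], matching the example in the comment above. If the probabilities should depend on the value rather than the position, `probs` would need to become a lookup keyed by number instead.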