I have the following datafame, df1:
ID Min_Value Max_Value
---------------------------------------
0 ID_1 100 150
1 ID_1 150 170
2 ID_2 80 105
3 ID_2 105 120
I then have another dataframe, df2, with data that looks like:
ID Value
----------------------
0 ID_1 102
1 ID_1 101
2 ID_1 155
3 ID_1 165
4 ID_1 162
5 ID_1 159
...
55 ID_1 105
56 ID_1 121
57 ID_1 143
58 ID_1 137
59 ID_1 155
60 ID_1 165
...
100 ID_2 95
101 ID_2 81
102 ID_2 91
103 ID_2 101
104 ID_2 115
105 ID_2 117
...
165 ID_2 91
166 ID_2 90
167 ID_2 105
168 ID_2 119
169 ID_2 84
170 ID_2 86
...
And so df1 shows for each unique "ID" there are two ranges, or bins. For ID_1, we have a lower bin: 100-150, and an upper bin: 150-170. And then for ID_2, we have a lower bin: 80-105, and an upper bin: 105-120. And then I have df2, which contains hundreds of rows, showing a value for each ID, for where in this case there are only 2 IDs, ID_1 and ID_2. What I want to do is bin the values of df2 to find out how many of its values fall within each of the bins for each ID in df1.
And so I want to create the following df3:
ID Bin_1 Bin_2 Proportion_Pop
-------------------------------------------------
0 ID_1 XX XX 0.XX
1 ID_1 XX XX 0.XX
2 ID_2 XX XX 0.XX
3 ID_2 XX XX 0.XX
Where in this df3, I am finding out, for each unique ID, here ID_1 and ID_2, how many of the corresponding values fall within the lower bin - Bin_1, and then how many of the corresponding values fall within the upper bin-Bin_2? And then, what proportion of the total population of each ID_2 fall within each corresponding bin? These "Proportion_Pop" values for each ID should sum to 1.0.
I am having trouble figuring out how to approach this in a way that is dynamic and can accommodate if there perhaps happen to be more IDs, e.g. ID_3, ID_4, ID_5, etc., and as well more than 2 bins, e.g. Bin_3, Bin_4, Bin_5.
What I am thinking to do is capture the value ranges for each Bin for each ID, and then place them in a dictionary, and then after, loop through that dictionary for each ID, and then count the values in each bin via value_counts.()
to derive the proportion of the total population, but this seems to be getting messy. Is there a straightforward way to accomplish this using '.value_counts()'?