0

I have the following datafame, df1:

       ID     Min_Value      Max_Value     
---------------------------------------
0    ID_1           100            150                
1    ID_1           150            170                
2    ID_2            80            105              
3    ID_2           105            120              

I then have another dataframe, df2, with data that looks like:

         ID     Value  
----------------------
  0    ID_1       102 
  1    ID_1       101
  2    ID_1       155   
  3    ID_1       165
  4    ID_1       162  
  5    ID_1       159
...       
 55    ID_1       105            
 56    ID_1       121
 57    ID_1       143  
 58    ID_1       137
 59    ID_1       155   
 60    ID_1       165
...
100    ID_2        95           
101    ID_2        81
102    ID_2        91  
103    ID_2       101
104    ID_2       115
105    ID_2       117
...
165    ID_2        91
166    ID_2        90
167    ID_2       105
168    ID_2       119
169    ID_2        84
170    ID_2        86
...

And so df1 shows for each unique "ID" there are two ranges, or bins. For ID_1, we have a lower bin: 100-150, and an upper bin: 150-170. And then for ID_2, we have a lower bin: 80-105, and an upper bin: 105-120. And then I have df2, which contains hundreds of rows, showing a value for each ID, for where in this case there are only 2 IDs, ID_1 and ID_2. What I want to do is bin the values of df2 to find out how many of its values fall within each of the bins for each ID in df1.

And so I want to create the following df3:

       ID     Bin_1      Bin_2     Proportion_Pop
-------------------------------------------------
0    ID_1        XX        XX               0.XX                
1    ID_1        XX        XX               0.XX    
2    ID_2        XX        XX               0.XX   
3    ID_2        XX        XX               0.XX 

Where in this df3, I am finding out, for each unique ID, here ID_1 and ID_2, how many of the corresponding values fall within the lower bin - Bin_1, and then how many of the corresponding values fall within the upper bin-Bin_2? And then, what proportion of the total population of each ID_2 fall within each corresponding bin? These "Proportion_Pop" values for each ID should sum to 1.0.

I am having trouble figuring out how to approach this in a way that is dynamic and can accommodate if there perhaps happen to be more IDs, e.g. ID_3, ID_4, ID_5, etc., and as well more than 2 bins, e.g. Bin_3, Bin_4, Bin_5.

What I am thinking to do is capture the value ranges for each Bin for each ID, and then place them in a dictionary, and then after, loop through that dictionary for each ID, and then count the values in each bin via value_counts.() to derive the proportion of the total population, but this seems to be getting messy. Is there a straightforward way to accomplish this using '.value_counts()'?

  • Are there only two bins for each ID? – BrokenBenchmark Jan 09 '23 at 23:47
  • Sorry, I should have explain better. This is actually the challenge, the part where I need to make this dynamic. For my example, I just showed two bins for each ID. But, I could end up needing to process a dataset with more then two bins for each ID. And so there would be a lower bin, middle bin, and upper bin. Since I could potentially have 4, 5, 6, and so on bins, that is why I am deciding to label the bins just with numbers, with increasing numbers meaning higher ranges. – LostinSpatialAnalysis Jan 09 '23 at 23:51
  • Seems like you are trying to create a histogram or a dataframe of histograms. Try looking at the pandas.DataFrame.hist method – Galo do Leste Jan 10 '23 at 00:13
  • Can you add some sample data (ideally with a `df.DataFrame()` call) and your expected output (cf. [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples))? – fsimonjetz Jan 10 '23 at 09:10

0 Answers0