I have a large DataFrame that originally held forecast data for the overall application. I then added columns of per-service forecasts, derived from the percentage of the overall traffic that each service receives.

       PeakHourForecast  address_Forecast  arc_Forecast  auth_Forecast  \
0            747.787093        186.946773    186.946773     411.282901   
1            691.159730        172.789933    172.789933     380.137852   
2            655.040498        163.760124    163.760124     360.272274   
3            630.850889        157.712722    157.712722     346.967989   
4            619.764089        154.941022    154.941022     340.870249   
...                 ...               ...           ...            ...   
42403       1097.177031        274.294258    274.294258     603.447367   
42404       1060.533763        265.133441    265.133441     583.293570   
42405       1024.620098        256.155024    256.155024     563.541054   
42406        961.448085        240.362021    240.362021     528.796447   
42407        875.026753        218.756688    218.756688     481.264714   

       authreversal_Forecast  bill_Forecast  credit_Forecast  \
0                  74.778709     269.203353        74.778709   
1                  69.115973     248.817503        69.115973   
2                  65.504050     235.814579        65.504050   
3                  63.085089     227.106320        63.085089   
4                  61.976409     223.115072        61.976409   
...                      ...            ...              ...   
42403             109.717703     394.983731       109.717703   
42404             106.053376     381.792155       106.053376   
42405             102.462010     368.863235       102.462010   
42406              96.144809     346.121311        96.144809   
42407              87.502675     315.009631        87.502675   

Based on this, I have a secondary column for each service that is True or False depending on whether the forecast for that service is above its current capacity. However, because the DataFrame is so large, printing it shows only a small subset of the rows and columns. Some components may have risk as False for most rows but True in a few spots, and I am not seeing those in the printed output.

I had been trying to see the risk level of each service by simply filtering with data2.filter(like='Risk'), which gives the output below:

       RiskPresent  address_RiskPresent  arc_RiskPresent  auth_RiskPresent  \
0            False                False             True             False   
1            False                False             True             False   
2            False                False             True             False   
3            False                False             True             False   
4            False                False             True             False   
...            ...                  ...              ...               ...   
42403        False                False             True             False   
42404        False                False             True             False   
42405        False                False             True             False   
42406        False                False             True             False   
42407        False                False             True             False

       authreversal_RiskPresent  bill_RiskPresent  credit_RiskPresent  \
0                         False             False               False   
1                         False             False               False   
2                         False             False               False   
3                         False             False               False   
4                         False             False               False   
...                         ...               ...                 ...   
42403                     False             False               False   
42404                     False             False               False   
42405                     False             False               False   
42406                     False             False               False   
42407                     False             False               False   

As we can see, arc_RiskPresent is True in basically every row. However, looking through an exported Excel file, I can see that there are risk = True values here and there in other columns too. So, for each _RiskPresent column, how can I find all the rows where it is True? Ideally I would then like to tie each _RiskPresent = True row to the _Forecast value for that component as well.
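One way this kind of lookup could be approached is sketched below. It is only an illustration under assumptions: the toy `data2` and its values are made up, the helper `to_long` is hypothetical, and it assumes every service has both a `<service>_Forecast` and a `<service>_RiskPresent` column.

```python
import pandas as pd

# Toy stand-in for data2. Only the column-name pattern comes from the
# question; the values are invented for illustration.
data2 = pd.DataFrame({
    "arc_Forecast": [186.9, 172.8, 163.8, 157.7],
    "arc_RiskPresent": [True, False, True, True],
    "bill_Forecast": [269.2, 248.8, 235.8, 227.1],
    "bill_RiskPresent": [False, False, True, False],
    "auth_Forecast": [411.3, 380.1, 360.3, 347.0],
    "auth_RiskPresent": [False, False, False, False],
})

# Rows where at least one service is flagged as at risk.
risk = data2.filter(like="_RiskPresent")
rows_with_risk = data2[risk.any(axis=1)]


def to_long(frame, suffix, value_name):
    """Hypothetical helper: reshape <service><suffix> columns into
    one record per (row, service)."""
    wide = frame.filter(like=suffix)
    wide = wide.rename(columns=lambda c: c.replace(suffix, ""))
    return (
        wide.rename_axis("row")
        .reset_index()
        .melt(id_vars="row", var_name="service", value_name=value_name)
    )


risk_long = to_long(data2, "_RiskPresent", "risk")
forecast_long = to_long(data2, "_Forecast", "forecast")

# Keep only the flagged (row, service) pairs and attach their forecast.
at_risk = (
    risk_long[risk_long["risk"].astype(bool)]
    .merge(forecast_long, on=["row", "service"])
    .sort_values(["row", "service"])
    .reset_index(drop=True)
)
print(at_risk)
```

The long format gives one line per flagged row and service, so nothing is hidden by the truncated wide printout, and a capacity column could be merged in the same way to compute how far over capacity each forecast is.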

I have been searching for this, but the results are all very basic and haven't been very helpful. The closest I have found is something like the code below, but it isn't getting me very far, and it produces odd NaN values that I don't see in the Excel output file.

a = data2.filter(like='_RiskPresent').apply(lambda row: row[row==True], axis=1)

print(a)
      arc_RiskPresent fingerprint_RiskPresent giftcardservice_RiskPresent  \
0                True                     NaN                         NaN   
1                True                     NaN                         NaN   
2                True                     NaN                         NaN   
3                True                     NaN                         NaN   
4                True                     NaN                         NaN   
...               ...                     ...                         ...   
42403            True                     NaN                        True   
42404            True                     NaN                        True   
42405            True                     NaN                        True   
42406            True                     NaN                        True   
42407            True                     NaN                        True   

      paypalservice_RiskPresent  
0                           NaN  
1                           NaN  
2                           NaN  
3                           NaN  
4                           NaN  
...                         ...  
42403                       NaN  
42404                       NaN  
42405                       NaN  
42406                       NaN  
42407                       NaN  
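The NaN values most likely come from the way apply reassembles its results rather than from the data itself. The sketch below reproduces the effect on made-up toy data (only the column-name pattern is taken from the output above) and shows one possible alternative that leaves the boolean values untouched.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration.
risk = pd.DataFrame({
    "arc_RiskPresent": [True, True, True],
    "auth_RiskPresent": [False, False, False],
    "bill_RiskPresent": [False, True, False],
})

# Each row returns a Series holding only its True entries, so rows come back
# with different labels. pandas aligns them into one frame and fills the
# gaps with NaN; columns that are never True disappear entirely.
a = risk.apply(lambda row: row[row == True], axis=1)
print(a)

# Alternative: keep every column that is True at least once, with the
# original True/False values and no NaN.
kept = risk.loc[:, risk.any()]
print(kept)
```

With this reading, a NaN in `a` simply means "this column was False in that row", which is why no NaN shows up in the Excel export of the original data.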

However, print(a.all()) seems to at least give me the name of each column that has True in it somewhere. I'm not sure whether this is actually 100% of them, and it doesn't help me identify where in the forecast data we are going over capacity, so I cannot tell how much over it is.

arc_RiskPresent                True
fingerprint_RiskPresent        True
giftcardservice_RiskPresent    True
paypalservice_RiskPresent      True
dtype: bool
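For listing the columns that contain True anywhere, `any()` on the unmodified boolean columns is probably a more direct check than `all()` on the filtered frame, since `all()` asks whether every value is True. The sketch below uses made-up toy data, and the names `columns_with_risk` and `rows_by_column` are only illustrative.

```python
import pandas as pd

# Hypothetical toy data; only the column-name pattern is from the question.
risk = pd.DataFrame({
    "arc_RiskPresent": [True, True, True],
    "bill_RiskPresent": [False, True, False],
    "auth_RiskPresent": [False, False, False],
})

# True for every column that has at least one True value.
has_any = risk.any()
columns_with_risk = has_any[has_any].index.tolist()

# Number of flagged rows per column (True counts as 1 when summed).
true_counts = risk.sum()

# Row labels where each flagged column is True.
rows_by_column = {col: risk.index[risk[col]].tolist() for col in columns_with_risk}

print(columns_with_risk)
print(true_counts)
print(rows_by_column)
```

The row labels in `rows_by_column` can then be used with `.loc` on the original DataFrame to pull the matching forecast values.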
lwoodruf
  • Your question needs a minimal reproducible example consisting of sample input, expected output, actual output, and only the relevant code necessary to reproduce the problem. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for best practices related to Pandas questions. – itprorh66 Oct 03 '22 at 23:22
