Map to a Trans capacity request from transit(Nodal) points using pandas dataframes

Question

Input Dataframe With TerminalID,TName,XY cordinate, PeopleID

import pandas as pd

data = {
        'TerminalID': ['5','21','21','2','21','2','5','22','22','22','2','32','41','41','42','50','50'],
        'TName': ['AD','AMBO','AMBO','PS','AMBO','PS','AD','AM','AM','AM','PS','BO','BA','BA','BB','AZ','AZ'],
        'xy': ['1.12731,1.153756','0.12731,0.153757','0.12731,0.153757','1.989385,1.201941','0.12731,0.153757','1.989385,1.201941','1.12731,1.153756','2.12731,1.153756','2.12731,1.153756','2.12731,1.153756','1.989385,1.201941','1.989385,1.201941','2.989385,1.201941','2.989385,1.201941','2.989385,3.201941','3.989385,3.201941','3.989385,3.201941'],
        'Pcode': [ 'None','Z014','Z015','Z016','Z017','Z018','None','Z020','Z021','Z022','Z023','Z024','Z025','Z026','Z027','Z028','Z029']
    }

df = pd.DataFrame.from_dict(data)

Out[55]:

output of DF1

   TerminalID TName                 xy Pcode
0           5    AD   1.12731,1.153756  None
1          21  AMBO   0.12731,0.153757  Z014
2          21  AMBO   0.12731,0.153757  Z015
3           2    PS  1.989385,1.201941  Z016
4          21  AMBO   0.12731,0.153757  Z017
5           2    PS  1.989385,1.201941  Z018
6           5    AD   1.12731,1.153756  None
7          22    AM   2.12731,1.153756  Z020
8          22    AM   2.12731,1.153756  Z021
9          22    AM   2.12731,1.153756  Z022
10          2    PS  1.989385,1.201941  Z023
11         32    BO  1.989385,1.201941  Z024
12         41    BA  2.989385,1.201941  Z025
13         41    BA  2.989385,1.201941  Z026
14         42    BB  2.989385,3.201941  Z027
15         50    AZ  3.989385,3.201941  Z028
16         50    AZ  3.989385,3.201941  Z029

DF2,

T_cap is the capacity requirement at the Terminal ID and the T_load is the Load details, Tcap is the running count increment and T_load is the actual request at the Termainal, The 0 at the start and the end are padding for the solution

data2= {
        'BusID': ['18','18','18','18','18','18','18','18','18'],
        'Tcap': ['0','2','3','6','7','8','10','12','12'],
        'T_Load': ['0','2','1','2','2','1','2','2','0'],
        'TerminalID': [ '5','21','33','2','32','42','41','50','5'],
        
        'TName':['AD','AMBO','AM','PS','BO','BB','BA','AZ','AD']
    }

df2 = pd.DataFrame.from_dict(data2)

Out[59]:

  BusID Tcap T_Load TerminalID TName
0    18    0      0          5    AD
1    18    2      2         21  AMBO
2    18    3      1         33    AM
3    18    6      2          2    PS
4    18    7      2         32    BO
5    18    8      1         42    BB
6    18   10      2         41    BA
7    18   12      2         50    AZ
8    18   12      0          5    AD

Data Frame # the Final output requested

The output is based on the T_Load contstrains.

data3 = {
        'BusID': ['18','18','18','18','18','18','18','18','18'],
        'Tcap': ['0','2','3','6','7','8','10','12','12'],
        'T_Load': ['0','2','1','3','1','1','2','2','0'],
        'TerminalID': [ '5','21','33','2','32','42','41','50','5'],
        
        'TName':['AD','AMBO','AM','PS','BO','BB','BA','AZ','AD'],
        'Pcode':['None','Z013,Z019','Z020','Z016,Z018,Z023','Z024','Z027','Z025,Z026','Z028,Z029','None']
    }
    
    df3 = pd.DataFrame.from_dict(data3)

Out[61]:

  BusID Tcap T_Load TerminalID TName           Pcode
0    18    0      0          5    AD            None
1    18    2      2         21  AMBO       Z013,Z019
2    18    3      1         33    AM            Z020
3    18    6      3          2    PS  Z016,Z018,Z023
4    18    7      1         32    BO            Z024
5    18    8      1         42    BB            Z027
6    18   10      2         41    BA       Z025,Z026
7    18   12      2         50    AZ       Z028,Z029
8    18   12      0          5    AD            None

Thanking you

jezrael · Accepted Answer · 2022-05-17T09:45:14.613

2

My solution aggreagte join per TerminalID and TName and assign to another DataFrame by aggreagte list, last filter values by positions in list comprehension with join:

s = df.groupby(['TerminalID','TName'])['Pcode'].agg(list).rename('P_list')
df = df2.join(s, on=['TerminalID','TName'])

df['P_list'] = [','.join(x[:int(y)]) if int(y) != 0 else None 
                for x, y in zip(df['P_list'], df['T_Load'])]
print (df)
  BusID Tcap T_Load TerminalID TName          P_list
0    18    0      0          5    AD            None
1    18    2      2         21  AMBO       Z014,Z015
2    18    3      1         22    AM            Z020
3    18    6      3          2    PS  Z016,Z018,Z023
4    18    7      1         32    BO            Z024
5    18    8      1         42    BB            Z027
6    18   10      2         41    BA       Z025,Z026
7    18   12      2         50    AZ       Z028,Z029
8    18   12      0          5    AD            None

edited May 17 '22 at 09:45

answered May 17 '22 at 09:15

jezrael

822,522
95
1,334
1,252

1

Sorry Jezrael, I need only two P_list in the second row since the T_Load is only 2.. and in the 3rd row I need only 3 three, also in 2nd row I need only 2 as T_Load is 2 – vinsent paramanantham May 17 '22 at 09:37
@jesrael, why I am getting float object not sub-scriptable. TypeError: 'float' object is not subscriptable I see the outputs 3 H11153,Z0001,L0001 H,1,1 the first string is getting splitted.. should I change my inputs to a single string.. example Z014, Z,0,1, the split is happening character level, – vinsent paramanantham Jun 02 '22 at 14:46
why I am getting the P_list allocated again..! – vinsent paramanantham Jun 02 '22 at 17:57
I have added one more question, I this approach I getting pcode repeated for some reason in the assignment, in this question I have given variation in the Vehicle or bus ID, please answer my question https://stackoverflow.com/questions/72481631/nodal-label-assignment-according-to-the-load-at-a-terminal-in-pandas-data-frame – vinsent paramanantham Jun 03 '22 at 07:52
1

I have added few comments above – vinsent paramanantham Jun 03 '22 at 07:55

mozway · Answer 2 · 2022-05-17T10:08:05.047

2

You can map the aggregated strings per TName:

df2['Plist'] = df2['TName'].map(df.groupby('TName')['Pcode'].agg(','.join))

or, if you want to replace multiple string None as single one:

df2['Plist'] = df2['TName'].map(df.groupby('TName')['Pcode']
                                  .agg(lambda x: ','.join(e for e in x if e != 'None'))
                                  .replace('', 'None')
                                )

output:

  BusID Tcap T_Load TerminalID TName           Plist
0    18    0      0          5    AD            None
1    18    2      2         21  AMBO  Z014,Z015,Z017
2    18    3      1         22    AM  Z020,Z021,Z022
3    18    6      3          2    PS  Z016,Z018,Z023
4    18    7      1         32    BO            Z024
5    18    8      1         42    BB            Z027
6    18   10      2         41    BA       Z025,Z026
7    18   12      2         50    AZ       Z028,Z029
8    18   12      0          5    AD            None

update: limiting the output:

you can then trim the column with a regex, we can use a groupby to benefit from vectorized string operations within each group (this is mostly interesting if there are few groups and many rows):

df2['P_list'] = (df2.groupby('T_Load')['P_list']
                    .apply(lambda c: c.str.extract(rf'((?:[^,]+,?){{,{str(c.name)}}})',
                                                   expand=False)
                                    .str.strip(',')
                          )
                    .replace('', 'None')
                )