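How do I add the Position column to my pandas table, given only the first three columns? Position is the 1-based index of WordInAddr within Address, and a repeated word should map to its next occurrence.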

roll Num  Address                         WordInAddr  Position
1         Block A                         Block       1
1         Block A                         A           2
2         South New Jersey Street Jersey  South       1
2         South New Jersey Street Jersey  Jersey      3
2         South New Jersey Street Jersey  Street      4
2         South New Jersey Street Jersey  Jersey      5

1 Answer

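You can use NumPy broadcasting to build a boolean matrix that compares every value of WordInAddr (rows) with every word of Address (columns). Set the entries below the main diagonal to False, so that the i-th row of a group cannot match a word earlier than position i, then take the index of the first True value in each row. This relies on the rows of each group listing their words in the same order as they occur in Address.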
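To make the masking step concrete, here is a minimal sketch of the intermediate matrix for the roll Num 2 rows from the question. The variable names `masked` and `positions` are only for illustration.

```python
import numpy as np

# Words of the Address for roll Num 2 (columns) and its WordInAddr values (rows)
s = np.array("South New Jersey Street Jersey".split())
w = np.array(["South", "Jersey", "Street", "Jersey"])[:, None]

# Broadcasting a (4, 1) array against a (5,) array gives a (4, 5) boolean matrix
m = s == w

# Clear everything below the main diagonal: row i can no longer match columns < i
masked = m.copy()
masked[np.tril_indices_from(masked, k=-1)] = False

# First True per row, converted to a 1-based position
positions = np.argmax(masked, axis=1) + 1
print(masked.astype(int))
print(positions)
```

Without the mask, both "Jersey" rows would resolve to position 3, because `np.argmax` stops at the first True. With the mask, the second "Jersey" row only has column 5 left, so the positions come out as 1, 3, 4, 5. The function that follows applies the same steps to each roll Num group.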

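import numpy as np
import pandas as pd

def find_pos(df):
    # Split the group's Address (identical on every row of the group) into words
    s = np.array(df['Address'].iloc[0].split())

    # Extract words from WordInAddr as a column vector for broadcasting
    w = df['WordInAddr'].values[:, None]

    # Create the boolean matrix: m[i, j] is True if row i's word equals word j
    m = s == w

    # Reset the lower triangle so row i cannot match a word before position i
    m[np.tril_indices_from(m, k=-1)] = False

    # Return the 1-based position of the first match in each row
    return pd.Series(np.argmax(m, axis=1) + 1, index=df.index)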

df['Position'] = df.groupby('roll Num').apply(find_pos).droplevel(0)

Output:

>>> df
   roll Num                         Address WordInAddr  Position
0         1                         Block A      Block         1
1         1                         Block A          A         2
2         2  South New Jersey Street Jersey      South         1
3         2  South New Jersey Street Jersey     Jersey         3
4         2  South New Jersey Street Jersey     Street         4
5         2  South New Jersey Street Jersey     Jersey         5
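One caveat: `np.argmax` returns 0 for a row that is entirely False, so a word that does not occur in Address would be reported as position 1. If that can happen in your data, a guarded variant along these lines may help. This is only a sketch: `find_pos_safe` and the `demo` frame are hypothetical names and data, not part of the question.

```python
import numpy as np
import pandas as pd

def find_pos_safe(group):  # hypothetical helper, same steps as find_pos plus a guard
    s = np.array(group["Address"].iloc[0].split())
    w = group["WordInAddr"].to_numpy()[:, None]
    m = s == w
    m[np.tril_indices_from(m, k=-1)] = False
    pos = (np.argmax(m, axis=1) + 1).astype(float)
    # np.argmax returns 0 for an all-False row, so mark those rows as missing
    pos[~m.any(axis=1)] = np.nan
    return pd.Series(pos, index=group.index)

# Made-up data: "Street" does not occur in "Block A"
demo = pd.DataFrame({
    "Address": ["Block A", "Block A"],
    "WordInAddr": ["Block", "Street"],
})
result = find_pos_safe(demo)
print(result)
```

The result is a float column, because NaN cannot be stored in a plain integer column.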
Corralien