pandas - str (row) slicing based on int in another column

Question

I have a df:

   colA    colB
0 'abcde'    4
1 'abcde'    2
2 'abcde'    1
3  np.nan   np.nan
4 'wxyz'     3
5 'wxyz'     2

What I would like is to be able to remove the first X characters from colA based on the value in colB and return the value to a new column C like below.

   colA    colB     colC
0 'abcde'    4      'e'
1 'abcde'    2      'cde'
2 'abcde'    1      'bcde'
3  np.nan   np.nan  np.nan
4 'wxyz'     3      'z'
5 'wxyz'     2      'yz'

I've tried some .apply lambdas with .str[x:], but I'm running into trouble saving the result back because of null values in other rows.

Any help much appreciated!

Are those quotes actually in the data, or did you add them for the example? @swifty — Vishnudev Krishnadas, Apr 17 '20 at 07:21

jezrael · Accepted Answer · 2020-04-17T07:13:57.673

You can create a custom function that returns a missing value if the indexing fails:

def f(a, b):
    try:
        return a[int(b):]
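    except (TypeError, ValueError):
        # missing values: int(NaN) raises ValueError, slicing a float raises TypeError
        return np.nan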

df['colC'] = [f(a,b) for a, b in zip(df['colA'], df['colB'])]

Or:

df['colC'] = df.apply(lambda x: f(x['colA'], x['colB']), axis=1)

print (df)
    colA  colB  colC
0  abcde   4.0     e
1  abcde   2.0   cde
2  abcde   1.0  bcde
3    NaN   NaN   NaN
4   wxyz   3.0     z
5   wxyz   2.0    yz
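For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption reconstructed from the question (plain strings without literal quote characters), and the helper name drop_prefix is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (assumed: plain strings, no literal quotes).
df = pd.DataFrame({
    "colA": ["abcde", "abcde", "abcde", np.nan, "wxyz", "wxyz"],
    "colB": [4, 2, 1, np.nan, 3, 2],
})

def drop_prefix(text, n):
    """Return text without its first n characters, or NaN if either input is missing."""
    try:
        return text[int(n):]
    except (TypeError, ValueError):
        # int(NaN) raises ValueError; slicing a float NaN raises TypeError.
        return np.nan

# Pair the two columns row by row and build the new column from the results.
df["colC"] = [drop_prefix(a, b) for a, b in zip(df["colA"], df["colB"])]
print(df)
```

Because colB contains a NaN, pandas stores it as float64, which is why the int() conversion is needed before slicing.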

Another idea is to test for non-missing values:

df['colC'] = [a[int(b):] if pd.notna(a) and pd.notna(b) 
                         else np.nan 
                         for a, b in zip(df['colA'], df['colB'])]
print (df)
    colA  colB  colC
0  abcde   4.0     e
1  abcde   2.0   cde
2  abcde   1.0  bcde
3    NaN   NaN   NaN
4   wxyz   3.0     z
5   wxyz   2.0    yz

score 0 · Answer 2 · answered Apr 17 '20 at 07:12

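jezrael's answer is probably the best and most readable option, but if you want a one-liner you can use df.dropna(). Despite the scary name, it won't alter your original DataFrame unless you call it with the parameter inplace=True. The rows it drops are simply absent from the result of apply, and because column assignment aligns on the index, those rows end up as NaN in colC.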

>>> df['colC'] = df.dropna().apply(lambda x: x['colA'][int(x['colB']):], axis=1)

>>> print(df)
    colA    colB    colC
0   abcde   4.0     e
1   abcde   2.0     cde
2   abcde   1.0     bcde
3   NaN     NaN     NaN
4   wxyz    3.0     z
5   wxyz    2.0     yz
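To show why the rows with missing values come back as NaN, here is a small sketch of the same idea split into steps. The three-row frame and the variable names complete and sliced are assumptions made up for illustration, not part of the original data.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (assumed data) with one fully missing row.
df = pd.DataFrame({
    "colA": ["abcde", np.nan, "wxyz"],
    "colB": [4, np.nan, 3],
})

# dropna() returns a new frame; df itself still has all three rows.
complete = df.dropna()

# The result only carries index labels 0 and 2.
sliced = complete.apply(lambda row: row["colA"][int(row["colB"]):], axis=1)

# Assignment aligns on the index, so label 1 has no match and becomes NaN.
df["colC"] = sliced
print(df)
```

Accessing the row by column label rather than by position keeps the lambda independent of column order.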
