I have a list of strings, and I want to extract three patterns from each one and form a DataFrame. Each string has three parts: the first part is s_, t_q_, or NaN; the middle part can be any letters, digits, and _, but cannot end with abc; the last part should be abc or NaN.
import pandas as pd
import re
str_list = ['s_c45abc','s_ab00_a','t_q_de45abc','t_q_123','t_q_c34b7_da','456a','456abc','456b']
# attempt: optional prefix, middle part that should not end with abc, optional abc suffix
pd.Series(str_list).str.extract(r"(s_|t_q_)?(\w+[^(abc)])(abc)?")
      0        1    2
0    s_      c45  abc
1    s_    ab00_  NaN
2  t_q_     de45  abc
3  t_q_      123  NaN
4  t_q_  c34b7_d  NaN
5   NaN      456  NaN
6   NaN      456  abc
7   NaN      456  NaN
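However, the second, fifth, sixth, and last rows are wrong: the middle part loses its final character (ab00_a becomes ab00_, c34b7_da becomes c34b7_d, and 456a and 456b both become 456). The expected result is: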
      0         1    2
0    s_       c45  abc
1    s_    ab00_a  NaN
2  t_q_      de45  abc
3  t_q_       123  NaN
4  t_q_  c34b7_da  NaN
5   NaN      456a  NaN
6   NaN       456  abc
7   NaN      456b  NaN
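A likely cause: [^(abc)] is a character class. It matches one single character that is not (, a, b, c or ), rather than meaning "not ending in the string abc". Because the middle part is forced to end on such a character, a trailing a or b gets dropped. One possible alternative is sketched below. It anchors the pattern to the whole string and makes the middle part lazy, so a trailing abc is left for the last group whenever one is present. This is a sketch, not a definitive answer: it assumes the middle part is non-empty, it was only checked against the eight sample strings above, and edge cases such as s_abc or xabcabc (where the middle would become xabc) are not handled specially.

```python
import pandas as pd

str_list = ['s_c45abc', 's_ab00_a', 't_q_de45abc', 't_q_123',
            't_q_c34b7_da', '456a', '456abc', '456b']

# ^ and $ anchor the match to the whole string.
# (s_|t_q_)?  optional prefix
# (\w+?)      lazy middle part: as few characters as possible, so that a
#             trailing "abc" (if present) is captured by the last group
# (abc)?      optional "abc" suffix; unmatched optional groups become NaN
pattern = r"^(s_|t_q_)?(\w+?)(abc)?$"

df = pd.Series(str_list).str.extract(pattern)
print(df)
```

The lazy quantifier is what replaces the character class: instead of trying to forbid certain final characters, the middle group simply stops as early as it can while still letting the optional abc group and the $ anchor match.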