I'd like to break up a string into Pandas DataFrame columns using a regex.
Sample CSV data [Updated]:
Data;Code;Temp;....
12 364 OPR 4 67474;;33;...
893 73 GDP hdj 747;;34;...
hr 777 hr9 GDP;;30;...
463 7g 448 OPR;;28;...
Desired result [Updated]:
Data | Code | Temp | ...
------------------------------------------------
12 364 | OPR 4 67474 | 33 | ...
893 73 | GDP hdj 747 | 34 | ...
hr 777 hr9 GDP | NaN | 30 | ...
463 7g 448 OPR | NaN | 28 | ...
regex:
code = re.compile(r'\sOPR.?[^$]|\sGDP.?[^$]')
I only need to split if OPR or GDP is not at the end of the string.
I was looking for a way to split based on the match position (something like match.start()).
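To make the match-position idea concrete, here is a minimal row-by-row sketch. It rests on a few assumptions that are not part of my data description: "not at the end of the string" is read as "the keyword is followed by whitespace and at least one more character", split_on_code is a hypothetical helper name, and the lookahead pattern is my own guess rather than a tested solution.

```python
import re

import pandas as pd

# Assumption: a keyword is a split point only when more text follows it.
# \s+ matches the whitespace in front of OPR/GDP; the lookahead requires the
# keyword, more whitespace, and at least one non-space character after it.
SPLIT_AT = re.compile(r"\s+(?=(?:OPR|GDP)\s+\S)")


def split_on_code(text):
    """Hypothetical helper: return (data, code) split at the match position."""
    match = SPLIT_AT.search(text)
    if match is None:
        # Keyword missing or at the end of the string: keep the text, no code.
        return text, float("nan")
    return text[:match.start()], text[match.end():]


# Sample rows copied from the CSV above (elided columns left out).
df = pd.DataFrame({
    "Data": ["12 364 OPR 4 67474", "893 73 GDP hdj 747",
             "hr 777 hr9 GDP", "463 7g 448 OPR"],
    "Code": [None, None, None, None],
    "Temp": [33, 34, 30, 28],
})

# Split every row, then write the two halves back into their columns.
data, codes = zip(*(split_on_code(s) for s in df["Data"]))
df["Data"] = list(data)
df["Code"] = list(codes)
print(df)
```

This loops in Python, so it is likely slower than a vectorized string method on large frames, but it keeps match.start() and match.end() visible.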
I tried df['Data'].str.contains(code, regex=True) and df['Data'] = df['Data'].str.extract(code, expand=True), but neither gives me the split, and str.find only seems to accept a plain string, not an re.Pattern. I can't get it to work.
I'm pretty new to Pandas, so please bear with me.
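One direction that might work is str.extract with a string pattern that has two named capture groups, so that the group names become the column names. The sketch below is only that: the pattern is an assumption (it treats "not at the end" as "keyword followed by whitespace and at least one more character"), and the small frame is just the sample rows typed in by hand.

```python
import pandas as pd

# Sample rows copied from the CSV above (elided columns left out).
df = pd.DataFrame({
    "Data": ["12 364 OPR 4 67474", "893 73 GDP hdj 747",
             "hr 777 hr9 GDP", "463 7g 448 OPR"],
    "Code": [None, None, None, None],
    "Temp": [33, 34, 30, 28],
})

# Assumed pattern: lazily capture the text before the keyword, then capture
# the keyword together with the text that must follow it.
pattern = r"^(?P<Data>.*?)\s+(?P<Code>(?:OPR|GDP)\s+\S.*)$"

# str.extract returns one column per capture group; rows that do not match
# (keyword at the end, or no keyword at all) come back as NaN in both columns.
parts = df["Data"].str.extract(pattern)

df["Code"] = parts["Code"]
# Where nothing matched, fall back to the original, unsplit text.
df["Data"] = parts["Data"].fillna(df["Data"])
print(df)
```

As far as I understand, str.extract needs at least one capture group in the pattern, which may be why passing my group-less regex to it did not work.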