This topic has been discussed here before, but it is still unsolved.
I have a text file containing lines like these:
[(XXX)].XX>[(XXX)].X.XXX
XXX.[(X)].[(XXX)]>>[(XXX)].XX
There are about 10k lines. Each line can contain 1 to 10 items like [(XXX)] and XX.
The first two lines of the actual dataset:
[Na+].[CH3:2][C:3](=[O:5])[O-].[CH3:6][c:7]1[cH:12][cH:11][cH:10][cH:9][cH:8]1>>[c:7]1([CH3:6])[c:12]([C:3]([c:2]2[cH:11][cH:12][cH:7][cH:8][c:9]2[CH3:10])=[O:5])[cH:11][cH:10][cH:9][cH:8]1
[CH:1]1([C:4]([c:6]2[cH:11][cH:10][c:9]([C:12]([CH3:20])(C)[C:13](N(C)OC)=O)[cH:8][cH:7]2)=[O:5])[CH2:3][CH2:2]1.[BrH:21].[C:22](=[O:25])([O-])[OH:23].[Na+]>O>[Br:21][CH2:3][CH2:2][CH2:1][C:4]([c:6]1[cH:11][cH:10][c:9]([C:12]([CH3:20])([CH3:13])[C:22]([OH:23])=[O:25])[cH:8][cH:7]1)=[O:5]
I want two data frames (or CSV files) containing:
Data frame 1
   1        2      3
1  [(XXX)]  XX
2  XXX      [(X)]  [(XXX)]
Data frame 2
   1        2      3
1  [(XXX)]  X      XXX
2  [(XXX)]  XX
This is what I tried, but it fails with the error: too many values to unpack (expected 2)
import re
import pandas as pd

with open('Test.txt') as f:
    p = f.read()
print(p)
df12, df22 = [], []
for l in p.splitlines():
    x, y = re.split(r">+", l)  # the error is raised on this line
    df12.append(x.split("."))
    df22.append(y.split("."))
print(pd.DataFrame(df12))
print(pd.DataFrame(df22))
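The error seems to come from lines like the second line of the actual dataset, where `>O>` puts text between the two `>` characters, so `re.split(r">+", l)` returns three pieces instead of two. Below is a minimal sketch of one possible workaround, not a definitive solution. It assumes that `>` only ever appears as a separator, that the first piece belongs in data frame 1 and the last piece in data frame 2, and that anything in the middle (such as `O`) can be dropped. The names `split_sides` and `build_frames` and the third sample line are made up for illustration.

```python
import pandas as pd

def split_sides(line):
    """Return (left_items, right_items) for one line.

    Assumption: '>' is only a separator, so the first piece is the left
    side and the last piece is the right side. Any middle piece is ignored.
    """
    parts = line.strip().split(">")
    if len(parts) < 2:
        raise ValueError(f"no '>' separator in line: {line!r}")
    return parts[0].split("."), parts[-1].split(".")

def build_frames(lines):
    """Build one data frame per side; short rows are padded with None."""
    left_rows, right_rows = [], []
    for line in lines:
        if not line.strip():  # skip blank lines
            continue
        left, right = split_sides(line)
        left_rows.append(left)
        right_rows.append(right)
    return pd.DataFrame(left_rows), pd.DataFrame(right_rows)

# The two placeholder lines from above, plus a hypothetical line with a middle part.
sample = [
    "[(XXX)].XX>[(XXX)].X.XXX",
    "XXX.[(X)].[(XXX)]>>[(XXX)].XX",
    "A.B>O>C",
]
df1, df2 = build_frames(sample)
print(df1)
print(df2)
```

With the real file, `build_frames(p.splitlines())` could replace the loop, and each frame could then be written out with `df1.to_csv("left.csv", index=False)` (the file name here is only an example). If the middle part needs to be kept, it would have to go into a third list in the same way.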
I would appreciate any suggestions.