
This topic has been discussed here, but my problem is still unsolved.

I have a text file containing

[(XXX)].XX>[(XXX)].X.XXX
XXX.[(X)].[(XXX)]>>[(XXX)].XX

There are about 10k lines. Each side of a line can contain 1 to 10 dot-separated items such as [(XXX)] or XX.

The first two lines of the actual dataset:

[Na+].[CH3:2][C:3](=[O:5])[O-].[CH3:6][c:7]1[cH:12][cH:11][cH:10][cH:9][cH:8]1>>[c:7]1([CH3:6])[c:12]([C:3]([c:2]2[cH:11][cH:12][cH:7][cH:8][c:9]2[CH3:10])=[O:5])[cH:11][cH:10][cH:9][cH:8]1
[CH:1]1([C:4]([c:6]2[cH:11][cH:10][c:9]([C:12]([CH3:20])(C)[C:13](N(C)OC)=O)[cH:8][cH:7]2)=[O:5])[CH2:3][CH2:2]1.[BrH:21].[C:22](=[O:25])([O-])[OH:23].[Na+]>O>[Br:21][CH2:3][CH2:2][CH2:1][C:4]([c:6]1[cH:11][cH:10][c:9]([C:12]([CH3:20])([CH3:13])[C:22]([OH:23])=[O:25])[cH:8][cH:7]1)=[O:5]

I want two data frames/CSV files containing:

Data frame 1

     1       2       3         
1 [(XXX)]   XX 
2 XXX      [(X)]  [(XXX)]

Data frame 2

     1        2    3   
1  [(XXX)]    X   XXX
2  [(XXX)]   XX

I tried the following, but it fails with "too many values to unpack (expected 2)":

import re
import pandas as pd

with open('Test.txt') as f:
    p = f.read()
print(p)
df12, df22 = [], []

for l in p.splitlines():
    # Raises "too many values to unpack" when a line splits into more than two parts
    x, y = re.split(r">+", l)
    df12.append(x.split("."))
    df22.append(y.split("."))

print(pd.DataFrame(df12))
print(pd.DataFrame(df22))

I would appreciate any suggestions.

  • `re.split` returns a list. The list can be assigned to two variables only if it is exactly two elements long. You have lines which contain more than one delimiter (a line containing none at all would fail in a similar way, with "not enough values to unpack"). This is a common FAQ; please search before asking. – tripleee Jul 18 '20 at 08:17
  • Sorry about that; I will be more careful next time. – Protima Rani Paul Jul 18 '20 at 17:17
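
One possible way to avoid the unpacking error is sketched below. It rests on an assumption the question does not confirm: that only the text before the first `>` and after the last `>` is wanted, so a middle part such as the `O` in `>O>` is discarded. The helper name `split_sides` and the third sample line are hypothetical, added only for illustration.

```python
import re

def split_sides(line):
    """Return (left_items, right_items) for one line, or None if there is no '>'."""
    parts = re.split(r">+", line.strip())
    if len(parts) < 2:
        # No delimiter on this line, so there is nothing to split.
        return None
    # Assumption: keep only the first and last part; anything between is dropped.
    return parts[0].split("."), parts[-1].split(".")

# First two lines are the placeholder example from the question;
# the third is a made-up line with a middle part, like '>O>' in the real data.
sample = """[(XXX)].XX>[(XXX)].X.XXX
XXX.[(X)].[(XXX)]>>[(XXX)].XX
A.B>O>C"""

left_rows, right_rows = [], []
for line in sample.splitlines():
    sides = split_sides(line)
    if sides is None:
        continue  # skip lines that cannot be split
    left_rows.append(sides[0])
    right_rows.append(sides[1])

print(left_rows)
print(right_rows)
```

The two lists of rows can then be passed to `pd.DataFrame(left_rows)` and `pd.DataFrame(right_rows)` as in the question; pandas fills the shorter rows with missing values, so the rows do not need equal length.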
