I would like to convert the data into a dictionary to work with. Each item looks like a key-value pair from a dictionary, but the key and value are combined into a single string element.

Here's a sample of the data:

['"acetic anydride": "[CX3](=[OX1])[OX2][CX3](=[OX1])",\n',
 '"acetylenic carbon": "[$([CX2]#C)]",\n',
 '"acyl bromide": "[CX3](=[OX1])[Br]",\n',
 '"acyl chloride": "[CX3](=[OX1])[Cl]",\n',
 '"acyl fluoride": "[CX3](=[OX1])[F]",\n',
 '"acyl iodide": "[CX3](=[OX1])[I]",\n',
 '"aldehyde": "[CX3H1](=O)[#6]",\n',
 '"alkane": "[CX4]",\n',
 '"allenic carbon": "[$([CX2](=C)=C)]",\n',
 '"amide": "[NX3][CX3](=[OX1])[#6]",\n',
 '"amidium": "[NX3][CX3]=[NX3+]",\n',
 '"amino acid": "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]",\n',
 '"azide": "[$(-[NX2-]-[NX2+]#[NX1]),$(-[NX2]=[NX2+]=[NX1-])]",\n',
 '"azo nitrogen": "[NX2]=N",\n',
 '"azole": "[$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]",\n',
 '"azoxy nitrogen": "[$([NX2]=[NX3+]([O-])[#6]),$([NX2]=[NX3+0](=[O])[#6])]",\n',
 '"diazene": "[NX2]=[NX2]",\n',
 '"diazo nitrogen": "[$([#6]=[N+]=[N-]),$([#6-]-[N+]#[N])]",\n',
 '"bromine": "[Br]",\n']

I tried removing the : from each line with the replace method, but it didn't work:

i=0
for line in lines:
    a = lines[i]
    a.replace(":", "")
    lines[i] = a
    i+=1

4 Answers

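d = {}
for line in lines:
    # Split on the first ':' only; some values (e.g. "azole") contain ':' themselves.
    s = line.split(":", 1)
    # Strip the surrounding spaces and quotes, plus the trailing comma and newline.
    d[s[0].strip(' "')] = s[1].strip(' ",\n')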
Rani Sharim

You can use eval:

ll = ['"acetic anydride": "[CX3](=[OX1])[OX2][CX3](=[OX1])",\n',
 '"acetylenic carbon": "[$([CX2]#C)]",\n',
 '"acyl bromide": "[CX3](=[OX1])[Br]",\n',
 '"acyl chloride": "[CX3](=[OX1])[Cl]",\n',
 '"acyl fluoride": "[CX3](=[OX1])[F]",\n',
 '"acyl iodide": "[CX3](=[OX1])[I]",\n',
 '"aldehyde": "[CX3H1](=O)[#6]",\n',
 '"alkane": "[CX4]",\n',
 '"allenic carbon": "[$([CX2](=C)=C)]",\n',
 '"amide": "[NX3][CX3](=[OX1])[#6]",\n',
 '"amidium": "[NX3][CX3]=[NX3+]",\n',
 '"amino acid": "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]",\n',
 '"azide": "[$(-[NX2-]-[NX2+]#[NX1]),$(-[NX2]=[NX2+]=[NX1-])]",\n',
 '"azo nitrogen": "[NX2]=N",\n',
 '"azole": "[$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]",\n',
 '"azoxy nitrogen": "[$([NX2]=[NX3+]([O-])[#6]),$([NX2]=[NX3+0](=[O])[#6])]",\n',
 '"diazene": "[NX2]=[NX2]",\n',
 '"diazo nitrogen": "[$([#6]=[N+]=[N-]),$([#6-]-[N+]#[N])]",\n',
 '"bromine": "[Br]",\n']

dd = eval('{' + ' '.join(ll).replace('\n', '') + '}')

This joins your list into a single string, removes the \n characters, and adds curly braces. The result is a str containing a valid Python dict literal, so evaluating it produces the dictionary.
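As a safer variant of the same idea, here is a minimal sketch that swaps eval for ast.literal_eval from the standard library, which accepts only Python literals and rejects arbitrary expressions. The name smarts_by_name and the shortened three-item list are just for illustration, and the sketch assumes every element really is a well-formed '"key": "value",' fragment.

```python
import ast

# Shortened illustrative sample in the same shape as the question's list.
lines = [
    '"aldehyde": "[CX3H1](=O)[#6]",\n',
    '"azole": "[$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]",\n',
    '"bromine": "[Br]",\n',
]

# Join the fragments and wrap them in braces to form a dict literal.
# Newlines and a trailing comma are both legal inside a dict display,
# so nothing needs to be removed first.
source = "{" + "".join(lines) + "}"

# literal_eval parses literals only; it raises an error for anything else.
smarts_by_name = ast.literal_eval(source)

print(smarts_by_name["bromine"])
```

Because the whole list is parsed as one literal, a single malformed element makes the entire call fail, which may or may not be what you want.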

dzang
  • Note, `eval` is considered categorically unsafe as the statement being evaluated does not undergo any 'security' checks. Recommend using `ast.literal_eval` instead. Even more robust in this case will be performing the conversion using the `json` library. – S3DEV Oct 09 '21 at 07:59
  • Adding ref to above comment, https://stackoverflow.com/a/1832957/4985099 – sushanth Oct 09 '21 at 08:01
  • BTW, you don't need to remove the `\n` – Jiří Baum Oct 09 '21 at 08:01

This is a formatting problem or, more precisely, a data-cleaning problem. I am not sure why you are using an index variable. The first thing to handle is the trailing comma and newline at the end of each element; after that, split the element on ': ' and build the dictionary from the two parts. You can try the code below.

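d = {}
for element in lines:
    # Drop the trailing comma and newline.
    element = element.rstrip(",\n")
    # Split once, so a ': ' inside a value cannot break the unpacking.
    key, value = element.split(": ", 1)
    d[key.strip('"')] = value.strip('"')
d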

I used strip('"') to remove the quotation marks surrounding each key and value.
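Several of the values contain ':' themselves (the "azole" pattern, for example), so it matters where the split happens. The sketch below uses str.partition, which always splits at the first occurrence of the separator. The helper name parse_entry is hypothetical, and the sketch assumes that keys never contain a colon.

```python
def parse_entry(line):
    """Turn one '"name": "pattern",\\n' fragment into a (name, pattern) pair."""
    # partition splits at the first ':' only, so colons in the pattern survive.
    key, sep, value = line.partition(":")
    if not sep:
        raise ValueError(f"no ':' separator in {line!r}")
    # Strip spaces and quotes from the key; also comma and newline from the value.
    return key.strip(' "'), value.strip(' ",\n')


# Shortened illustrative sample in the same shape as the question's list.
lines = [
    '"alkane": "[CX4]",\n',
    '"azole": "[$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]",\n',
]

d = dict(parse_entry(line) for line in lines)
print(d["azole"])
```

Raising on a missing separator is one possible design choice; skipping such lines silently would also work if the input is known to contain blanks or comments.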

Dharman

Each element in the list is a string ending in ',\n'. These should be removed. The keys and values have unnecessary double-quotes. These should also be removed. I think this should give you what you need:

mylist = ['"acetic anydride": "[CX3](=[OX1])[OX2][CX3](=[OX1])",\n',
 '"acetylenic carbon": "[$([CX2]#C)]",\n',
 '"acyl bromide": "[CX3](=[OX1])[Br]",\n',
 '"acyl chloride": "[CX3](=[OX1])[Cl]",\n',
 '"acyl fluoride": "[CX3](=[OX1])[F]",\n',
 '"acyl iodide": "[CX3](=[OX1])[I]",\n',
 '"aldehyde": "[CX3H1](=O)[#6]",\n',
 '"alkane": "[CX4]",\n',
 '"allenic carbon": "[$([CX2](=C)=C)]",\n',
 '"amide": "[NX3][CX3](=[OX1])[#6]",\n',
 '"amidium": "[NX3][CX3]=[NX3+]",\n',
 '"amino acid": "[$([NX3H2,NX4H3+]),$([NX3H](C)(C))][CX4H]([*])[CX3](=[OX1])[OX2H,OX1-,N]",\n',
 '"azide": "[$(-[NX2-]-[NX2+]#[NX1]),$(-[NX2]=[NX2+]=[NX1-])]",\n',
 '"azo nitrogen": "[NX2]=N",\n',
 '"azole": "[$([nr5]:[nr5,or5,sr5]),$([nr5]:[cr5]:[nr5,or5,sr5])]",\n',
 '"azoxy nitrogen": "[$([NX2]=[NX3+]([O-])[#6]),$([NX2]=[NX3+0](=[O])[#6])]",\n',
 '"diazene": "[NX2]=[NX2]",\n',
 '"diazo nitrogen": "[$([#6]=[N+]=[N-]),$([#6-]-[N+]#[N])]",\n',
 '"bromine": "[Br]",\n']

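mydict = dict()
for e in mylist:
    # Remove the quotes, then split on the first ':' only,
    # because some values (e.g. "azole") contain ':' themselves.
    t = e.replace('"', '').split(':', 1)
    # [:-2] drops the trailing ',\n'; strip() removes the leading space.
    mydict[t[0]] = t[1][:-2].strip()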

print(mydict)