
I have a file with data like this, where a line starting with '>' marks an identifier:

>test1
this is line 1
hi there
>test2
this is line 3
how are you
>test3
this is line 5 and
who are you

I'm trying to create a dictionary like this:

{'>test1':'this is line 1hi there','>test2':'this is line 3how are you','>test3':'this is line 5 andwho are you'}

I've opened the file but can't get the data into this form. I want to strip the newline character at the end of each line so that each record's lines become a single string; as the example shows, no space is needed between them. Any help would be appreciated.

This is what I've tried so far:

new_dict = {}
db = open("/home/ak/Desktop/python_files/smalltext.txt")

for line in db:
    if '>' in line:
        new_dict[line]=''
    else:
        new_dict[line]=new_dict[line].append(line)
  • Can you show your code? – chrisaycock Jul 08 '14 at 16:26
  • This question appears to be off-topic because it is about getting us to show you teh codez – Marcin Jul 08 '14 at 16:44
  • Your code indicates that you want a list as the value for each key, but your example shows a string. Which is it? – dawg Jul 08 '14 at 17:10
  • @dawg I think he is just assuming that strings also have an `append` method that is equivalent to `+=`. – chepner Jul 08 '14 at 17:20
  • it's too bad there isn't a generalized version of the `locals` built-in function -- i.e., a function you can call on any namespace (e.g., a module) that returns all of the variables from that namespace in a dict. – abcd Mar 30 '15 at 21:08
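To make the `append` issue from the comments concrete: Python `str` objects have no `append` method, which is where the loop in the question fails. If you do want a list per key (dawg's reading), the same loop works with lists, which can be joined afterwards. A minimal sketch; the sample lines are inlined instead of read from smalltext.txt, and the names `lines`, `records` and `joined` are illustrative.

```python
# Sample data from the question, inlined instead of read from a file
lines = [">test1\n", "this is line 1\n", "hi there\n",
         ">test2\n", "this is line 3\n", "how are you\n",
         ">test3\n", "this is line 5 and\n", "who are you\n"]

# str has no append method, which is where the question's loop fails
print(hasattr(str, "append"))  # False

# Variant that keeps a list of lines as each value
records = {}
for line in lines:
    line = line.rstrip("\n")  # drop the trailing newline
    if line.startswith(">"):
        key = line
        records[key] = []
    else:
        records[key].append(line)

# Join each list with '' to get the single-string format from the question
joined = {k: "".join(v) for k, v in records.items()}
print(joined[">test1"])  # this is line 1hi there
```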

3 Answers

Using your approach, it would be:

new_dict = {}
with open("/home/ak/Desktop/python_files/smalltext.txt") as db:
    for line in db:
        if line.startswith('>'):
            key = line.strip()    # strip the newline from the header line
            new_dict[key] = ''
        else:
            new_dict[key] += line.strip()    # str has no append; concatenate with +=
user2963623
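The loop above can also be wrapped in a function that accepts any iterable of lines, which makes it easy to check against the sample data without a file. This is a sketch, not part of the original answer: `parse_records` is a hypothetical name, `io.StringIO` stands in for the open file, and it additionally skips blank lines and any lines before the first header (which would otherwise raise a `NameError` for `key`).

```python
import io


def parse_records(lines):
    """Map each '>' header line to the following lines joined with no separator."""
    result = {}
    key = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            key = line
            result[key] = ""
        elif key is not None:  # ignore lines that appear before the first header
            result[key] += line
    return result


sample = io.StringIO(
    ">test1\nthis is line 1\nhi there\n"
    ">test2\nthis is line 3\nhow are you\n"
    ">test3\nthis is line 5 and\nwho are you\n"
)
print(parse_records(sample))
# {'>test1': 'this is line 1hi there', '>test2': 'this is line 3how are you', '>test3': 'this is line 5 andwho are you'}
```

With a real file, pass the file object directly, e.g. `with open(path) as f: d = parse_records(f)`. For very long records, collecting lines in a list and joining once is cheaper than repeated `+=`.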

Here is a solution using groupby:

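from itertools import groupby

kvs = []
with open(f_name) as f:    # f_name is the path to your data file
    # group consecutive lines by whether they are '>' header lines
    for k, v in groupby((e.rstrip() for e in f), lambda s: s.startswith('>')):
        kvs.append(''.join(v) if k else '\n'.join(v))

# kvs alternates header, body, header, body, ...
print({k: v for k, v in zip(kvs[0::2], kvs[1::2])})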

The dict:

{'>test1': 'this is line 1\nhi there', 
 '>test2': 'this is line 3\nhow are you', 
 '>test3': 'this is line 5 and\nwho are you'}
the wolf
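If you want the exact format from the question (body lines joined with no separator), the groupby approach only needs a different join string. A sketch under the same assumptions as the answer (the file starts with a '>' line and no two header lines are adjacent), with `io.StringIO` standing in for `open(f_name)` so it runs on the sample data; `data` and `result` are illustrative names.

```python
import io
from itertools import groupby

data = io.StringIO(
    ">test1\nthis is line 1\nhi there\n"
    ">test2\nthis is line 3\nhow are you\n"
    ">test3\nthis is line 5 and\nwho are you\n"
)

kvs = []
# Group consecutive lines by whether they start with '>'
for _, group in groupby((e.rstrip() for e in data), lambda s: s.startswith('>')):
    kvs.append(''.join(group))  # join headers and bodies alike with no separator

# kvs alternates header, body, header, body, ...
result = dict(zip(kvs[0::2], kvs[1::2]))
print(result)
```

`dict(zip(...))` builds the same dictionary as the comprehension in the answer.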

You can use a regex:

import re

di = {}
# group 1: the '>' header line; group 2: everything up to the next '>' line or the end of the text
pat = re.compile(r'^(>.*?)$(.*?)(?=^>|\Z)', re.S | re.M)
with open(fn) as f:    # fn is the path to your data file
    txt = f.read()
    for k, v in ((m.group(1), m.group(2)) for m in pat.finditer(txt)):
        di[k] = v.strip()

print(di)

# {'>test1': 'this is line 1\nhi there', '>test2': 'this is line 3\nhow are you', '>test3': 'this is line 5 and\nwho are you'}
dawg
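Likewise, the regex answer gives the question's single-line format if the newlines inside each body are removed rather than only stripped from the ends. A sketch reusing the same pattern, with the sample text inlined as `txt` in place of `f.read()`.

```python
import re

txt = (
    ">test1\nthis is line 1\nhi there\n"
    ">test2\nthis is line 3\nhow are you\n"
    ">test3\nthis is line 5 and\nwho are you\n"
)

# Same pattern as the answer: group 1 is the '>' line, group 2 runs to the next '>' line or the end
pat = re.compile(r'^(>.*?)$(.*?)(?=^>|\Z)', re.S | re.M)

# Drop every newline in the body instead of only stripping the ends
di = {m.group(1): m.group(2).replace('\n', '') for m in pat.finditer(txt)}
print(di)
```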