2

I have a plain text file with the following data:

id=1
name=Scott
occupation=Truck driver
age=23

id=2
name=Dave
occupation=Waiter
age=16

id=3
name=Susan
occupation=Computer programmer
age=29

I'm trying to work out the best way to get to any point in the file given an id string, then grab the rows underneath to extract the data for use in my program. I can do something like:

def get_person_by_id(id):
    file = open('rooms', 'r')
    for line in file:
        if ("id=" + id) in line:
            print(id + " found")

But I'm not sure how to then go through the next few lines and do line.split("=") or similar to extract the info (into a list, dict, or whatever) so that I can use it in my program. Any pointers?

Artsiom Rudzenka
njp
  • Is all data available for each ID, or may some records have less information than others? – Thijs van Dien Nov 11 '12 at 15:10
  • A lot depends on what you know about the format. Is it always 4 lines per entry? Can there be any other keys? Basically, you can call `file.readline()` several times. – Lev Levitsky Nov 11 '12 at 15:11
  • Are you able/open to changing the file format a bit? You could use the csv module if you were able to. See here: http://docs.python.org/2/library/csv.html. Perhaps you could make the csv module work for this situation too. – pseudoramble Nov 11 '12 at 15:16
  • Possible duplicate of http://stackoverflow.com/questions/3914454/python-how-to-loop-through-blocks-of-lines – Thijs van Dien Nov 11 '12 at 15:47

7 Answers

2

One option would be to load the entire thing into memory, which would save you from reading the file every time:

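with open('rooms') as f:
    # strip() drops the trailing newline so the last chunk parses cleanly
    chunks = f.read().strip().split('\n\n')

people_by_id = {}

for chunk in chunks:
    # Each row looks like "key=value"; split only on the first '='
    data = dict(row.split('=', 1) for row in chunk.split('\n'))
    # Key the record by its id and drop the id from the record itself
    people_by_id[data.pop('id')] = data

def get_person_by_id(id):
    # ids are stored as strings, e.g. get_person_by_id('2')
    return people_by_id.get(id)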
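
A minimal usage sketch of the same idea, assuming the sample data from the question is supplied as an inline string instead of being read from the 'rooms' file. The names `SAMPLE` and `parse_people` are hypothetical and only there to make the example self-contained:

```python
# Sample data from the question, inlined so the sketch runs without a file.
SAMPLE = """id=1
name=Scott
occupation=Truck driver
age=23

id=2
name=Dave
occupation=Waiter
age=16

id=3
name=Susan
occupation=Computer programmer
age=29
"""

def parse_people(text):
    """Hypothetical helper: split text into blank-line-separated records."""
    people = {}
    for chunk in text.strip().split('\n\n'):
        data = dict(row.split('=', 1) for row in chunk.split('\n'))
        # The id becomes the lookup key and is removed from the record.
        people[data.pop('id')] = data
    return people

people_by_id = parse_people(SAMPLE)
print(people_by_id.get('2'))   # {'name': 'Dave', 'occupation': 'Waiter', 'age': '16'}
print(people_by_id.get('99'))  # None, because there is no such id
```

Note that every value, including the id used as the key, stays a string, so the lookup has to be done with '2' rather than 2.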
Eric
  • If the file is very large it may be better not to read the entire file into memory, but rather stop file processing at a specific line. Some of the other answers offer such a solution. – btel Nov 12 '12 at 12:40
1

How about exiting from a for loop after finding the correct line:

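def get_person_by_id(id):
    person = {}
    with open('rooms', 'r') as file:
        for line in file:
            # Compare the whole line so that "id=1" does not match "id=10"
            if line.strip() == "id=" + id:
                break
        else:
            return None  # reached the end of the file without finding the id
        # Now you can continue processing the file from the line after the match:
        for line in file:
            if not line.strip():
                break  # a blank line ends the record
            key, value = line.strip().split('=', 1)
            person[key] = value
    return person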
btel
0

Maybe:

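d = {}
current = None  # the record currently being filled

with open(filename) as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip the blank lines between records
        k, v = line.split('=', 1)
        if k == 'id':
            # An id line starts a new record, keyed by the id value
            current = d[v] = {}
        current[k] = v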
Artsiom Rudzenka
0

Collect all of the person's attributes and values (i.e. id, name, occupation, age, etc.) until you reach an empty line.

def get_person_by_id(id):
    person = {}
    found = False
    with open('rooms', 'r') as file:
        for line in file:
            line = line.strip()
            if found:
                if not line:
                    return person  # a blank line ends the record
                attr, value = line.split("=", 1)
                person[attr] = value
            elif line == "id=" + id:
                print(id + " found")
                found = True
                attr, value = line.split("=", 1)
                person[attr] = value
    return person
user278064
0

And here is an iterative solution.

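objects = []
current_object = None
# Text mode, so that each line is a str and str.strip() works on Python 3
with open("info.txt", "r") as f:
    for line in f:
        line = line.strip("\r\n")
        if not line:
            # A blank line closes the current record
            current_object = None
            continue
        if current_object is None:
            # First line after a blank line (or at the start) opens a new record
            current_object = {}
            objects.append(current_object)
        key, _, value = line.partition('=')
        current_object[key] = value

print(objects)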
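
Since the result here is a list rather than a mapping, looking a person up by id needs one more step. A small sketch of that step, assuming `objects` has the shape produced above; `index_by_id` is a hypothetical helper name, and the list is written out by hand so the example runs without `info.txt`:

```python
def index_by_id(objects):
    """Map each record's 'id' value to the record; records without an id are skipped."""
    return {obj['id']: obj for obj in objects if 'id' in obj}

# Hand-written stand-in for the parsed list, using the data from the question.
objects = [
    {'id': '1', 'name': 'Scott', 'occupation': 'Truck driver', 'age': '23'},
    {'id': '2', 'name': 'Dave', 'occupation': 'Waiter', 'age': '16'},
    {'id': '3', 'name': 'Susan', 'occupation': 'Computer programmer', 'age': '29'},
]

by_id = index_by_id(objects)
print(by_id['3']['occupation'])  # Computer programmer
```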
alex_jordan
0

Another example of an iterative parser:

from itertools import takewhile
def entries(f):
    e = {}
    def read_one():
        one = {}
        for line in takewhile(lambda x: '=' in x, f):
            key, val = line.strip().split('=', 1)  # split on the first '=' only
            one[key] = val
        return one
    while True:
        one = read_one() 
        if not one:
            break
        else:
            e[one.pop('id')] = one
    return e

Example:

>>> with open('data.txt') as f:
...     print(entries(f)['2'])
...
{'name': 'Dave', 'occupation': 'Waiter', 'age': '16'}
Lev Levitsky
0

This solution is a bit more forgiving of empty lines within records.

def read_persons(it):
    person = dict()
    for l in it:
        try:
            k, v = l.strip('\n').split('=', 1)
        except ValueError:
            pass
        else:
            if k == 'id': # New record
                if person:
                    yield person
                    person = dict()
            person[k] = v
    if person:
        yield person
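
Because this is a generator, it can also be stopped at the first matching record, so a large file does not have to be read to the end. A rough sketch of that, assuming an in-memory `io.StringIO` stands in for the open file; `find_person` is a hypothetical helper, and the generator is restated in a slightly different form (using `partition`) so the block runs on its own:

```python
import io

def read_persons(lines):
    """Yield one dict per record; a line starting with 'id=' begins a new record."""
    person = {}
    for line in lines:
        key, sep, value = line.strip('\n').partition('=')
        if not sep:
            continue  # no '=' on this line (e.g. blank), so ignore it
        if key == 'id' and person:
            yield person
            person = {}
        person[key] = value
    if person:
        yield person

def find_person(lines, wanted_id):
    """Hypothetical helper: return the first record with this id, or None."""
    return next((p for p in read_persons(lines) if p.get('id') == wanted_id), None)

# Note the stray blank line inside the first record.
sample = io.StringIO(
    "id=1\nname=Scott\n\noccupation=Truck driver\nage=23\n\n"
    "id=2\nname=Dave\noccupation=Waiter\nage=16\n"
)
print(find_person(sample, '2'))
```

With a real file, the same call would be `find_person(f, '2')` inside a `with open(...) as f:` block.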
Thijs van Dien