Python - search .txt file for ID, then return variable from line below

Question

In Python, I'm attempting (very poorly) to read a .txt file, find the last occurrence of a string referencing a particular customer, and read a few lines below that to get their current points balance.

A snapshot of the .txt file is:

Customer ID:123
Total sale amount:2345.45

Points from sale:23
Points until next bonus: 77

I can search for (and find) the specific Customer ID, but can't figure out how to search for the last occurrence of this ID only, or how to return the 'Points until next bonus' value... I apologise if this is a basic question, but any help would be greatly appreciated!

My code so far...

def reward_points():

    #current points total
    rewards = open('sales.txt', 'r')

    line = rewards.readlines()
    search = (str('Customer ID:') + str(Cust_ID))
    print(search) #Customer ID:123

    while line != ' ':
        if line.startswith(search):
            find('Points until next bonus:')
            current_point_total = line[50:52]
            cust_record = rewards.readlines()
            print(current_point_total)


    rewards.close()

reward_points()

Can you show us some attempt you've made so we can help you better? — James Mills, May 23 '15 at 08:37
Thanks James - added in. As you can see - coding is NOT my forte!! — BMSydney, May 23 '15 at 08:54

cms · Answer 1 · 2015-05-23T16:07:34.463

I think you'd be better off parsing the file into structured data, rather than trying to seek around the file, which isn't in a particularly convenient file format.

Here's a suggested approach:

Iterate over the file one line at a time

Split each line into a label and a value by matching on ':'

Put the labels and values from each customer block into a dict

Append that dict to a list stored in another dict, keyed on customer ID

You then have an in-memory database that you can dereference with dict lookups,

e.g. customers['123'][-1]['Points until next bonus']

Here's simplified example code for this approach:

#!/usr/bin/env python
import re

# dictionary with all the customers in it
customers = dict()
current_id = None

with open("sales.txt") as f:
    # one line at a time
    for line in f:
        # pattern match on 'key:value'
        field_match = re.match(r'^(.*):(.*)$', line)

        if field_match:
            # store the fields in variables, trimming stray whitespace
            key, value = (s.strip() for s in field_match.groups())
            # Customer ID means a new record
            if key == "Customer ID":
                # set a key for the 'customers database'
                current_id = value
                # if we have never seen this id before, make a record
                if current_id not in customers:
                    customers[current_id] = []
                # make the record an ordered list of dicts, one per block
                customers[current_id].append(dict())
            # skip any fields that appear before the first Customer ID
            if current_id is None:
                continue
            # store the key and value in the dict at the end of the list
            customers[current_id][-1][key] = value

# now customers is a "database" indexed on customer id
#  where the values are a list of dicts of each data block
#
# -1 indexes the last of the list
# so the last record for customer "123" is

print(customers["123"][-1]["Points until next bonus"])

Updated

I didn't realise you had multiple blocks per customer and were interested in their ordering, so I reworked the sample code to keep an ordered list of the data blocks parsed for each customer ID.

This answer could be improved even further with [defaultdict](https://docs.python.org/2/library/collections.html#defaultdict-examples). — sobolevn, May 23 '15 at 10:17
@sobolevn I'm trying to keep the code as close to simple pseudocode as I can, because I think the OP isn't very fluent in python. Although now the list indices are a bit obscuring :-/ — cms, May 23 '15 at 10:20
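A minimal sketch of the defaultdict idea from the comment above, assuming the same 'key:value' layout: defaultdict(list) creates the empty list the first time an ID is seen, so the explicit "never seen this id before" check goes away. The helper name parse_blocks and the second block in the sample are made up for illustration; the helper takes any iterable of lines, so an open sales.txt file could be passed in instead.

```python
import re
from collections import defaultdict


def parse_blocks(lines):
    """Hypothetical helper: map each customer ID to a list of its data blocks.

    Each block is a dict of the 'key:value' lines that follow a
    'Customer ID:' line, kept in file order.
    """
    customers = defaultdict(list)  # unseen IDs start as an empty list
    current_id = None
    for line in lines:
        # split on the first ':' only
        match = re.match(r'^(.*?):(.*)$', line)
        if not match:
            continue  # blank or unrecognised line
        key, value = (s.strip() for s in match.groups())
        if key == "Customer ID":
            current_id = value
            customers[current_id].append({})  # start a new block
        if current_id is not None:
            customers[current_id][-1][key] = value
    return customers


# made-up sample: two blocks for the same customer
sample = """Customer ID:123
Total sale amount:2345.45

Points from sale:23
Points until next bonus: 77
Customer ID:123
Total sale amount:10.00
Points from sale:1
Points until next bonus: 76
""".splitlines()

customers = parse_blocks(sample)
print(customers["123"][-1]["Points until next bonus"])  # last block for 123
```

Indexing with [0] instead of [-1] gives the first block for that customer.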

James Mills · Answer 2 · 2015-05-23T10:26:11.043

This is a good use case for itertools.groupby(); the data fits that pattern rather well:

Example:

from itertools import groupby


def search(d):
    """Key function used to group our dataset"""

    return d[0] == "Customer ID"


def read_customer_records(filename):
    """Read customer records and return a nicer data structure"""

    data = {}

    with open(filename, "r") as f:
        # clean and remove blank lines
        lines = (line.strip() for line in f)
        lines = (line for line in lines if line)

        # split each line on the ':' token
        lines = (line.split(":", 1) for line in lines)

        # iterate through each customer and their records
        for newcustomer, records in groupby(lines, search):
            if newcustomer:
                # we've found a new customer
                # create a new dict against their customer id
                # (a later block for the same id replaces the earlier one)
                customer_id = list(records)[0][1]
                data[customer_id] = {}
            else:
                # we've found customer records
                # add each key/value pair (split on ':')
                # to the customer record from above
                for k, v in records:
                    data[customer_id][k] = v

    return data

Output:

>>> read_customer_records("foo.txt")
{'123': {'Total sale amount': '2345.45', 'Points until next bonus': ' 77', 'Points from sale': '23'}, '124': {'Total sale amount': '245.45', 'Points until next bonus': ' 79', 'Points from sale': '27'}}

You can then lookup customers directly; for example:

>>> data = read_customer_records("foo.txt")
>>> data["123"]
{'Total sale amount': '2345.45', 'Points until next bonus': ' 77', 'Points from sale': '23'}
>>> data["123"]["Points until next bonus"]
' 77'

Basically, what we're doing here is "grouping" the data set on the Customer ID: lines. We then build a data structure (a dict) that supports easy O(1) lookups by customer ID.

Note: As long as the "customer records" in your "dataset" are separated by Customer ID lines, this will work no matter how many "records" a customer has. If the same ID appears more than once, the later block replaces the earlier one, so you get the last occurrence. This implementation also tries to cope with "messy" data by stripping whitespace and skipping blank lines.
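To see why the Customer ID lines matter, here is a small in-memory sketch of what groupby() does with a key function like search(): it groups only consecutive items that share a key, so every Customer ID line starts a new group and the lines after it form the next one. The sample pairs and the name is_customer_id are made up for illustration.

```python
from itertools import groupby

# made-up (label, value) pairs, already split on ':'
pairs = [
    ["Customer ID", "123"],
    ["Points from sale", "23"],
    ["Points until next bonus", " 77"],
    ["Customer ID", "124"],
    ["Points from sale", "27"],
]


def is_customer_id(pair):
    """Same idea as search(): True for 'Customer ID' lines."""
    return pair[0] == "Customer ID"


# materialise each group, since groupby's sub-iterators are consumed lazily
groups = [(key, list(items)) for key, items in groupby(pairs, is_customer_id)]
for key, items in groups:
    print(key, items)
```

This yields four groups whose keys alternate True, False, True, False: one for each ID line and one for the fields that follow it.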

That is true; but swapping out ``{}`` for ``OrderedDict`` will! — James Mills, May 23 '15 at 09:57
There you go :) Now we have a generic solution that still preserves the order of customers in the file! — James Mills, May 23 '15 at 10:01
Actually you were right the first time! I misunderstood what you were doing — Padraic Cunningham, May 23 '15 at 10:04
@PadraicCunningham It's okay :) The data structure is ordered now so getting the nth or last customer in the data file in the order it was seen is easy now :) — James Mills, May 23 '15 at 10:06
Actually; re-reading the OP's question and requirements *very carefully* ``OrderedDict`` is not necessary. — James Mills, May 23 '15 at 10:20

score 0 · Answer 3 · edited May 23 '17 at 11:51


I would approach this a bit more generally. If I'm not mistaken, you have a file of records in a specific format, where each record starts and ends with **. Why not do the following?

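with open("sales.txt") as f:
    file_content = f.read()
search = "Customer ID:123"
customer_dictionary = {}

for record in file_content.split("**"):
    lines = record.strip().split("\n")
    # a later record for the same customer overwrites the earlier one
    if lines[0] == search:
        customer_dictionary[search.split(":", 1)[1]] = record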

This will result in a map of customer_id to the latest record. You can parse that to get the information you need.
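One possible way to do that parsing, sketched under the assumption that each record holds one 'label:value' field per line. parse_record is a hypothetical helper, not part of the answer:

```python
def parse_record(record):
    """Hypothetical helper: turn 'label:value' lines into a dict."""
    fields = {}
    for line in record.splitlines():
        # partition splits on the first ':' only
        label, sep, value = line.partition(":")
        if sep:  # skip blank lines and lines without a ':'
            fields[label.strip()] = value.strip()
    return fields


record = ("Customer ID:123\nTotal sale amount:2345.45\n\n"
          "Points from sale:23\nPoints until next bonus: 77\n")
print(parse_record(record)["Points until next bonus"])
```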

EDIT: Since there are always 9 lines per record, you can get a list of all lines in the file, and create a list of records, where a record will be represented by a list of 9 lines. You can use the answer posted here:

Convert List to a list of tuples python
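For the fixed-size variant in the EDIT, a minimal sketch: read the lines (for example with f.read().splitlines()), then group them n at a time with the common zip(*[iter(lines)] * n) idiom. The 9-lines-per-record figure comes from the OP's comment; the sample below uses a made-up 3-line layout to stay short. It assumes there is no header and no partial record, because zip() silently drops an incomplete final chunk. The name chunk_records is hypothetical.

```python
def chunk_records(lines, record_lines):
    """Group a flat list of lines into tuples of record_lines lines each.

    zip() over record_lines references to the same iterator pulls
    consecutive items; an incomplete trailing chunk is dropped.
    """
    it = iter(lines)
    return list(zip(*[it] * record_lines))


# made-up sample with 3 lines per record (the real file reportedly has 9)
lines = [
    "Customer ID:123", "Points from sale:23", "Points until next bonus: 77",
    "Customer ID:124", "Points from sale:27", "Points until next bonus: 79",
    "Customer ID:123", "Points from sale:5", "Points until next bonus: 72",
]
records = chunk_records(lines, 3)

# keep the last record seen for each customer ID
latest = {}
for record in records:
    customer_id = record[0].split(":", 1)[1]
    latest[customer_id] = record

print(latest["123"][2].split(":", 1)[1].strip())
```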

answered May 23 '15 at 08:59 by Bartlomiej Lewandowski

Hi, the asterisks were actually auto-added to the post when I tried to make that section appear in bold text. They don't appear in the .txt file... Thank you though! – BMSydney May 23 '15 at 09:11
Do the records always have 4 lines? You can use that – Bartlomiej Lewandowski May 23 '15 at 09:16
Yes, they always have the same format - there's actually 9 lines to each, but I edited out the irrelevant information to this question. – BMSydney May 23 '15 at 09:22
Hmm your code sample is kind of syntactically wrong and very incomplete :) – James Mills May 23 '15 at 10:13
It's meant to be a guideline, not a ready-made solution. I'm sure the OP will manage – Bartlomiej Lewandowski May 23 '15 at 10:25

score 0 · Answer 4 · answered May 23 '15 at 09:45

All you need to do is find the lines that are exactly Customer ID:123. When you find one, loop over the same file object in an inner loop until you reach the Points until line, then extract the points. Because each later match overwrites points, it ends up holding the value from the last occurrence of that customer ID.

with open("test.txt") as f:
    points = ""
    for line in f:
        if line.rstrip() == "Customer ID:123":
            for line in f:
                if line.startswith("Points until"):
                    points = line.rsplit(None, 1)[1]
                    break

print(points)
77
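The same nested-loop idea can be wrapped in a function that takes the ID as a parameter. This is a sketch with a made-up name, points_until_next_bonus. It accepts any iterable of lines, so it works on an open file or, as here, an in-memory list, and it returns None when the ID never appears. Comparing the whole stripped line (rather than using startswith) keeps ID 123 from matching Customer ID:1234.

```python
def points_until_next_bonus(lines, customer_id):
    """Return the 'Points until next bonus' value from the last block for customer_id."""
    target = "Customer ID:" + str(customer_id)
    lines = iter(lines)  # shared iterator, so the outer loop resumes after the inner one
    points = None
    for line in lines:
        if line.rstrip() == target:
            for line in lines:
                if line.startswith("Points until"):
                    points = line.rsplit(None, 1)[1]
                    break
    return points


# made-up sample lines
sample = [
    "Customer ID:123\n", "Points until next bonus: 77\n",
    "Customer ID:1234\n", "Points until next bonus: 10\n",
    "Customer ID:123\n", "Points until next bonus: 70\n",
]
print(points_until_next_bonus(sample, 123))
```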

score 0 · Answer 5 · answered May 23 '15 at 17:50

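def get_points_until_next_bonus(filename, customerID):
    with open(filename, 'r') as f:
        content = f.read()
    #get everything after the last "Customer ID:<id>" line
    #(the trailing newline stops ID 12 from matching "Customer ID:123")
    marker = 'Customer ID:' + str(customerID) + '\n'
    if marker not in content:
        return None
    last_id = content.split(marker)[-1]
    #get the first line with Points until next bonus: 77, minus the label
    return last_id.split('Points until next bonus:')[1].split('\n')[0].strip()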
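Since the question specifically asks for the last occurrence, str.rfind() is another option: it returns the index of the last match (or -1), so the whole file doesn't need to be split. A sketch, assuming the text layout from the snapshot in the question; the function name last_points_until_bonus and the sample text are made up.

```python
def last_points_until_bonus(text, customer_id):
    """Find the last 'Customer ID:<id>' line and read the bonus line after it."""
    marker = "Customer ID:%s\n" % customer_id  # trailing newline: 123 != 1234
    start = text.rfind(marker)  # index of the LAST occurrence, -1 if absent
    if start == -1:
        return None
    label = "Points until next bonus:"
    pos = text.find(label, start)  # first bonus line after that ID
    if pos == -1:
        return None
    end = text.find("\n", pos)
    if end == -1:
        end = len(text)  # bonus line is the last line, with no trailing newline
    return text[pos + len(label):end].strip()


text = (
    "Customer ID:123\nTotal sale amount:2345.45\n\n"
    "Points from sale:23\nPoints until next bonus: 77\n"
    "Customer ID:123\nTotal sale amount:10.00\n\n"
    "Points from sale:1\nPoints until next bonus: 76"
)
print(last_points_until_bonus(text, 123))
```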
