3

In Python, I'm attempting (very poorly) to read a .txt file, find the last occurrence of a string referencing a particular customer, and read a few lines below that to get their current points balance.

A snapshot of the .txt file is:

Customer ID:123
Total sale amount:2345.45

Points from sale:23
Points until next bonus: 77

I can search for (and find) the specific Customer ID, but can't figure out how to search for the last occurrence of this ID only, or how to return the 'Points until next bonus' value... I apologise if this is a basic question, but any help would be greatly appreciated!

My code so far...

def reward_points():

#current points total
rewards = open('sales.txt', 'r')

line = rewards.readlines()
search = (str('Customer ID:') + str(Cust_ID))
print(search) #Customer ID:123

while line != ' ':
    if line.startswith(search):
        find('Points until next bonus:')
        current_point_total = line[50:52]
        cust_record = rewards.readlines()
        print(current_point_total)


rewards.close()

reward_points()

BMSydney

5 Answers

2

I think you'd be better off parsing the file into structured data, rather than trying to seek around the file, which isn't in a particularly convenient file format.

Here's a suggested approach:

Iterate over the file one line at a time

Split each line into a label and a field by matching on ':'

Put the labels and fields for one customer block into a dict

Append that dict to a list stored in another dict, keyed on customer ID

You then have an in-memory database that you can query with dict lookups

e.g. customers['123'][-1]['Points until next bonus']

Here's simplified example code for this approach:

#!/usr/bin/env python
import re

# dictionary with all the customers in
customers = dict()
# id of the customer block currently being read (None until the first one)
current_id = None

with open("sales.txt") as f:
    # one line at a time
    for line in f:
        # pattern match on 'key:value'
        field_match = re.match(r'^(.*):(.*)$', line)

        if field_match:
            # store the fields in variables, trimming stray whitespace
            key, value = (field.strip() for field in field_match.groups())
            # Customer ID means a new record
            if key == "Customer ID":
                # set a key for the 'customers database'
                current_id = value
                # if we have never seen this id before it's the first, make a record
                if customers.get(current_id) is None:
                    customers[current_id] = []
                # make the record an ordered list of dicts for each block
                customers[current_id].append(dict())
            # skip any fields that appear before the first Customer ID line
            if current_id is None:
                continue
            # store the key and value in the dictionary at the end of the list
            customers[current_id][-1][key] = value

# now customers is a "database" indexed on customer id
#  where the values are a list of dicts of each data block
#
# -1 indexes the last of the list
# so the last customer's record for "123" is

print(customers["123"][-1]["Points until next bonus"])

Updated

I didn't realise you had multiple blocks for customers, and were interested in the ordering, so I reworked the sample code to keep an ordered list of each data block parsed against customer id
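The same list-of-dicts idea can be written a little more compactly. The following is only a sketch, assuming `collections.defaultdict` is acceptable here; the function name `parse_sales` is hypothetical, and the second record in `sample` (the value 54) is made up for illustration.

```python
from collections import defaultdict


def parse_sales(lines):
    """Group 'key:value' lines into per-customer lists of record dicts."""
    customers = defaultdict(list)   # unseen IDs start with an empty list
    current = None                  # the record currently being filled
    for line in lines:
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue                # blank line, or no 'key:value' pair
        if key == "Customer ID":
            current = {}
            customers[value.strip()].append(current)
        elif current is not None:
            current[key] = value.strip()
    return customers


# made-up sample data: two blocks for the same customer
sample = [
    "Customer ID:123", "Points until next bonus: 77",
    "Customer ID:123", "Points until next bonus: 54",
]
print(parse_sales(sample)["123"][-1]["Points until next bonus"])  # 54
```

Passing an open file object instead of `sample` should work the same way, since the function only iterates over lines.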

cms
  • It's possible to even improve this answer with [defaultdict](https://docs.python.org/2/library/collections.html#defaultdict-examples). – sobolevn May 23 '15 at 10:17
  • @sobolevn I'm trying to keep the code as close to simple pseudocode as I can, because I think the OP isn't very fluent in python. Although now the list indices are a bit obscuring :-/ – cms May 23 '15 at 10:20
1

This is a good use case for itertools.groupby(); the data fits that pattern rather well:

Example:

from itertools import groupby


def search(d):
    """Key function used to group our dataset"""

    return d[0] == "Customer ID"


def read_customer_records(filename):
    """Read customer records and return a nicer data structure"""

    data = {}

    with open(filename, "r") as f:
        # clean and remove blank lines
        lines = (line.strip() for line in f)
        lines = (line for line in lines if line)

        # split each line on the ':' token
        lines = (line.split(":", 1) for line in lines)

        # iterate through each customer and their records
        for newcustomer, records in groupby(lines, search):
            if newcustomer:
                # we've found a new customer
                # create a new dict against their customer id
                customer_id = list(records)[0][1]
                data[customer_id] = {}
            else:
                # we've found customer records
                # add each key/value pair (split from ':')
                # to the customer record from above
                for k, v in records:
                    data[customer_id][k] = v

    return data

Output:

>>> read_customer_records("foo.txt")
{'123': {'Total sale amount': '2345.45', 'Points until next bonus': ' 77', 'Points from sale': '23'}, '124': {'Total sale amount': '245.45', 'Points until next bonus': ' 79', 'Points from sale': '27'}}

You can then lookup customers directly; for example:

>>> data = read_customer_records("foo.txt")
>>> data["123"]
{'Total sale amount': '2345.45', 'Points until next bonus': ' 77', 'Points from sale': '23'}
>>> data["123"]["Points until next bonus"]
' 77'

Basically what we're doing here is "grouping" the data set based on the Customer ID: line. We then create a data structure (a dict) that we can then do O(1) lookups on easily.

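Note: As long as the customer records in your dataset are separated by Customer ID lines, this works no matter how many records a customer has: each new block for the same ID replaces the previous one, so the dict ends up holding that customer's most recent record. The implementation also cleans the input a little (stripping whitespace and dropping blank lines) to cope with messy data.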
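The values come back as strings, and 'Points until next bonus' keeps its leading space (' 77'). If a number is needed, a small helper along these lines might do. This is a sketch only; `points_until_bonus` is a hypothetical name, and `data` is just the record for customer 123 copied from the output above.

```python
def points_until_bonus(data, customer_id):
    """Return the 'Points until next bonus' value as an int."""
    raw = data[customer_id]["Points until next bonus"]
    # strip the leading space left over from splitting on ':'
    return int(raw.strip())


data = {"123": {"Total sale amount": "2345.45",
                "Points from sale": "23",
                "Points until next bonus": " 77"}}
print(points_until_bonus(data, "123"))  # 77
```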

James Mills
0

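I would approach this a bit more generally. If I am not mistaken, you have a file of records in a specific format, where each record starts and ends with **. Why not do the following?

# file_content: the whole file as one string; search: e.g. "Customer ID:123"
customer_dictionary = {}
records = file_content.split("**")
for record in records:
    lines = record.strip().split("\n")
    if lines[0] == search:
        # the ID is the part after "Customer ID:" on the first line
        customer_id = lines[0].split(":", 1)[1]
        # later records overwrite earlier ones, so the latest record wins
        customer_dictionary[customer_id] = record

This will result in a map from customer_id to the latest record. You can parse that to get the information you need.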

EDIT: Since there are always 9 lines per record, you can get a list of all lines in the file, and create a list of records, where a record will be represented by a list of 9 lines. You can use the answer posted here:

Convert List to a list of tuples python
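As a rough sketch of that chunking idea: the 9-lines-per-record figure comes from the edit above, while the helper names `chunk` and `latest_record` and the assumption that each record's first line is the Customer ID line are mine. The demo uses made-up 3-line records to stay short.

```python
def chunk(lines, size=9):
    """Split a list of lines into consecutive records of `size` lines."""
    return [lines[i:i + size] for i in range(0, len(lines), size)]


def latest_record(lines, search, size=9):
    """Return the last record whose first line equals `search`, or None."""
    latest = None
    for record in chunk(lines, size):
        if record and record[0].strip() == search:
            latest = record  # keep overwriting, so the last match wins
    return latest


# made-up data with 3 lines per record
demo = [
    "Customer ID:123", "Points from sale:23", "Points until next bonus: 77",
    "Customer ID:124", "Points from sale:27", "Points until next bonus: 79",
    "Customer ID:123", "Points from sale:10", "Points until next bonus: 54",
]
print(latest_record(demo, "Customer ID:123", size=3)[-1])
```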

Community
Bartlomiej Lewandowski
0

All you need to do is find the lines that match Customer ID:123. When you find one, loop over the same file object in an inner loop until you reach the Points until line, then extract the points. Once the outer loop finishes, points holds the value from the last occurrence of that customer ID.

with open("test.txt") as f:
    points = ""
    for line in f:
        if line.rstrip() == "Customer ID:123":
            for line in f:
                if line.startswith("Points until"):
                    points = line.rsplit(None, 1)[1]
                    break

print(points)

Output:

77
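To reuse this for any customer, the loop could be wrapped in a function. The following is a sketch rather than a drop-in: `last_points` is a hypothetical name, it takes any iterable of lines (an open file works), and the extra records in `sample` are made up.

```python
def last_points(lines, customer_id):
    """Return the last 'Points until next bonus' value for a customer, or None."""
    target = "Customer ID:{}".format(customer_id)
    points = None
    it = iter(lines)  # one shared iterator, so the inner loop resumes where the outer one stopped
    for line in it:
        if line.rstrip() == target:
            for line in it:
                if line.startswith("Points until"):
                    points = line.rsplit(None, 1)[1]
                    break
    return points


# made-up sample: customer 123 appears twice
sample = [
    "Customer ID:123\n", "Total sale amount:2345.45\n", "\n",
    "Points from sale:23\n", "Points until next bonus: 77\n",
    "Customer ID:124\n", "Points until next bonus: 79\n",
    "Customer ID:123\n", "Points until next bonus: 54\n",
]
print(last_points(sample, 123))  # 54
```

With a real file, something like `with open("test.txt") as f: points = last_points(f, 123)` should behave the same way.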
Padraic Cunningham
0
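def get_points_until_next_bonus(filename, customerID):
    # read the whole file; the with block closes it afterwards
    with open(filename, 'r') as f:
        content = f.read()
    # everything after the last "Customer ID:<id>" line
    # (the trailing '\n' stops ID 123 from also matching 1234)
    last_id = content.split('Customer ID:' + str(customerID) + '\n')[-1]
    # the first "Points until next bonus:" value in that remainder, e.g. 77
    return last_id.split('Points until next bonus: ')[1].split('\n')[0]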
Dror Hilman