You have the starter code already.
Recipe to follow as a guideline
Let's decompose the problem into parts, then solve those sub-problems step by step, separately.
(Similar to the divide-and-conquer problem-solving strategy.)
3 parts to solve
I would split it into 3 parts, following the IPO model: Input > Process > Output.
(1) read the file and split = parse the records (input)
Inside the loop:
- Try to recognize the field-reading pattern and add the remaining fields similarly: field = line_data[3] (for the 4th field). A sketch for the remaining fields follows the code under (1) below.
- Then think of collecting the fields you have read into a record dictionary (a dict), something like name-value pairs for each line.
- After all fields are read and stored in a record, add it to the collection of parsed records, i.e. the list you created before (records.append(record)).
(2) sort and filter the parsed records = de-duplicate (process)
Outside, after the loop:
- Work with the list and try sorting or filtering it to remove the duplicates as required.
(3) format the filtered records (output)
Format the parsed and de-duplicated records back to a string. Then output the string (either print to console or write to a file).
(1) Code explained and prepared to extend
# WHAT YOU ALREADY HAVE, EXPLAINED WITH COMMENTS
with open("sample.txt", mode="r") as f:  # open the file "sample.txt" for reading ("r") via the handle f
    text = f.readlines()                 # read all lines into the list named text

# IDEA: create an empty list to collect the parsed records
records = []

for line in text:                        # iterate over each line in text
    line_data = line.split(":")          # split the line at the delimiter ":" into a list of strings (fields or columns)
    name = line_data[0].strip()          # trim the 1st field (containing the name) to remove surrounding whitespace
    # HERE YOU CAN BUILD ON
    # read the other fields
    course = line_data[1].strip()        # 2nd field, trimmed
    # same with the 3rd and 4th field
    # IDEA: dict with named fields you need later for filtering
    record = {'name': name, 'course': course, 'time': '00:00', 'points': 0}  # time and points are placeholders for now
    records.append(record)               # add the parsed record to the collection
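Building on the "same with the 3rd and 4th field" comment, here is a minimal sketch of the complete loop body. I'm assuming the 3rd column holds the time and the 4th the points (matching the keys of the record dict), and that the time values themselves don't contain the ":" delimiter; adjust the indexes to your actual file format.
# a sketch of the complete loop body (the column order for time and points is an assumption)
for line in text:
    line_data = line.split(":")
    name = line_data[0].strip()
    course = line_data[1].strip()
    time = line_data[2].strip()    # assumed: 3rd column holds the time
    points = line_data[3].strip()  # assumed: 4th column holds the points
    record = {'name': name, 'course': course, 'time': time, 'points': points}
    records.append(record)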
Consider putting your existing parsing logic (1) into a function:
# (1) reading file and parsing records
def parse_records():
    # add your existing code here
    return records
Then you can easily add new functions (2 and 3):
# your main script starts here, calling sub-routines (functions)
if __name__ == '__main__':
    records = parse_records()  # your first part solved (1)
    print(records)             # debug output to check that parsing works
    # now solve the sorting and filtering (2)
    # then print out the filtered records as a formatted string (3)
(2) Sorting/Filtering
Find the duplicates and sort/filter them so that only the record with the earliest time per name is kept.
In pseudo-code (using not-yet-implemented helper functions):
def filter_duplicates_by_name(records):
    previous = None
    for record in sort_by_name(records):
        if previous is not None and previous['name'] == record['name']:
            print("name duplicate found: " + str(record))
            # if not already sorted by time, compare the times and
            # either (a) put the earlier one into a result list instead of the later one
            # or (b) remove the later one from the list (filter it out)
        else:
            # no duplicate (yet): consider adding it to the result
            previous = record
    # return either (a) the result list or (b) the filtered list
    return records
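If you get stuck, here is one possible concrete take on option (a): a sketch that builds a new result containing only the earliest record per name. It assumes the 'time' strings are zero-padded (e.g. '07:05') so that plain string comparison matches chronological order.
def filter_duplicates_by_name(records):
    # variant of option (a): build a new result with only the earliest record per name
    # ASSUMPTION: 'time' strings are zero-padded, so string comparison matches chronological order
    earliest = {}  # maps name -> record with the earliest time seen so far
    for record in records:
        name = record['name']
        if name not in earliest or record['time'] < earliest[name]['time']:
            earliest[name] = record
    return list(earliest.values())  # sort afterwards with sort_by_name if you need an ordered result
This keeps the time comparison in one place and avoids removing items from a list while you are iterating over it.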
Implement the function sort_by_name(records).
It should sort the list by name (and maybe also by time as a secondary key) and return the sorted result.
def sort_by_name(records):
    result = []
    # sort the records, e.g. using a for-loop and if-else (or Python's built-in sorting)
    return result
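If your course allows the built-in tools, a minimal sketch using Python's sorted() with a key function could look like this (it assumes name and time are strings, as in the record dict above):
def sort_by_name(records):
    # sort by name, then by time as a tie-breaker
    return sorted(records, key=lambda record: (record['name'], record['time']))
Note that sorted() returns a new list and leaves the original untouched, which fits the "return the sorted result" requirement.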
Then you can use it to output the filtered records:
filtered = filter_duplicates_by_name(records)
for record in filtered:
    print(record)
    # or formatted back to colon-separated values, see (3)
(3) Formatting parsed records as string
You may recognize join as the counterpart to split.
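As a sketch (the field order, the separator, and the file name result.txt are just assumptions; mirror whatever your input uses), the formatting step and the output could look like this:
def format_records(records):
    # join each record's fields back into one colon-separated line per record
    lines = []
    for record in records:
        fields = [record['name'], record['course'], record['time'], str(record['points'])]
        lines.append(":".join(fields))
    return "\n".join(lines)

# print to the console ...
print(format_records(filtered))

# ... or write to a file instead ("result.txt" is just an example name)
with open("result.txt", mode="w") as f:
    f.write(format_records(filtered))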
Python basics & tutorials applicable here: data structures, control flow, string formatting, and sorting.