0

A text file "input_msg.txt" file contains follwing records..

Jan 1 02:32:40 other strings but may or may not unique in all those lines
Jan 1 02:32:40 other strings but may or may not unique in all those lines
Mar 31 23:31:55 other strings but may or may not unique in all those lines
Mar 31 23:31:55 other strings but may or may not unique in all those lines
Mar 31 23:31:55 other strings but may or may not unique in all those lines
Mar 31 23:31:56 other strings but may or may not unique in all those lines
Mar 31 23:31:56 other strings but may or may not unique in all those lines
Mar 31 23:31:56 other strings but may or may not unique in all those lines
Mar 31 23:31:57 other strings but may or may not unique in all those lines
Mar 31 23:31:57 other strings but may or may not unique in all those lines
Mar 31 23:31:57 other strings but may or may not unique in all those lines
Mar 31 23:31:57 other strings but may or may not unique in all those lines
Feb 1 03:52:26 other strings but may or may not unique in all those lines
Feb 1 03:52:26 other strings but may or may not unique in all those lines
Jan 1 02:46:40 other strings but may or may not unique in all those lines
Jan 1 02:44:40 other strings but may or may not unique in all those lines
Jan 1 02:40:40 other strings but may or may not unique in all those lines
Feb 10 03:52:26 other strings but may or may not unique in all those lines

I have tried the following program.

def sort_file_based_timestap():    
   f = open(r"D:\Python34\test_msg.txt", "r")    
   xs = f.readlines()     
   xs.sort()  
   print (xs)
   f.close()

This program is sorting based on string.

I need the output like below.

Jan 1 02:32:40 other strings but may or may not unique in all those lines
Jan 1 02:32:40 other strings but may or may not unique in all those lines
Jan 1 02:40:40 other strings but may or may not unique in all those lines
Jan 1 02:44:40 other strings but may or may not unique in all those lines
Jan 1 02:46:40 other strings but may or may not unique in all those lines
Feb 1 03:52:26 other strings but may or may not unique in all those lines
Feb 1 03:52:26 other strings but may or may not unique in all those lines
Feb 10 03:52:26 other strings but may or may not unique in all those lines
Mar 31 23:31:55 other strings but may or may not unique in all those lines
Mar 31 23:31:55 other strings but may or may not unique in all those lines
Mar 31 23:31:55 other strings but may or may not unique in all those lines
Mar 31 23:31:56 other strings but may or may not unique in all those lines
Mar 31 23:31:56 other strings but may or may not unique in all those lines
Mar 31 23:31:56 other strings but may or may not unique in all those lines
Mar 31 23:31:57 other strings but may or may not unique in all those lines
Mar 31 23:31:57 other strings but may or may not unique in all those lines
Mar 31 23:31:57 other strings but may or may not unique in all those lines
Mar 31 23:31:57 other strings but may or may not unique in all those lines

Your help would be appreciated!!!

HelloR
  • 49
  • 1
  • 1
  • 9

3 Answers3

2

The trick is to first annotate each line with a python-readable timestamp and then sorting this list of annotated lines.

I have put some sample code below:

import time
import re

def parse_line(line):
    """
    Parses each line to split line into the timestamp and the rest
    """

    line = line.rstrip()
    m = re.match(r"(\w{3}\s+\d+\s+[0-9:]+)\s+(.*)", line)
    if m:
        timestamp = time.strptime(m.group(1), "%b %d %H:%M:%S")
        return (timestamp, line)


def main():
    f = open('input_msg.txt', 'r')
    lines = []
    for line in f:
        parsed = parse_line(line)
        if parsed:
            lines.append(parsed)
    # sort the array based on the first element of each tuple
    # which is the parsed time
    sorted_lines  = sorted(lines, key=lambda annotated_line: annotated_line[0])
    for l in sorted_lines:
        print l[1]

if __name__ == "__main__":
    main()
GorillaPatch
  • 5,007
  • 1
  • 39
  • 56
  • Hi Stefan Can you please explain me in this [QUESTION](https://stackoverflow.com/questions/62702224/python-memory-error-when-reading-large-files-need-ideas-to-apply-mutiprocessin) – Ankur Jul 03 '20 at 04:52
0

Use a (month, day, rest) triple as sorting key, with month and day properly parsed and thus comparing correctly.

import time
def dater(line):
    month, day, rest = line.split(' ', 2)
    return (time.strptime(month, '%b'), int(day), rest)

with open('input_msg.txt') as file:
    for line in sorted(file, key=dater):
        print(line, end='')
Stefan Pochmann
  • 27,593
  • 8
  • 44
  • 107
  • Hi Stefan Can you please explain me in this [QUESTION](https://stackoverflow.com/questions/62702224/python-memory-error-when-reading-large-files-need-ideas-to-apply-mutiprocessin) – Ankur Jul 03 '20 at 04:50
0

How about this?

You first take the text and convert it into a list using splitlines() Now, each entry of this list is a string. We can't sort these chunks of strings. So, next, you take the strings and convert them into list using split() Now, your log file has been converted into a list of lists You can now parse this "list of lists" using a custom key function.

Here is the code to do so -

# log text
log = """Jan 1 02:32:40 other strings but may or may not unique in all those lines
    Jan 1 02:32:40 other strings but may or may not unique in all those lines
    Mar 31 23:31:55 other strings but may or may not unique in all those lines
    Mar 31 23:31:55 other strings but may or may not unique in all those lines
    Mar 31 23:31:55 other strings but may or may not unique in all those lines
    Mar 31 23:31:56 other strings but may or may not unique in all those lines
    Mar 31 23:31:56 other strings but may or may not unique in all those lines
    Mar 31 23:31:56 other strings but may or may not unique in all those lines
    Mar 31 23:31:57 other strings but may or may not unique in all those lines
    Mar 31 23:31:57 other strings but may or may not unique in all those lines
    Mar 31 23:31:57 other strings but may or may not unique in all those lines
    Mar 31 23:31:57 other strings but may or may not unique in all those lines
    Feb 1 03:52:26 other strings but may or may not unique in all those lines
    Feb 1 03:52:26 other strings but may or may not unique in all those lines
    Jan 1 02:46:40 other strings but may or may not unique in all those lines
    Jan 1 02:44:40 other strings but may or may not unique in all those lines
    Jan 1 02:40:40 other strings but may or may not unique in all those lines
    Feb 10 03:52:26 other strings but may or may not unique in all those lines"""

# convert the log into a list of strings
lines = log.splitlines()
'''initialize temp list that will store the log as a "list of lists" which can be sorted easily'''
temp_list = []
for data in lines:
    temp_list.append(data.split())


# writing the method which will be fed as a key for sorting
def convert_time(logline):
    # extracting hour, minute and second from each log entry
    h, m, s = map(int, logline[2].split(':'))
    time_in_seconds = h * 3600 + m * 60 + s
    return time_in_seconds


sorted_log_list = sorted(temp_list, key=convert_time)

''' sorted_log_list is a "list of lists". Each list within it is a representation of one log entry. We will use print and join to print it out as a readable log entry'''
for lines in sorted_log_list:
    print " ".join(lines)

Here is a more efficient version of the above code, here we don't need to create a temp_list and simply write a function that works on the strings that are generated as a result of splitlines() -

# convert the log into a list of strings
lines = log.splitlines()

# writing the method which will be fed as a key for sorting
def convert_time(logline):
    # extracting hour, minute and second from each log entry
    h, m, s = map(int, logline.split()[2].split(':'))
    time_in_seconds = h * 3600 + m * 60 + s
    return time_in_seconds


sorted_log_list = sorted(lines, key=convert_time)

''' sorted_log_list is a "list of lists". Each list within it is a representation of one log entry. We will use print and join to print it out as a readable log entry'''
for lines in sorted_log_list:
    print lines
user168115
  • 11
  • 2