Reorganize txt file - Identify date string, relocate and insert string

Question

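I have got an input txt file formatted like this:

27/04/2023
00:00 0.1
06:00 0.5
23:00 0.9
28/04/2023
00:00 0.1
06:00 0.5
23:00 0.9
29/04/2023
00:00 0.1
06:00 0.5
23:00 0.9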

The output should look like this:

27/04/2023 00:00 0.1
27/04/2023 06:00 0.5
27/04/2023 23:00 0.9
28/04/2023 00:00 0.1
28/04/2023 06:00 0.5
28/04/2023 23:00 0.9
29/04/2023 00:00 0.1
29/04/2023 06:00 0.5
29/04/2023 23:00 0.9

What is the most straightforward and Pythonic way to reformat the file?

What I am doing now:

Read the file line by line.
Check whether each line is a date; if so, add its line number to one list and the date to another.
From the line-number list, get pairs of consecutive date line numbers.
Slice the file content between consecutive date line numbers and insert the corresponding date.

The code is kind of clunky, and it does not read and reformat the last day in the file...

from datetime import datetime

data_file = 'data.txt'
dates = []
dates_line_number = []

with open(data_file) as input_file:

    for i, line in enumerate(input_file):

        # read only the lines with dates, store their line number to list
        # store the date to another list
        try:
            date_object = datetime.strptime(line.strip(), '%d/%m/%Y')

            dates.append(date_object)
            dates_line_number.append(i)
        except ValueError:
            # not a date line, skip it
            pass

file = open(data_file)
content = file.readlines()

i = 0
f = open("outfile.txt", "w")

for index in range(len(dates_line_number)):

    # get pairs of consecutive date line numbers
    ls_index = dates_line_number[index:index+2]

    if len(ls_index) == 2:
        start = ls_index[0]+1
        end = ls_index[1]-1

        # slice the file content between consecutive date line numbers
        ls_out = (content[start:end+1])

        # insert corresponding date string
        str_date = f"{dates[i].strftime('%d/%m/%Y')} "
        ls_out.insert(0, '')

        str_out = str_date.join(ls_out)

        f.write(str_out)
        i = i+1

f.close()

Do you know if your input file is always going to be consistently organized as it is now? You have a date followed by three times, with this repeating. If so you could do things a lot less dynamically. Obviously not a robust solution but sometimes the simplest solution is best. Just a thought. — UnsanitizedInput, May 01 '23 at 12:06
Just use a "holder" variable to hold the date, once new date is encountered, reformat and dump the lines into another file + put the new date in your holder var. For last block, trigger dumping when you're on last element of readlines() list as well. — Zircoz, May 01 '23 at 12:06
Or write line by line: every time `for i, line in enumerate(input_file):` gives you a line that isn't a date, concatenate it with the last date you encountered, and write that single resulting line to the output. Then no need for a special step at the end. — slothrop, May 01 '23 at 12:11
@UnsanitizedInput The data file is at an unknown time interval, so the number of lines between dates varies from file to file but is consistent throughout a given file. — PaulR, May 01 '23 at 12:52
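The line-by-line idea from the comments above could look roughly like the sketch below. It assumes every date line is in dd/mm/yyyy format; the names is_date, prefix_with_date and reformat_file are made up for illustration and do not come from the original post.

```python
from datetime import datetime


def is_date(text, fmt="%d/%m/%Y"):
    """Return True if text parses as a date in the assumed format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


def prefix_with_date(lines):
    """Yield 'date time value' strings from an iterable of raw input lines."""
    current_date = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if is_date(line):
            current_date = line  # remember the most recent date
        elif current_date is not None:
            yield f"{current_date} {line}"


def reformat_file(src="data.txt", dst="outfile.txt"):
    """Stream src to dst, writing each data line prefixed with its date."""
    with open(src) as infile, open(dst, "w") as outfile:
        for out_line in prefix_with_date(infile):
            outfile.write(out_line + "\n")
```

Because each data line is written as soon as it is read, the last day needs no special handling, and the number of lines per date does not matter.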

score 3 · Answer 1 · answered May 01 '23 at 12:23

This will check each line for a date (dd/mm/yyyy) and, if one is found, use it as a prefix for the following lines until another date is found.

import re

data = """27/04/2023
00:00 0.1
06:00 0.5
23:00 0.9
28/04/2023
00:00 0.1
06:00 0.5
23:00 0.9
29/04/2023
00:00 0.1
06:00 0.5
23:00 0.9
"""

date = ""
for l in data.splitlines():
    if re.match(r'^\d{2}/\d{2}/\d{4}$', l):
        date = l
        continue
    print(date.strip(), l.strip())

output:

27/04/2023 00:00 0.1
27/04/2023 06:00 0.5
27/04/2023 23:00 0.9
28/04/2023 00:00 0.1
28/04/2023 06:00 0.5
28/04/2023 23:00 0.9
29/04/2023 00:00 0.1
29/04/2023 06:00 0.5
29/04/2023 23:00 0.9

score 0 · Answer 2 · answered May 01 '23 at 12:18

First, always use with to open files, so you do not need to close the file explicitly.

Your goal can be achieved by the code below:

with open('data.txt', 'r') as f:
    all_lines = (f.read().splitlines())

with open('outfile.txt', 'w') as f:
    for i, line in enumerate(all_lines):
        if i % 4 == 0:
            date = line
        else:
            f.write(f'{date} {line}\n')

I assumed that each date is followed by three other lines. If there can be more than three lines, you can replace the if i % 4 == 0: condition with one that checks whether the line is a valid date. That can be done with a regex or a helper function.

The code above produces exactly the output you want.
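For a variable number of lines per date, the date check mentioned above might look like this. It is only a sketch, assuming dates are always written as dd/mm/yyyy; DATE_RE and merge_dates are illustrative names, not part of the original answer.

```python
import re

DATE_RE = re.compile(r"\d{2}/\d{2}/\d{4}")  # assumed dd/mm/yyyy layout


def merge_dates(all_lines):
    """Prefix every non-date line with the most recent date line."""
    out = []
    date = ""
    for line in all_lines:
        line = line.strip()
        if DATE_RE.fullmatch(line):
            date = line  # a new date starts a new block
        else:
            out.append(f"{date} {line}")
    return out


# One line under the first date, three under the second
lines = ["27/04/2023", "00:00 0.1", "28/04/2023", "00:00 0.1", "00:10 0.2", "00:20 0.3"]
print("\n".join(merge_dates(lines)))
```

The regex only checks the shape of the line, not whether the date is a real calendar date.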

Thanks for your answer. I should have mentioned that the input file can have an unknown / variable number of lines between the dates, but it is always at a regular time interval, like 1 hour or 10 minutes. It is not always just 3 lines between the dates. — PaulR, May 01 '23 at 12:26

Beavatron Prime · Answer 3 · 2023-05-01T13:48:24.470

Apologies, I have never written Python before, so excuse the mess.

Assuming there are always 3 times per date, I would do the following:

Replace each newline that is followed by a time rather than a date with a marker (say ---). This gives you a single line per date, with the date at the start and all the times following it.

Then capture the date at the start of each line and replace the (---) markers on that line with the date, re-inserting newlines where appropriate.

Below is a quick working example I wrote that can be tested in a python environment such as https://lwebapp.com/en/python-playground

import re
txt = "27/04/2023\n00:00 0.1\n06:00 0.5\n23:00 0.9\n28/04/2023\n00:00 0.1\n06:00 0.5\n23:00 0.9\n29/04/2023\n00:00 0.1\n06:00 0.5\n23:00 0.9"
print(txt)
y = re.sub(r"\n([0-9][0-9]\:)", r"---\1", txt)
y = re.sub(r"([0-9][0-9]\/[0-9][0-9]\/[0-9][0-9][0-9][0-9])---(.*?)---(.*?)---(.*?)", r"\1 \2\n\1 \3\n\1 \4", y)
print(y)
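If the number of times per date is not fixed at 3, a regex-based variant could split the text on the date lines instead. This is a rough sketch rather than a tested solution for real files; it assumes dd/mm/yyyy dates that each sit on their own line, and the sample text is made up.

```python
import re

txt = "27/04/2023\n00:00 0.1\n06:00 0.5\n28/04/2023\n00:00 0.1\n06:00 0.5\n12:00 0.7\n23:00 0.9"

# Split on date lines; the capturing group keeps each date in the result:
# ['', date1, block1, date2, block2, ...]
parts = re.split(r"^(\d{2}/\d{2}/\d{4})\n", txt, flags=re.MULTILINE)

out_lines = []
for date, block in zip(parts[1::2], parts[2::2]):
    # Prefix every time line in the block with the date that preceded it
    out_lines.extend(f"{date} {line}" for line in block.splitlines())

result = "\n".join(out_lines)
print(result)
```

Here the first date has two time lines and the second has four, and both blocks are handled by the same loop.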

score 0 · Answer 4 · answered May 01 '23 at 14:58

Thanks all for the input. I wanted to explicitly check the date to take into account all possible date formattings. Refer to: Check if string has date, any format

Final solution below:

from dateutil.parser import parse


data = """27/04/2023
00:00 0.1
06:00 0.5
23:00 0.9
28/04/2023
00:00 0.1
06:00 0.5
23:00 0.9
29.04.2023
00:00 0.1
06:00 0.5
23:00 0.9
"""


def is_date(string, fuzzy=False):
    """
    Return whether the string can be interpreted as a date.

    :param string: str, string to check for date
    :param fuzzy: bool, ignore unknown tokens in string if True
    """
    try:
        parse(string, fuzzy=fuzzy)
        return True

    except ValueError:
        return False


#file = open('data.txt')
#data = file.read()
date = ""

for line in data.splitlines():

    if is_date(line):
        date = line
        continue
    print(date.strip(), line.strip())
