Matching pattern using Regular expression in Python and output to CSV file

Question

I'm trying to find a pattern in a given .txt file and write the matches to an output CSV file.

Could someone help with this?

def output():
    filtered_messages = ['Current date & time ']
    fieldnames = ['currenttime', 'output']
    with open("test1.txt", "r") as f:
        count = 0
        while True:
            count = count +1
            line = f.readline()
            with open(line.rstrip("\n"), "r") as j:
                with open('output.csv', 'w') as csv_file:
                    writer = csv.DictWriter(csv_file, delimiter=',', fieldnames=fieldnames)
                    writer.writerow(dict((fn,fn) for fn in fieldnames))
                    a = 0
                    while True:
                        a = a + 1
                        lines = j.readlines()
                        match = re.search(r'\d{4}\-\d{2}\-\d{2}_\d{2}\-\d{2}\-\d{2}', lines)
                        if match is not None in filtered_messages:
                            writer.writerow(match.groupdict())


output()

Error:

   match = re.search(str(r'\d{4}\-\d{2}\-\d{2}_\d{2}\-\d{2}\-\d{2}'), lines)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/re.py", line 200, in search
    return _compile(pattern, flags).search(string)
TypeError: expected string or bytes-like object

The files contain a line like the one shown below.

Sample test1.txt:

Current date & time : 2023-07-22_23-02-09

`.readlines()` returns a `list`, but the input parameter for `re.search` needs to be an individual string. So you could map the strings within the list, e.g. using a list comprehension, like: `[re.search(r'\d{4}\-\d{2}\-\d{2}_\d{2}\-\d{2}\-\d{2}', line) for line in lines]`. Or, you could join `lines` (`'\n'.join(lines)`) and feed that as the input to `re.search`. — John Collins, Aug 02 '23 at 10:12
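A small sketch of the two fixes suggested in the comment, using a hard-coded list as a stand-in for what `f.readlines()` would return (the sample lines are made up for illustration). Note that `readlines()` keeps each line's trailing newline, so a plain `"".join(lines)` is enough for the joined variant:

```python
import re

# Pattern from the question (the hyphen escapes are harmless, just unnecessary)
pattern = r'\d{4}\-\d{2}\-\d{2}_\d{2}\-\d{2}\-\d{2}'

# Stand-in for f.readlines(): a list of strings, each ending in "\n"
lines = ["Foo\n", "Current date & time : 2023-07-22_23-02-09\n", "Bar\n"]

# Passing the list itself reproduces the TypeError from the question
raised = False
try:
    re.search(pattern, lines)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Option 1: search each line separately; lines without a match give None
per_line = [re.search(pattern, line) for line in lines]
found = [m.group() for m in per_line if m is not None]
print(found)  # ['2023-07-22_23-02-09']

# Option 2: join the lines into one string and search that once
joined = re.search(pattern, "".join(lines))
print(joined.group())  # 2023-07-22_23-02-09
```

Option 1 keeps one result per line, which is handy when every match should become its own CSV row; option 2 with `re.search` only returns the first match in the whole text.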

score 0 · Answer 1 · answered Aug 02 '23 at 16:34

I see some confusion in the way you've written your code up to this point, and I'd like to make some suggestions that might help.

Collect your matches in one place, first

To avoid such deeply nested code and logic, I recommend iterating over the lines of the input file and finding and saving the matches as a first step:

import re

date_pattern = r"\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}"
matches: list[re.Match[str]] = []

with open("input.txt") as f:
    for line in f:
        m = re.search(date_pattern, line)
        if m is not None:
            matches.append(m)

Also, you don't need to escape the hyphens in the regex; outside a character class, a hyphen is already a literal character. The following two regexes are equivalent:

r"\d{4}-\d{2}"
r"\d{4}\-\d{2}"
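A quick check of that equivalence, run against the sample line from the question plus a made-up non-matching line. The last two lines show the one place where the hyphen does matter: inside a character class, `a-z` is a range, so a literal hyphen there needs escaping (or placing first or last):

```python
import re

escaped = r"\d{4}\-\d{2}\-\d{2}_\d{2}\-\d{2}\-\d{2}"
unescaped = r"\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}"

samples = [
    "Current date & time : 2023-07-22_23-02-09",
    "2012-12-12_13-13-13",
    "2023/07/22 23:02:09",  # different separators: neither pattern matches
]

def found(pattern, text):
    """Return the matched text, or None if there is no match."""
    m = re.search(pattern, text)
    return m.group() if m else None

for s in samples:
    # Both patterns report the same text (or both report None)
    print(repr(s), found(escaped, s), found(unescaped, s))

# Inside a character class, the hyphen forms a range unless escaped
print(re.findall(r"[a-z]", "a-z"))   # ['a', 'z']
print(re.findall(r"[a\-z]", "a-z"))  # ['a', '-', 'z']
```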

For a good look at hyphens in regexes, I recommend checking out this short answer.

I also like type hints, so I've type-hinted my matches list.

Review your matches (just to debug, understand what the code does)

I mocked up some input text:

Foo
Bar
Current date & time : 2023-07-22_23-02-09
Baz
2012-12-12_13-13-13

To see my matches after running the previous code:

for m in matches:
    print(f"m={m}; m.group()={m.group()}")

# m=<re.Match object; span=(22, 41), match='2023-07-22_23-02-09'>; m.group()=2023-07-22_23-02-09
# m=<re.Match object; span=(0, 19), match='2012-12-12_13-13-13'>; m.group()=2012-12-12_13-13-13

You tried using Match.groupdict(), which only returns something useful if your regex has named groups. Your regex has no groups at all, let alone named ones, so groupdict() just returns an empty dict. Instead, you can use Match.group() to get the matched text.
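To make the difference concrete, here is a short comparison on the sample line. The group name `currenttime` is a hypothetical choice made to line up with the question's `fieldnames`; any valid identifier works with the `(?P<name>...)` syntax:

```python
import re

line = "Current date & time : 2023-07-22_23-02-09"

# No groups at all: groupdict() is empty, group() is the whole match
plain = re.search(r"\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}", line)
print(plain.groupdict())  # {}
print(plain.group())      # 2023-07-22_23-02-09

# A named group (?P<name>...) is what groupdict() reports
named = re.search(r"(?P<currenttime>\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})", line)
print(named.groupdict())  # {'currenttime': '2023-07-22_23-02-09'}
```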

Create your CSV based on matches

Since you don't need (cannot use) groupdict, I don't see the need for csv.DictWriter.

I think the list-based csv.writer will easily give you your desired result (which I don't know exactly, but I think I have a good-enough idea of):

import csv

with open("output1.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["currenttime", "output"])
    for m in matches:
        writer.writerow([m.group(), "???"])

I think you can just write the header row directly, then use m.group() to create the following rows. That output looks like:

| currenttime         | output |
|---------------------|--------|
| 2023-07-22_23-02-09 | ???    |
| 2012-12-12_13-13-13 | ???    |
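If you would rather keep csv.DictWriter, as in the question, a named group lets Match.groupdict() supply each row dict directly. This is a minimal sketch under a few assumptions: the group name `currenttime` is hypothetical (picked to match the question's field names), `restval="???"` fills the `output` column with the same placeholder as above, and it writes to an `io.StringIO` instead of a file so the result is easy to inspect:

```python
import csv
import io
import re

# Hypothetical named group, matching the "currenttime" field name
date_pattern = r"(?P<currenttime>\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2})"
fieldnames = ["currenttime", "output"]

# The mocked-up input text from above
sample = """Foo
Bar
Current date & time : 2023-07-22_23-02-09
Baz
2012-12-12_13-13-13
"""

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames, restval="???")
writer.writeheader()
for line in sample.splitlines():
    m = re.search(date_pattern, line)
    if m is not None:
        # groupdict() supplies "currenttime"; restval fills the missing "output"
        writer.writerow(m.groupdict())

print(out.getvalue())
```

With a real file, open it with `newline=""` as in the answer's example so the csv module controls the line endings.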

I also noticed you explicitly maintaining the counter variables count and a. For either the read or the write step, I recommend Python's built-in enumerate() instead, maybe something like:

# store the match and the line number (int) it was found on
matched_lines: list[tuple[int, re.Match[str]]] = []

with open("input.txt") as f:
    for i, line in enumerate(f, start=1):
        m = re.search(date_pattern, line)
        if m is not None:
            matched_lines.append((i, m))


with open("output2.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["Original line", "Current time"])
    for lineno, m in matched_lines:
        writer.writerow([lineno, m.group()])

| Original line | Current time        |
|---------------|---------------------|
| 3             | 2023-07-22_23-02-09 |
| 5             | 2012-12-12_13-13-13 |
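For comparison, the same enumerate-based logic can also be done in a single pass, writing each row as soon as its match is found instead of collecting `matched_lines` first. This sketch assumes the mocked-up input from above and uses `io.StringIO` for both the input and the output, so it runs without any files on disk:

```python
import csv
import io
import re

date_pattern = r"\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}"

# The mocked-up input from above, as an in-memory "file"
sample = io.StringIO(
    "Foo\n"
    "Bar\n"
    "Current date & time : 2023-07-22_23-02-09\n"
    "Baz\n"
    "2012-12-12_13-13-13\n"
)

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["Original line", "Current time"])
# enumerate(..., start=1) numbers lines from 1, the way a text editor does
for lineno, line in enumerate(sample, start=1):
    m = re.search(date_pattern, line)
    if m is not None:
        writer.writerow([lineno, m.group()])

print(out.getvalue())
```

The two-step version in the answer is easier to debug, since you can inspect the matches before writing anything; the single-pass version avoids holding every match in memory, which only matters for very large inputs.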

@Srinivas, if this answer solves the stated problem, please accept it by clicking the checkmark at the top left of the answer. And, you’re welcome :) — Zach Young, Aug 10 '23 at 04:51

score 0 · Answer 2 · answered Aug 02 '23 at 16:39

You can use readlines() and iterate through each line:

import csv
import re

def output():
    filtered_messages = ['Current date & time']
    fieldnames = ['currenttime', 'output']
    with open("test1.txt", "r") as f:
        lines = f.readlines()
    with open('output.csv', 'w', newline='') as csv_file:
        writer = csv.DictWriter(csv_file, delimiter=',', fieldnames=fieldnames)
        writer.writeheader()
        for line in lines:
            # only consider lines carrying the "Current date & time" label
            if filtered_messages[0] in line:
                match = re.search(r'\d{4}-\d{2}-\d{2}_\d{2}-\d{2}-\d{2}', line)
                # guard against a labelled line that has no timestamp
                if match is not None:
                    writer.writerow({'currenttime': match.group(), 'output': 'someValue'})

output()

