I'm trying to read a text file and use regular expressions (re) to retrieve the messages sent by one person.

Here is what the file looks like:

[9/8/18, 10:03:17 PM] Lim :xx xx 
xx 
xx 
xx 
[10/8/18, 11:03:17 PM] Tan:yy yy 
yy 
yy 
yy 
[10/8/18, 12:03:17 PM] Lim :zz zz 
zz 
zz 
zz
[11/8/18, 10:03:17 PM] Ong:11 11
11
11
11

My code is:

import re
name = "Lim"
file = open("_chat.txt", "r")
for line in file:
    z = re.findall(r"[\W\d\W\w]",line)
    if z: found = (name in line)
    if found: print (line)

My output is:

[9/8/18, 10:03:17 PM] Lim :xx xx 
[10/8/18, 12:03:17 PM] Lim :zz zz 

The result I want is:

[9/8/18, 10:03:17 PM] Lim :xx xx
xx 
xx 
xx 
[10/8/18, 12:03:17 PM] Lim :zz zz
zz 
zz 
zz

How do I also print the lines that follow each matching line? A message may span more than 3 lines.

2 Answers

The easiest solution I can come up with (assuming there is no [ inside the messages themselves) is a simple split:

with open('_chat.txt', "r") as f:
    data = f.read()

name = "Lim"
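# Split on '[' so that each chunk holds one timestamped message;
# the first chunk (whatever precedes the first '[') is skipped.
for messages in data.split('[')[1:]:
    # Note: this also matches when the name appears inside someone else's message.
    if name in messages:
        print('[' + messages, end='')  # the chunk already ends with a newline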

If you need re, you might want to look into the re.MULTILINE option.
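For what it's worth, here is a rough sketch of how the re.MULTILINE route could look. The helper name messages_from and the HEADER pattern are my own, and the pattern simply assumes every message starts with a bracketed timestamp at the beginning of a line, as in the sample from the question. Unlike the plain split, it compares the name only against the sender field, not against the message body.

```python
import re

# Assumed header format: a bracketed timestamp at the start of a line,
# e.g. "[9/8/18, 10:03:17 PM]". With re.MULTILINE, ^ matches at every line start.
HEADER = re.compile(r"^\[[^\]\n]*\]", re.MULTILINE)

def messages_from(text, name):
    """Return the whole messages (header line plus continuation lines) sent by name."""
    # Every header match marks the start of a message; it runs until the next header.
    starts = [m.start() for m in HEADER.finditer(text)]
    ends = starts[1:] + [len(text)]
    result = []
    for begin, end in zip(starts, ends):
        chunk = text[begin:end]
        after_header = chunk[HEADER.match(chunk).end():]
        # The sender is whatever sits between the timestamp and the first colon.
        sender = after_header.split(":", 1)[0].strip()
        if sender == name:
            result.append(chunk)
    return result

sample = (
    "[9/8/18, 10:03:17 PM] Lim :xx xx\nxx\n"
    "[10/8/18, 11:03:17 PM] Tan:yy Lim\nyy\n"
    "[10/8/18, 12:03:17 PM] Lim :zz zz\nzz\n"
)
# Prints the two messages sent by Lim, continuation lines included.
print("".join(messages_from(sample, "Lim")), end="")
```

To run it on the real file, read _chat.txt into a string first (as in the split example) and pass that string instead of sample.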

DerMaddi

Your regular expression [\W\d\W\w] looks for a single character which is a non-word character (\W), a digit (\d), redundantly a non-word character again, or a word character (\w). This trivially matches every possible character, because every character is either a word character or a non-word character. (Whitespace and non-whitespace would be \s and \S.) It's not entirely clear what you hoped this would match, but something like

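import re

name = "Lim"
found = False  # True while we are inside a message from `name`
with open("_chat.txt", "r") as file:
    for line in file:
        # A bracketed timestamp marks the first line of a new message.
        z = re.search(r"\[[^][]*\]", line)
        if z:
            found = name in line
        if found:
            print(line, end="")  # the line already ends with a newline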

will pick out lines with a literal pair of square brackets and look for the name only on those lines. Lines without brackets keep the result from the most recent bracketed line, so the continuation lines of a matching message are printed too. Notice also the use of re.search over re.findall (there is no need to find all matches when you only care whether or not there is one), the initialization of found before the loop, and the use of a with context manager around open (less crucial here, but a good idiom to learn).

In some more detail, \[ matches a literal opening square bracket and \] matches a literal closing one. Square brackets without a backslash escape form a character class, which matches a single character out of an enumerated set, so [^][] matches one character which is not (^) either a literal ] or a literal [, and * repeats that expression zero or more times.
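If you want to convince yourself of the above, a quick check along these lines should do. The variable names and the test strings are just my own picks for illustration.

```python
import re

header = re.compile(r"\[[^][]*\]")

# A line that starts a message contains a bracketed timestamp.
first = header.search("[9/8/18, 10:03:17 PM] Lim :xx xx")
print(first.group())

# A continuation line has no brackets, so there is no match.
print(header.search("xx") is None)

# The class from the question accepts any single character, because
# \W and \w together already cover every character.
original = re.compile(r"[\W\d\W\w]")
probe = "a1 _[]\n"
print(all(original.fullmatch(ch) for ch in probe))
```

The first print shows just the timestamp part, and the other two print True.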

tripleee