
I want to apply a regex to every line of my text file. For example:

comments={ts=2010-02-09T04:05:20.777+0000,comment_id=529590|2886|LOL|Baoping Wu|529360}
comments={ts=2010-02-09T04:20:53.281+0000, comment_id=529589|2886|cool|Baoping Wu|529360}
comments={ts=2010-02-09T05:19:19.802+0000,comment_id=529591|2886|ok|Baoping Wu|529360}

My Python code is:

import re
p = re.compile(ur'(comment_id=)(\d+)\|(\d+)\|([^|]+)\|([^|]+)\|(\d+)', re.MULTILINE|re.DOTALL)
#open =
test_str = r"comments={ts=2010-02-09T04:05:20.777+0000, comment_id=529590|2886|LOL|Baoping Wu|529360}"
subst = ur"\1\2, user_id = \3, comment='\4', user= '\5', post_commented=\6"

result = re.sub(p, subst, test_str)
print result

I want to solve it with the help of MULTILINE, but it doesn't work. Can anyone help me?

The output for the first line should be:

comments={ts=2010-02-09T04:05:20.777+0000, comment_id=529590, user_id = 2886, comment='LOL', user= 'Baoping Wu', post_commented=529360}

My only issue is applying the regex to every line and writing the result to a text file.

1 Answer

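Your regex works without MULTILINE or DOTALL. MULTILINE only changes where ^ and $ match, and DOTALL only changes what . matches; your pattern uses none of these. You can run the replacement over the entire file contents at once: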

import re

with open('file.txt', 'r') as f:
    txt = f.read()

pattern = r'(comment_id=)(\d+)\|(\d+)\|([^|]+)\|([^|]+)\|(\d+)'
repl = r"\1\2, user_id = \3, comment='\4', user= '\5', post_commented=\6"

result = re.sub(pattern, repl, txt)
with open('file2.txt', 'w') as f:
    f.write(result)
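
To check the claim without touching any files, here is a minimal sketch that runs the same pattern over the three sample lines from the question, held in an in-memory string. The names `txt`, `with_flags`, and `lines` are only illustrative, and the code is written so that it should run on both Python 2.7 and Python 3.

```python
import re

# Pattern and replacement copied from the answer above.
pattern = r'(comment_id=)(\d+)\|(\d+)\|([^|]+)\|([^|]+)\|(\d+)'
repl = r"\1\2, user_id = \3, comment='\4', user= '\5', post_commented=\6"

# The three sample lines from the question, joined into one multi-line string.
txt = (
    "comments={ts=2010-02-09T04:05:20.777+0000,comment_id=529590|2886|LOL|Baoping Wu|529360}\n"
    "comments={ts=2010-02-09T04:20:53.281+0000, comment_id=529589|2886|cool|Baoping Wu|529360}\n"
    "comments={ts=2010-02-09T05:19:19.802+0000,comment_id=529591|2886|ok|Baoping Wu|529360}\n"
)

# re.sub replaces every non-overlapping match, so each line is rewritten
# in a single call even though no flags are passed.
result = re.sub(pattern, repl, txt)

# Passing the flags from the question changes nothing here, because the
# pattern contains no ^, $ or . for them to act on.
with_flags = re.sub(pattern, repl, txt, flags=re.MULTILINE | re.DOTALL)

lines = result.splitlines()
print(result)
```

Every line comes out rewritten, and `with_flags` equals `result`, which is consistent with the flags being unnecessary for this pattern.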
Brendan Abel
  • I tried it, but it doesn't apply the replacements. The file is unchanged. – Okan Albayrak Feb 24 '16 at 19:19
  • Hmm, works for me. Did you notice that it's writing out to a separate file? – Brendan Abel Feb 24 '16 at 19:22
  • Ah, OK... one moment. Should I create a file named file2.txt? – Okan Albayrak Feb 24 '16 at 19:27
  • Which Python version did you use? The interpreter gave me an error on the line that defines the variable "pattern". It reported invalid syntax at the ' symbol. – Okan Albayrak Feb 24 '16 at 19:40
  • Python 2.7. Try without the `u`, it shouldn't be needed here. – Brendan Abel Feb 24 '16 at 19:45
  • IT WORKS! :) Thank you very much. But why does it work without the `u`? What is the reason? – Okan Albayrak Feb 24 '16 at 19:49
  • @OkanAlbayrak: See [*What exactly do “u” and “r” string flags do in Python, and what are raw string literals?*](http://stackoverflow.com/questions/2081640/what-exactly-do-u-and-r-string-flags-do-in-python-and-what-are-raw-string-l). `u` is necessary if you work with extended characters from anywhere in the Unicode table. Glad to hear it finally works. – Wiktor Stribiżew Feb 24 '16 at 21:07
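
As a rough illustration of the two prefixes discussed in these comments, the sketch below shows what `r` does to backslashes and asks the running interpreter whether it accepts the combined `ur` prefix. Python 2 accepts `ur"..."`, while Python 3 rejects it as a syntax error. That this explains the asker's "invalid syntax" error is only a guess, since the thread never confirms which interpreter was used. The helper `compiles` is a hypothetical name introduced for this example.

```python
import sys

def compiles(source):
    """Return True if `source` is a valid expression for this interpreter."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

# The r prefix keeps backslashes literal, which is why regex patterns use it.
raw_is_escaped = r"\d+" == "\\d+"           # both are backslash, d, plus
newline_lengths = (len("\n"), len(r"\n"))   # newline char vs. backslash + n

# Ask the interpreter which prefixes it accepts.
accepts_r = compiles('r"\\d+"')
accepts_ur = compiles('ur"\\d+"')  # valid in Python 2, SyntaxError in Python 3

print("Python %d.%d" % sys.version_info[:2])
print("r prefix accepted: %s" % accepts_r)
print("ur prefix accepted: %s" % accepts_ur)
```

Since the pattern in this question contains only ASCII characters, a plain `r'...'` literal is enough on either Python version.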