Questions tagged [python-re]

Python library that provides regular expression matching operations similar to those found in Perl.

re is Python's built-in module for working with regular expressions. It offers a high-level interface for matching patterns against strings.

The main functions to use from this module are:

  • re.compile - this function takes a pattern and some possible flags and returns a Pattern object. This is mostly useful when using the same pattern in a loop - compile the pattern once before the loop, instead of at each iteration.

  • re.match - takes a pattern and a string (and possible flags) and tries to match the pattern from the beginning of the string. Returns a Match object, or None if the pattern does not match.

  • re.search - similar to match, but searches anywhere in the string.

  • re.findall - similar to search, but returns a list with all non-overlapping matches found. The list contains strings rather than Match objects. When the pattern contains a single group, the list contains the text matched by that group; when it contains several groups, the list consists of tuples containing the groups of each match.
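The four functions above can be compared side by side in a short sketch. The patterns, the sample strings, and the variable names below are illustrative assumptions, not part of the tag description.

```python
import re

# Compile once and reuse; pattern and sample text are made up for illustration.
word = re.compile(r"[a-z]+", re.IGNORECASE)
text = "Spam, eggs and SPAM"

# match only succeeds at the start of the string and returns None otherwise.
first = word.match(text)
no_match = re.match(r"eggs", text)

# search scans the whole string for the first occurrence.
found = re.search(r"eggs", text)

# findall returns plain strings when the pattern has no groups ...
words = word.findall(text)
# ... and tuples of groups when the pattern has more than one group.
pairs = re.findall(r"(\w+)=(\d+)", "w=10 h=20")

print(first.group(), no_match, found.span(), words, pairs)
```

Since match and search return None when nothing matches, the result is usually checked before calling methods such as group() on it.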

The re module also offers regex-based equivalents of the built-in string methods split and replace: re.split and re.sub, respectively.
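A minimal sketch of both functions follows, assuming made-up input strings. re.subn is included as well; it is a real function in the module, but it is not mentioned in the description above.

```python
import re

# Hypothetical input: fields separated by runs of mixed whitespace.
row = "alpha   beta \t gamma"

# re.split accepts a pattern where str.split accepts a literal separator.
fields = re.split(r"\s+", row)

# re.sub can refer to captured groups in the replacement via \1, \2, ...
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@host")

# re.subn works like re.sub but also returns the number of replacements made.
masked, count = re.subn(r"\d", "#", "a1b22")

print(fields, swapped, masked, count)
```

Raw strings (r"...") keep backslashes such as \s and \1 from being interpreted as Python string escapes before the regex engine sees them.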

1981 questions
176
votes
4 answers

The result list contains single spaces when splitting a string with re.split("( )+") – is there a better way?

I have the output of a command in tabular form. I'm parsing this output from a result file and storing it in a string. Each element in one row is separated by one or more whitespace characters, thus I'm using regular expressions to match 1 or more…
gjois
  • 2,025
  • 3
  • 18
  • 19
86
votes
4 answers

re.sub replace with matched content

Trying to get to grips with regular expressions in Python, I'm trying to output some HTML highlighted in part of a URL. My input is images/:id/size my output should be images/:id/size If I do this in Javascript method =…
Blank
  • 4,635
  • 5
  • 33
  • 53
61
votes
3 answers

Using more than one flag in python re.findall

I would like to use more than one flag with the re.findall function. More specifically, I would like to use the IGNORECASE and DOTALL flags at the same time. x = re.findall(r'CAT.+?END', 'Cat \n eND', (re.I, re.DOTALL)) Error : Traceback (most…
Pavan
  • 2,715
  • 4
  • 18
  • 19
55
votes
5 answers

How to match a newline character in a raw string?

I got a little confused about Python raw string. I know that if we use raw string, then it will treat '\' as a normal backslash (ex. r'\n' would be \ and n). However, I was wondering what if I want to match a new line character in raw string. I…
wei
  • 3,312
  • 4
  • 23
  • 33
38
votes
6 answers

re.findall not returning full match?

I have a file that includes a bunch of strings like "size=XXX;". I am trying Python's re module for the first time and am a bit mystified by the following behavior: if I use a pipe for 'or' in a regular expression, I only see that bit of the match…
Ben S.
  • 3,415
  • 7
  • 22
  • 43
25
votes
1 answer

Getting PEP8 "invalid escape sequence" warning trying to escape parentheses in a regex

I am trying to escape a string such as this: string = re.split(")(", other_string) Because not escaping those parentheses gives me an error. But if I do this: string = re.split("\)\(", other_string) I get a warning from PEP8 that it's an invalid…
Jack Avante
  • 1,405
  • 1
  • 15
  • 32
18
votes
4 answers

How to grab number after word in python

I have a huge file containing the following lines DDD-1126N|refseq:NP_285726|uniprotkb:P00112 and DDD-1081N|uniprotkb:P12121, I want to grab the number after uniprotkb. Here's my code: x = 'uniprotkb:P' f = open('m.txt') for line in f: print…
graph
  • 389
  • 2
  • 5
  • 10
16
votes
2 answers

re.sub(".*", ", "(replacement)", "text") doubles replacement on Python 3.7

On Python 3.7 (tested on Windows 64 bits), the replacement of a string using the RegEx .* gives the input string repeated twice! On Python 3.7.2: >>> import re >>> re.sub(".*", "(replacement)", "sample text") '(replacement)(replacement)' On Python…
Laurent LAPORTE
  • 21,958
  • 6
  • 58
  • 103
12
votes
1 answer

Check whether modification in re.sub occurred

The Python function re.sub(pattern, replacement, string) returns the modified string with the matched pattern replaced with the replacement. Is there any easy way to check whether a match has occurred and a modification has been made? (And also, how…
Mika H.
  • 4,119
  • 11
  • 44
  • 60
10
votes
1 answer

Combine re flags re.IGNORECASE, re.MULTILINE and re.DOTALL

Can anyone tell me if I can combine flags like re.IGNORECASE, re.MULTILINE and re.DOTALL for regular expression matching? r = re.compile(regex, re.IGNORECASE | re.MULTILINE | re.DOTALL) I need to match an entire paragraph or an expression in one…
Elhabib
  • 141
  • 2
  • 10
7
votes
4 answers

Only keep df column values that contain a string from list of string

I Have a list of strings like this: stringlist = [JAN, jan, FEB, feb, mar] And I have a dataframe that looks like this: **date** **value** 01MAR16 1 05FEB16 12 10jan17 5 10mar15 …
ljourney
  • 515
  • 4
  • 11
7
votes
3 answers

Unable to fetch all the links from a webpage using requests

I'm trying to get all the links connected to each image in this webpage. I can get all the links if I let a selenium script scroll downward until it reaches the bottom. One such link that I wish to scrape is this one. Now, my goal here is to parse…
robots.txt
  • 96
  • 2
  • 10
  • 36
7
votes
1 answer

How to add a timeout to a Python 3 RegEx?

I have a regex that might take a long time to execute, despite my best efforts at optimization. I want to be able to interrupt it in the cases where it stalls, and proceed with the rest of the program Other languages like C# have a Timeout property…
robob
  • 1,739
  • 4
  • 26
  • 44
6
votes
2 answers

How to check if named capture group exists?

I'm wondering what is the proper way to test if a named capture group exists. Specifically, I have a function that takes a compiled regex as an argument. The regex may or may not have a specific named group, and the named group may or may not be…
HardcoreHenry
  • 5,909
  • 2
  • 19
  • 44
5
votes
4 answers

combining split with findall

I'm splitting a string with some separator, but want the separator matches as well: import re s = "oren;moish30.4.200/-/v6.99.5/barbi" print(re.split("\d+\.\d+\.\d+", s)) print(re.findall("\d+\.\d+\.\d+", s)) I can't find an easy way to combine…
OrenIshShalom
  • 5,974
  • 9
  • 37
  • 87