1

I'm trying to find all instances of a specific substring(a!b2 as an example) and return them with the 4 characters that follow after the substring match. These 4 following characters are always dynamic and can be any letter/digit/symbol.

I've tried searching, but it seems like the similar questions that are asked are requesting help with certain characters that can easily split a substring, but since the characters I'm looking for are dynamic, I'm not sure how to write the regex.

Will Wild
  • 137
  • 8

3 Answers3

1

When using regex, you can use "." to dynamically match any character. Use {number} to specify how many characters to match, and use parentheses as in (.{number}) to specify that the match should be captured for later use.

>>> import re
>>> s = "a!b2foobar a!b2bazqux a!b2spam and eggs"
>>> print(re.findall("a!b2(.{4})", s))
['foob', 'bazq', 'spam']
Kevin
  • 74,910
  • 12
  • 133
  • 166
0

If you're only looking for how to grab the following 4 characters using Regex, what you are probably looking to use is the curly brace indicator for quantity to match: '{}'.

They go into more detail in the post here, but essentially you would do [a-Z][0-9]{X,Y} or (.{X,Y}), where X to Y is the number of characters you're looking for (in your case, you would only need {4}).


A more Pythonic way to solve this problem would be to make use of string slicing, and the index function however.

Eg. given an input_string, when you find the substring at index i using index, then you could use input_string[i+len(sub_str):i+len(sub_str)+4] to grab those special characters.

As an example,

input_string = 'abcdefg'
sub_str = 'abcd'
found_index = input_string.index(sub_str)
start_index = found_index + len(sub_str)
symbol = input_string[start_index: start_index + 4]

Outputs (to show it works with <4 as well): efg

Index also allows you to give start and end indexes for the search, so you could also use it in a loop if you wanted to find it for every sub string, with the start of the search index being the previous found index + 1.

Enthus3d
  • 1,727
  • 12
  • 26
0
import re
print (re.search(r'a!b2(.{4})')).group(1))

.{4} matches any 4 characters except special characters. group(0) is the complete match of the searched string. You can read about group id here.

Abhyudai
  • 826
  • 7
  • 16