I would like to extract all 2-letter strings, including overlapping ones, from a text with a regular expression. For example:

just a test

should give me ju, us, st, te, es, st.

I tried re.findall(r'\w{2}', text), but it only splits each word into non-overlapping 2-letter chunks and gives me ju, st, te, st.

Thank you very much in advance for your help.

memegame

1 Answer

I'll leave the regex solutions to regex experts (which I'm not), since it can be done without regex quite simply in a one-liner list comprehension:

s = "just a test"
result = ["".join(x) for w in s.split() if len(w) > 1 for x in zip(w, w[1:])]

print(result)

result:

['ju', 'us', 'st', 'te', 'es', 'st']

Just split the text into words, filter out words with fewer than 2 characters, and pair each word's characters with those of its shifted copy using zip.
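To make the zip step easier to see, here is a small sketch of it on a single word. The helper name bigrams is made up for illustration and is not part of the one-liner above.

```python
def bigrams(word):
    """Return the overlapping 2-character substrings of a single word.

    `bigrams` is a hypothetical helper name used only for this illustration.
    """
    # zip stops at the shorter input, so pairing "test" with "est"
    # yields ('t', 'e'), ('e', 's'), ('s', 't').
    return ["".join(pair) for pair in zip(word, word[1:])]


print(list(zip("test", "test"[1:])))
print(bigrams("test"))
print(bigrams("a"))
```

Note that a 1-character word produces an empty list here, because its shifted copy is empty and zip yields nothing, so the len(w) > 1 filter mostly serves to make the intent explicit.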

This only works if there's no punctuation, of course.
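If punctuation is a concern, one possible alternative is a regex with a capturing group inside a lookahead, which lets re.findall report overlapping matches. This is only a sketch: the function name overlapping_pairs is hypothetical, and it assumes that \w (which also matches digits and underscores) is an acceptable definition of "letter" for the text at hand.

```python
import re


def overlapping_pairs(text):
    """Return every overlapping pair of adjacent word characters in text.

    `overlapping_pairs` is a hypothetical name chosen for this sketch.
    """
    # The lookahead (?=...) consumes no characters, so the regex engine
    # tries every position; the inner group captures the 2-character string.
    # Assumption: \w is used as in the question, so digits and "_" count too.
    return re.findall(r"(?=(\w{2}))", text)


print(overlapping_pairs("just a test"))
print(overlapping_pairs("just, a test!"))
```

Because the pattern only ever captures two adjacent word characters, commas, spaces and other punctuation never end up in a pair, and single-character words are skipped automatically.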

Jean-François Fabre