
I'm trying to extract a 27-character substring, DpIJr_dR-DNu5kcR9RGmRprcnGU, from the following text using a regex:

text = '[[\"jewelry_designer\"]\n,[\"watch_store\"]\n,[\"jewelry_appraiser\"]\n,[\"leather_goods_store\"]\n]\n,null,\"DpIJr_dR-DNu5kcR9RGmRprcnGU\",null,null,null,[null]'

So far I have isolated the strings surrounded by \" with the following:

import re

pattern = '\\"(.*?)\\"'
output = re.findall(pattern, text)
# output => ['jewelry_designer', 'watch_store', 'jewelry_appraiser', 'leather_goods_store', 'DpIJr_dR-DNu5kcR9RGmRprcnGU']

My next step is to add a length constraint to the pattern, so that it only matches substrings that are exactly 27 characters long.

I tried \\"(.*?){27}\\" and \\"(.*?{27})\\", but neither worked. I could filter afterwards with [x for x in output if len(x) == 27], but it would be a shame not to do it in the regex itself.

mrzasa
Sebastien D

1 Answer


Try this one:

\\\"([^\"]{27})\\\"


You first match \" with \\\", then match and capture the string you're interested in with [^\"]{27} (any character except a quote, repeated exactly 27 times), and finally match the closing \" with \\\" again.
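For reference, here is a minimal Python sketch of the approach above. It is not taken from the linked demo. It assumes two possible shapes for the input: the string exactly as assigned in the question (where \" in a non-raw literal is just a double quote), and a hypothetical variant, called raw_text here, where the backslashes are literally present in the data. The variable names are illustrative only.

```python
import re

# Case 1: the literal from the question. In a non-raw Python string,
# \" is just a double quote and \n is a newline.
text = '[[\"jewelry_designer\"]\n,[\"watch_store\"]\n,[\"jewelry_appraiser\"]\n,[\"leather_goods_store\"]\n]\n,null,\"DpIJr_dR-DNu5kcR9RGmRprcnGU\",null,null,null,[null]'

# A quote, exactly 27 non-quote characters (captured), then a quote.
pattern = r'"([^"]{27})"'
matches = re.findall(pattern, text)

# Case 2 (assumption): the backslashes are really part of the data,
# simulated here with a raw string literal.
raw_text = r'[\"leather_goods_store\"]\n]\n,null,\"DpIJr_dR-DNu5kcR9RGmRprcnGU\",null,[null]'

# The answer's pattern written as a raw string: backslash + quote,
# 27 non-quote characters (captured), backslash + quote.
raw_pattern = r'\\\"([^\"]{27})\\\"'
raw_matches = re.findall(raw_pattern, raw_text)

print(matches)
print(raw_matches)
```

One caveat: because the engine retries at every quote, a run of exactly 27 characters sitting between a closing quote and the next opening quote would also match. That does not happen in this sample, but it may be worth checking on other inputs.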

mrzasa