My Regex-fu is seriously lacking and I can't get my head round it... any help greatly received.

I am looking for a Python way to parse a string that a gnarly old piece of software (that I don't have source access to) spits out:

,Areas for further improvement,The school’s leaders are rightly seeking to improve the following areas:,,========2========,,3/5,Continue to focus on increasing performance at the higher levelsPupils’,literacy and numeracy skills across the curriculumStandards,in science throughout the schoolPupils’,numerical reasoning skills

What I want to do is:

(1) Remove all the existing , : = / characters to form a single contiguous string:

Areas for further improvementThe school’s leaders are rightly seeking to improve the following areas23/5Continue to focus on increasing performance at the higher levelsPupils’literacy and numeracy skills across the curriculumStandardsin science throughout the schoolPupils’numerical reasoning skills

(2) Then precede each capital letter with a single , so that I can use the string as sensible CSV input:

,Areas for further improvement,The school’s leaders are rightly seeking to improve the following areas23/5,Continue to focus on increasing performance at the higher levels,Pupils’literacy and numeracy skills across the curriculum,Standardsin science throughout the school,Pupils’numerical reasoning skills

I appreciate this will give me a leading , but I can strip that out when I write to file.

Is this possible via a re.sub() and regex-fu?

(Happy for this to be a two-step process: remove the existing junk characters, then add a , before each capital letter.)

Can someone save my regex sanity please?

Cheers

FHTMitchell
NimueSTEM
2 Answers

import re
re.sub(r'([A-Z])', r',\1', re.sub(r'[,:=/]', '', input_))

output:

',Areas for further improvement,The school’s leaders are rightly seeking to improve the following areas235,Continue to focus on increasing performance at the higher levels,Pupils’literacy and numeracy skills across the curriculum,Standardsin science throughout the school,Pupils’numerical reasoning skills'
FHTMitchell
  • Worked a treat and I understand the nested re.sub -- do people just know the regex stuff or is there a secret sauce somewhere on the Internet to build the expressions? – NimueSTEM Jun 21 '18 at 15:40
  • You can practise [here](https://regex101.com/r/ohJ6IR/2). The nested part isn't too hard. It's just like `b = re.sub(e1, r1, a); c = re.sub(b, e2, r2, b)`. – FHTMitchell Jun 21 '18 at 15:43
  • whoops, that should be `re.sub(e2, r2, b)` – FHTMitchell Jun 21 '18 at 15:49
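
The nested call can also be unrolled into two named steps, which may be easier to read and debug. This is only a minimal sketch, assuming the junk characters are exactly `,`, `:`, `=` and `/`; the names `JUNK`, `CAPITAL` and `clean_record` are made up for illustration and do not come from the answer:

```python
import re

# Hypothetical names, for illustration only.
JUNK = re.compile(r'[,:=/]')      # the separator characters to drop
CAPITAL = re.compile(r'([A-Z])')  # any ASCII capital letter, captured as group 1


def clean_record(text):
    """Remove junk characters, then put a comma before every capital letter."""
    stripped = JUNK.sub('', text)         # step 1: same as the inner re.sub
    return CAPITAL.sub(r',\1', stripped)  # step 2: same as the outer re.sub


print(clean_record(',Areas for improvement:,,==2==,,3/5,Continue to focusPupils’,literacy'))
# ,Areas for improvement235,Continue to focus,Pupils’literacy
```

Note that `[A-Z]` only matches ASCII capitals, so accented capitals would not get a comma in this sketch.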

You can apply re.sub twice:

import re
s = ',Areas for further improvement,The school’s leaders are rightly seeking to improve the following areas:,,========2========,,3/5,Continue to focus on increasing performance at the higher levelsPupils’,literacy and numeracy skills across the curriculumStandards,in science throughout the schoolPupils’,numerical reasoning skills'
new_s = re.sub('[A-Z]', lambda x: f',{x.group()}', re.sub(r'[,:=]+', '', s))

Output:

',Areas for further improvement,The school’s leaders are rightly seeking to improve the following areas23/5,Continue to focus on increasing performance at the higher levels,Pupils’literacy and numeracy skills across the curriculum,Standardsin science throughout the school,Pupils’numerical reasoning skills'
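
The leading comma mentioned in the question can be dropped with `str.lstrip` before the result is split into fields. A rough sketch, assuming every capital letter really does start a new field and that `/` should be kept (as in the output above); `to_fields` is a hypothetical helper name, not part of the original answer:

```python
import re


def to_fields(raw):
    """Hypothetical helper: clean the string and return a list of fields."""
    no_junk = re.sub(r'[,:=]+', '', raw)               # drop runs of , : =
    with_commas = re.sub(r'([A-Z])', r',\1', no_junk)  # comma before each capital
    return with_commas.lstrip(',').split(',')          # drop leading comma, split


print(to_fields(',Areas for improvement:,,==2==,,3/5,Continue to focusPupils’,literacy'))
# ['Areas for improvement23/5', 'Continue to focus', 'Pupils’literacy']
```

The resulting list could then be passed to `csv.writer(...).writerow(...)` from the standard library rather than joined by hand.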
Ajax1234