
Assume I have this string:

text='bla1;\nbla2;\nbla3;\n#endif\nbla4;'

I want to define a method that removes every '\n' except those that are preceded by a string starting with '#' or followed by a '#', so the result should be:

text2='bla1;bla2;bla3;\n#endif\nbla4;'

Is there a simple way to do this in Python using regex?

(Note: it is clear to me how to skip a \n that is followed by #, using a negative lookahead, i.e. r'\n+(?!#)', but the challenge is how to identify a \n preceded by a string starting with #.)

The challenge is: how to handle a positive lookbehind over variable-length strings in Python?
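To make the limitation concrete, here is a minimal sketch showing that the standard-library `re` module rejects a lookbehind whose width can vary, while a fixed-width one compiles. The helper name `compiles` and the two sample patterns are made up for illustration.

```python
import re

def compiles(pattern):
    """Return True if the standard-library re module accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Lookbehind over '#' plus a word of any length: the width varies
variable_ok = compiles(r'(?<=#\w+)\n')

# Lookbehind over one specific word: the width is fixed
fixed_ok = compiles(r'(?<=#endif)\n')

print(variable_ok, fixed_ok)
```

So a lookbehind only helps here if every '#' word has the same known length, which is why an approach without lookbehind is needed.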

Mannaggia
  • So if an `\n` is preceded by a `#` then it should be left untouched. In your example also the following `\n` after `endif` is left. Is this intentional? – msvalkon Apr 22 '14 at 09:49
  • Yes, it is intentional (as described in the question), but I mainly care about the \n preceded by a word starting with # – Mannaggia Apr 22 '14 at 09:55
  • What is the challenge, by the way? Where have you used positive lookbehind? – aelor Apr 22 '14 at 10:23
  • It comes up the moment I say that \n should not come after an arbitrary-length string starting with #. I think this shows the problem: http://stackoverflow.com/questions/11640447/regexps-variable-length-lookbehind-assertion-alternatives – Mannaggia Apr 22 '14 at 10:32

1 Answer


Find: (#[a-z]+\\n)|\\n(?!#)

Replace with: '\1'

Output: bla1;bla2;bla3;\n#endif\nbla4;

Demo here: http://regex101.com/r/uO8wH2

This keeps every \n that either follows a word starting with # or is followed by a #.
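For reference, a minimal Python sketch of the same idea, assuming the string contains real newline characters (so the pattern uses `\n` inside a raw string rather than `\\n`). The function name `keep_directive` is made up for illustration, and a callable replacement is used so that the unmatched group is handled explicitly instead of relying on how `'\1'` behaves for an empty group.

```python
import re

text = 'bla1;\nbla2;\nbla3;\n#endif\nbla4;'

# Alternative 1 captures '#word' plus its trailing newline so it can be put back.
# Alternative 2 matches a newline that is not followed by '#'; it is dropped.
pattern = re.compile(r'(#[a-z]+\n)|\n(?!#)')

def keep_directive(match):
    # group(1) is None when alternative 2 matched, so return an empty string
    return match.group(1) or ''

text2 = pattern.sub(keep_directive, text)
print(repr(text2))
```

Note that `[a-z]+` only covers a lowercase word directly followed by the newline; a line such as `#include <x>` would need a broader pattern like `#[^\n]*\n`, which is an assumption about the input and not part of the original answer.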

HTH

aelor
  • Thanks, but I am not sure this works in Python, see http://regex101.com/r/uO8wH2#python – Mannaggia Apr 22 '14 at 10:03
  • The problem with the above example is that the result is 'bla1;\nbla2;\nbla3;\n#endif\nbla4;', i.e. the \n before bla2 and bla3 did not disappear. I think the reason is that Python has problems doing a positive lookbehind with non-fixed-length strings. – Mannaggia Apr 22 '14 at 10:09
  • @Mannaggia the result is the same in Python; check the Substitution result in the output section of your link. – aelor Apr 22 '14 at 10:26
  • The link might work, but I cannot reproduce this in Python, e.g. it does not seem to work using re. Perhaps I should try the regex package (http://stackoverflow.com/questions/11640447/regexps-variable-length-lookbehind-assertion-alternatives) – Mannaggia Apr 22 '14 at 10:33
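One plausible reason the pattern could not be reproduced with `re` (an assumption, since the exact Python call is not shown in the thread): inside a raw string, `\\n` is a regex for a literal backslash followed by the letter n, whereas the string in the question contains real newline characters. The pattern contains no lookbehind at all, so the fixed-width restriction does not apply to it. A small sketch of the difference, using a shortened sample string:

```python
import re

# Shortened sample with real newline characters, as in the question
text = 'bla1;\nbla2;\n#endif\nbla4;'

# Raw-string \\n: matches a backslash character followed by 'n'
literal_hits = re.findall(r'\\n(?!#)', text)

# Raw-string \n: regex escape for the newline character itself
newline_hits = re.findall(r'\n(?!#)', text)

print(len(literal_hits), len(newline_hits))
```

If the test string on regex101 was typed with the two visible characters `\` and `n`, the `\\n` form would match there but not against a Python string holding actual newlines.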