
I know this question has been asked several times, but none of the existing answers seems to match my case exactly. I have the following code and string:

import re
my_string = "INTRODUCTION Wood density and the angle of its microfibrils in the secondary wall are of particular interest for breeding programs  (Raymond 2002)  since they are the two main factors affecting wood mechanical properties. Microfibril angle (MFA) is a property of the cell wall of wood fibres, which is made up of millions of strands of cellulose called microfibrils  (Fang et al. 2006 ). This elementary wood trait represents the orientation of crystalline cellulose in the cell wall along the fiber axis  (Andersson et al. 2000) ."
# split wherever a period is followed by a space and a word
list_phrase = re.split(r'(?<=\.) ([a-zA-Z]+)', my_string)
print(list_phrase)

Here, I try to separate the sentences at the pattern ". [a-zA-Z]+". The "." stays attached to the first sentence, which is fine for me, but the word matched by [a-zA-Z]+ is not attached to the second sentence. Instead, it comes out as a separate list element. I got the following output:

['INTRODUCTION Wood density and the angle of its microfibrils in the secondary wall are of particular interest for breeding programs  (Raymond 2002)  since they are the two main factors affecting wood mechanical properties.', 'Microfibril', ' angle (MFA) is a property of the cell wall of wood fibres, which is made up of millions of strands of cellulose called microfibrils  (Fang et al. 2006 ).', 'This', ' elementary wood trait represents the orientation of crystalline cellulose in the cell wall along the fiber axis  (Andersson et al. 2000) .']
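The separate 'Microfibril' and 'This' elements appear to come from the parentheses in the pattern: the Python documentation for `re.split` states that when the pattern contains capturing groups, the text matched by those groups is also returned in the resulting list. The small sketch below uses a made-up string (`sample`, `with_group` and `without_group` are illustrative names, not from the question) to show this. It also suggests why simply removing the parentheses does not help: the word is still consumed by the match, so it disappears from the result.

```python
import re

sample = "One. Two three. Four five."

# With a capturing group, the captured word becomes its own list element.
with_group = re.split(r'(?<=\.) ([a-zA-Z]+)', sample)
print(with_group)      # ['One.', 'Two', ' three.', 'Four', ' five.']

# Without the group, the word is still matched (consumed), so it is lost.
without_group = re.split(r'(?<=\.) [a-zA-Z]+', sample)
print(without_group)   # ['One.', ' three.', ' five.']
```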

My desired output:

['INTRODUCTION Wood density and the angle of its microfibrils in the secondary wall are of particular interest for breeding programs  (Raymond 2002)  since they are the two main factors affecting wood mechanical properties.', 'Microfibril angle (MFA) is a property of the cell wall of wood fibres, which is made up of millions of strands of cellulose called microfibrils  (Fang et al. 2006 ).', 'This elementary wood trait represents the orientation of crystalline cellulose in the cell wall along the fiber axis  (Andersson et al. 2000) .']

How can I adjust my regex in order to achieve the desired result?
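One possible adjustment, offered as a sketch rather than a definitive answer: match only the space, and check for the following letter with a lookahead `(?=[a-zA-Z])` instead of consuming it. Because the lookbehind and the lookahead both match zero characters, the period stays with the previous sentence and the word stays with the next one. This assumes sentences are separated by exactly one space, as in the original pattern.

```python
import re

my_string = (
    "INTRODUCTION Wood density and the angle of its microfibrils in the secondary "
    "wall are of particular interest for breeding programs  (Raymond 2002)  since "
    "they are the two main factors affecting wood mechanical properties. "
    "Microfibril angle (MFA) is a property of the cell wall of wood fibres, which "
    "is made up of millions of strands of cellulose called microfibrils  "
    "(Fang et al. 2006 ). This elementary wood trait represents the orientation "
    "of crystalline cellulose in the cell wall along the fiber axis  "
    "(Andersson et al. 2000) ."
)

# Split only on the space itself: it must be preceded by a period
# (lookbehind) and followed by a letter (lookahead). Neither lookaround
# consumes characters, so nothing but the space is removed.
list_phrase = re.split(r'(?<=\.) (?=[a-zA-Z])', my_string)
print(list_phrase)
```

A citation such as `(Fang et al. 2006 )` is not split because the character after `al. ` is a digit; an abbreviation followed by a word starting with a letter (for example `et al. Smith`) would still be split. Replacing the literal space with `\s+` would also tolerate several spaces between sentences.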

Any help for this would be much appreciated. Thank you.

Erwin
