I have some strings with this structure: <name> (<unit>). I would like to extract name and unit; to do this I use regex, and in most cases it works fine.
However, in some cases the <unit> consists of Greek characters, like Ω. In these cases, my code fails to extract the two desired parts.
Here is my code:
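import re

def name_unit_split(text):
    # Everything before the first " (<letters>)" group is taken as the name
    name = re.split(r' \([A-Za-z]*\)', text)[0]
    # Find the parenthesised unit; only ASCII letters are matched here
    unit = re.findall(r'\([A-Za-z]*\)', text)
    if unit != []:
        unit = unit[0][1:-1]  # strip the surrounding parentheses
    else:
        unit = ''
    return name, unit

print(name_unit_split('distance (mm)'))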
and I get:
('distance', 'mm')
But when I try with:
print(name_unit_split('resistance (Ω)'))
I get:
('resistance (Ω)', '')
I searched for other regex placeholders and tried to use these, without success:
name = re.split(r' \([\p{Greek}]*\)', text)[0]
unit = re.findall(r'\([\p{Greek}]*\)', text)
How can I find Greek characters (one or more, grouped) in a string using regex?
Furthermore, is there a better way to perform the task described above using regex? I mean: is there a way to extract both <name> and <unit> and save them in name and unit with regex?
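For reference, here is a minimal sketch of the single-pattern approach being asked about. It is only one possible way to do it, not a definitive answer. It assumes the unit, when present, is the last parenthesised group in the string and contains no nested parentheses. The names NAME_UNIT and split_name_unit are hypothetical, chosen for illustration. Note that the standard-library re module does not support \p{Greek}, so the sketch accepts any characters other than parentheses inside the unit instead.

```python
import re

# Hypothetical pattern (an assumption, not from the original code):
# a lazily matched name, optional whitespace, then an optional
# parenthesised unit at the end of the string.
NAME_UNIT = re.compile(r'^(?P<name>.*?)\s*(?:\((?P<unit>[^()]*)\))?\s*$')

def split_name_unit(text):
    """Return (name, unit); unit is '' when no trailing '(...)' is present."""
    match = NAME_UNIT.match(text)
    if match is None:
        # e.g. text containing a newline in the name; fall back to the raw text
        return text, ''
    # The unit group is None when the optional part did not participate
    return match.group('name'), match.group('unit') or ''

print(split_name_unit('distance (mm)'))   # ('distance', 'mm')
print(split_name_unit('resistance (Ω)'))  # ('resistance', 'Ω')
print(split_name_unit('count'))           # ('count', '')
```

Because the unit is matched with [^()]* rather than [A-Za-z]*, units such as Ω or m/s are captured as well. If only letters should be accepted, [^\W\d_]+ matches Unicode letters (including Greek) with the standard re module on str patterns.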