Somewhat hacky solution: string manipulation by trial and error
Observation: description-value pairs are delimited by ', ' (a comma followed by a space).
So let's try to split the string at these delimiters:
line = (
    "Connection=Cable with connector, M12x1-Male, 4-pin, 0.30 m, "
    "Version=Background light, Dimension=43 x 9.5 x 64.5 mm, "
    "Rated operating voltage Ue DC=24 V, Current draw max.=208 mA, "
    "Operating mode=Normal, Material=Aluminum anodized, black Glass PMMA, "
    "Illumination area=25 x 25 mm, Light type=LED Red light, "
    "Wave length=617 nm, Illuminence (0.1 m)=350 Lux, Beam angle=40 ° x 40 °, "
    "Ambient temperature=-10...55 °C, Approval/Conformity=CE; EAC; WEEE, "
    "IP rating=IP54"
)
line.split(', ')
Were there any commas (',') not followed by a space? Let's check whether the split result still contains any commas:
>>> any(',' in part for part in line.split(', '))
False
Alright.
Observation: description and value are separated by '='.
Let's check whether all parts we identified contain an '=':
>>> all('=' in x for x in line.split(', '))
False
Huh. What happened? Let's look at the complete result:
>>> line.split(', ')
['Connection=Cable with connector',
'M12x1-Male',
'4-pin',
'0.30 m',
'Version=Background light',
'Dimension=43 x 9.5 x 64.5 mm',
'Rated operating voltage Ue DC=24 V',
'Current draw max.=208 mA',
'Operating mode=Normal',
'Material=Aluminum anodized',
'black Glass PMMA',
'Illumination area=25 x 25 mm',
'Light type=LED Red light',
'Wave length=617 nm',
'Illuminence (0.1 m)=350 Lux',
'Beam angle=40 ° x 40 °',
'Ambient temperature=-10...55 °C',
'Approval/Conformity=CE; EAC; WEEE',
'IP rating=IP54']
Aha: it seems there are values that themselves contain ', ':
- Cable with connector, M12x1-Male, 4-pin, 0.30 m
- Aluminum anodized, black Glass PMMA
and these were split as well.
Let's simply rejoin those:
fake_parts = line.split(', ')
real_parts = []
for part in fake_parts:
    if '=' in part:
        real_parts.append(part)
    else:
        real_parts[-1] += f', {part}'
How does that look?
>>> real_parts
['Connection=Cable with connector, M12x1-Male, 4-pin, 0.30 m',
'Version=Background light',
'Dimension=43 x 9.5 x 64.5 mm',
'Rated operating voltage Ue DC=24 V',
'Current draw max.=208 mA',
'Operating mode=Normal',
'Material=Aluminum anodized, black Glass PMMA',
'Illumination area=25 x 25 mm',
'Light type=LED Red light',
'Wave length=617 nm',
'Illuminence (0.1 m)=350 Lux',
'Beam angle=40 ° x 40 °',
'Ambient temperature=-10...55 °C',
'Approval/Conformity=CE; EAC; WEEE',
'IP rating=IP54']
>>> all('=' in part for part in real_parts)
True
Much better!
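The split-and-rejoin steps above can also be collapsed into a single regular-expression split. This is only a sketch under the same assumption as before, namely that a ', ' starts a new pair only if the text after it looks like 'description=' (no further ',' before the '='). The shortened sample string and the name pair_delimiter are made up for illustration.

```python
import re

# Shortened sample in the same format as the full line above.
line = (
    "Connection=Cable with connector, M12x1-Male, 4-pin, 0.30 m, "
    "Version=Background light, Material=Aluminum anodized, black Glass PMMA, "
    "IP rating=IP54"
)

# Split at ', ' only where the text that follows looks like 'description=':
# a run of characters containing neither ',' nor '=' that ends in '='.
pair_delimiter = re.compile(r', (?=[^,=]+=)')
parts = pair_delimiter.split(line)
```

The lookahead (?=...) does not consume any text, so the descriptions stay intact. It is exactly as fragile as the loop: it still guesses the rules from one example.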
Do all parts now contain exactly one '='? Let's try by splitting them up:
>>> all(len(part.split('=')) == 2 for part in real_parts)
True
Good. With that, we can form a dictionary:
>>> from collections import OrderedDict
>>> OrderedDict(part.split('=') for part in real_parts)
OrderedDict([('Connection', 'Cable with connector, M12x1-Male, 4-pin, 0.30 m'),
('Version', 'Background light'),
('Dimension', '43 x 9.5 x 64.5 mm'),
('Rated operating voltage Ue DC', '24 V'),
('Current draw max.', '208 mA'),
('Operating mode', 'Normal'),
('Material', 'Aluminum anodized, black Glass PMMA'),
('Illumination area', '25 x 25 mm'),
('Light type', 'LED Red light'),
('Wave length', '617 nm'),
('Illuminence (0.1 m)', '350 Lux'),
('Beam angle', '40 ° x 40 °'),
('Ambient temperature', '-10...55 °C'),
('Approval/Conformity', 'CE; EAC; WEEE'),
('IP rating', 'IP54')])
or simply
>>> dict(part.split('=') for part in real_parts)
{'Ambient temperature': '-10...55 °C',
'Approval/Conformity': 'CE; EAC; WEEE',
'Beam angle': '40 ° x 40 °',
'Connection': 'Cable with connector, M12x1-Male, 4-pin, 0.30 m',
'Current draw max.': '208 mA',
'Dimension': '43 x 9.5 x 64.5 mm',
'IP rating': 'IP54',
'Illumination area': '25 x 25 mm',
'Illuminence (0.1 m)': '350 Lux',
'Light type': 'LED Red light',
'Material': 'Aluminum anodized, black Glass PMMA',
'Operating mode': 'Normal',
'Rated operating voltage Ue DC': '24 V',
'Version': 'Background light',
'Wave length': '617 nm'}
Now that's something you can probably work with. However, this approach is fragile:
- What if some descriptions also contain ', '?
- What if descriptions or values contain '='?
- What if the format contains specific escape sequences?
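None of these concerns can really be resolved without knowing the format's rules, but the hacky approach can at least be made to fail loudly instead of silently producing wrong pairs. The following is a minimal sketch; the function name parse_pairs and the choice to treat only the first '=' of a fragment as the separator are assumptions, not something the data guarantees.

```python
def parse_pairs(line):
    """Split 'description=value' pairs, failing loudly on surprises."""
    pairs = {}
    key = None
    for fragment in line.split(', '):
        if '=' in fragment:
            # Split at the first '=' only, so a value may contain '='.
            key, value = fragment.split('=', 1)
            if key in pairs:
                raise ValueError(f'duplicate description: {key!r}')
            pairs[key] = value
        elif key is None:
            # A leading fragment without '=' has nothing to be rejoined to.
            raise ValueError(f'no description for fragment: {fragment!r}')
        else:
            # The fragment belongs to the previous value, which contained ', '.
            pairs[key] += f', {fragment}'
    return pairs
```

For example, parse_pairs('A=1, 2, B=x=y') gives {'A': '1, 2', 'B': 'x=y'}, while parse_pairs('oops, A=1') raises a ValueError. A value such as 'x, y=z' is still misread as two pairs; that ambiguity cannot be fixed by string manipulation alone.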
Proper solution: use a parser
To correctly interpret data encoded as text according to an elaborate (or not so elaborate) set of rules, use a parser library. See e.g. "How best to parse a simple grammar?" for options.
This, however, requires you to specify the exact set of rules (called a "grammar") that govern the encoding, and thus to know these rules in the first place. Whether and how well they can be derived just by looking at the encoded data depends on that data.
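To illustrate what "specifying the rules" means, here is a small hand-written parser for a purely hypothetical grammar. It is not the actual format of the data above and not a parser library, just a standard-library sketch. The assumed rules: pairs are separated by an unescaped ','; the first unescaped '=' in a pair separates description from value; a backslash makes the following character literal; surrounding whitespace is stripped. The name parse_escaped is made up as well.

```python
def parse_escaped(text):
    """Parse 'description=value' pairs under the hypothetical rules above."""
    pairs = {}
    key, value, in_value = [], [], False
    chars = iter(text)
    for char in chars:
        target = value if in_value else key
        if char == '\\':
            # Escape sequence: take the next character literally.
            escaped = next(chars, None)
            if escaped is None:
                raise ValueError('dangling backslash at end of input')
            target.append(escaped)
        elif char == '=' and not in_value:
            # First unescaped '=' switches from description to value.
            in_value = True
        elif char == ',':
            # Unescaped ',' ends the current pair.
            if not in_value:
                raise ValueError(f"pair without '=': {''.join(key)!r}")
            pairs[''.join(key).strip()] = ''.join(value).strip()
            key, value, in_value = [], [], False
        else:
            target.append(char)
    if in_value:
        # Store the last pair, which has no trailing ','.
        pairs[''.join(key).strip()] = ''.join(value).strip()
    elif ''.join(key).strip():
        raise ValueError(f"pair without '=': {''.join(key)!r}")
    return pairs


sample = r'Material=Aluminum anodized\, black Glass PMMA, Ratio=a\=b, IP rating=IP54'
result = parse_escaped(sample)
```

With explicit rules like these, commas and equals signs inside values are no longer ambiguous. A parser library lets you state the same rules declaratively instead of writing the loop by hand.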