Hello all… thanks to the post Using Python: to split long string, by given ‘separators’, I learned a way to split a long string.
However, the separators are lost when the string is split:
import re
text = "C-603WallWizard45256CCCylinders:2HorizontalOpposedBore:1-1/42006Stroke:1-1/8Length: SingleVerticalBore:1-111Height:6Width:K-720Cooling:AirWeight:6LBS1.5H.P.@54500RPMC-60150ccGas2007EngineCylinder:4VerticalInline2008Bore:1Stroke:1Cycle:42007Weight:6-1/2LBSLength:10Width: :AirLength16Cooling:AirLength:5Width:4L-233Height:6Weight: 4TheBlackKnightc-609SteamEngineBore:11/16Stroke:11/162008Length:3Width:3Height:4TheChallengerC-600Bore:1Stroke:1P-305Weight:18LBSLength:12Width:7Height:8C-606Wall15ccGasEngineJ-142Cylinder:SingleVerticalBore:1Stroke:1-1/8Cooling:1Stroke:1-1/4HP:: /4Stroke:1-7/:6Width:6Height:92006Weight:4LBS1.75H.P.@65200RPM"
a = ['2006', '2007', '2008', '2009']
seperators = re.compile(r'|'.join(a))
e = seperators.split(text)
for f in e:
    print f
The result looks like this:
C-603WallWizard45256CCCylinders:2HorizontalOpposedBore:1-1/4 # '2006' is missing
Stroke:1-1/8Length: SingleVerticalBore:1-111Height:6Width:K-720Cooling:AirWeight:6LBS1.5H.P.@54500RPMC-60150ccGas # '2007' is missing
EngineCylinder:4VerticalInline # '2008' is missing
Bore:1Stroke:1Cycle:4 # '2007' is missing
Weight:6-1/2LBSLength:10Width: :AirLength16Cooling:AirLength:5Width:4L-233Height:6Weight: 4TheBlackKnightc-609SteamEngineBore:11/16Stroke:11/16 # '2008' is missing
Length:3Width:3Height:4TheChallengerC-600Bore:1Stroke:1P-305Weight:18LBSLength:12Width:7Height:8C-606Wall15ccGasEngineJ-142Cylinder:SingleVerticalBore:1Stroke:1-1/8Cooling:1Stroke:1-1/4HP:: /4Stroke:1-7/:6Width:6Height:9 # '2006' is missing
Weight:4LBS1.75H.P.@65200RPM
I want to keep the separators when the string is split. One way I tried is to append a special marker to each separator and then split the long string on that marker (‘@@@’ in the code below; I know it’s not a smart way):
a = ['2006', '2007', '2008', '2009']
b = []
for eachone in a:
    b.append(eachone + '@@@')
my_dic = dict(zip(a, b))
for e, f in my_dic.iteritems():
    new_text = ''.join(text.replace(e, f))
However, some of the separators in the string are not replaced. Why?
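A likely reason, judging from the loop above: str.replace returns a new string and leaves text unchanged, so every pass starts again from the original text and new_text ends up holding only the replacement made in the last pass. Depending on the iteration order of the dictionary, that last key may even be '2009', which never occurs in the text. A minimal sketch of a cumulative version, using a shortened sample string (an assumption made for brevity, not the full text above):

```python
# Shortened sample string: an assumption for illustration only.
text = "Bore:1-1/42006Stroke:1-1/8Gas2007EngineInline2008Bore:1"
a = ['2006', '2007', '2008', '2009']

# Keep reassigning the same variable so that each replacement
# builds on the previous one instead of on the original text.
new_text = text
for sep in a:
    new_text = new_text.replace(sep, sep + '@@@')

# Splitting on the marker leaves each separator attached to the
# end of the piece that precedes it.
parts = new_text.split('@@@')
for part in parts:
    print(part)
```

This still assumes that '@@@' never appears in the real data; if it could, a different marker would be needed.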
On the other hand, is my approach to splitting the long string while keeping the separators unnecessary? (I’ve checked other posts but, with my limited understanding, I couldn’t find the answer.)
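For what it's worth, the marker step may not be needed: re.split includes the separators in its result when the pattern is wrapped in a capturing group. A minimal sketch, again on a shortened sample string (an assumption, not the full text above); the names pattern, pieces and chunks are illustrative only:

```python
import re

# Shortened sample string: an assumption for illustration only.
text = "Bore:1-1/42006Stroke:1-1/8Gas2007EngineInline2008Bore:1"
a = ['2006', '2007', '2008', '2009']

# The parentheses form a capturing group, so re.split returns
# the matched separators along with the text between them.
pattern = re.compile('(' + '|'.join(map(re.escape, a)) + ')')
pieces = pattern.split(text)
# pieces alternates: text, separator, text, separator, ..., text

# Optional: glue each separator back onto the piece before it.
chunks = [
    pieces[i] + (pieces[i + 1] if i + 1 < len(pieces) else '')
    for i in range(0, len(pieces), 2)
]
for chunk in chunks:
    print(chunk)
```

Whether a separator belongs with the piece before it or the piece after it depends on the data, so the gluing step is only one possible choice.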
Thanks.