Say I have a string:
foo: bar:baz : moo:mar:maz
I want to count the number of times a non-whitespace character sits immediately to the left or right of a colon in this string. So "foo:" counts for one instance, "bar:baz" counts for two more, and "moo:mar:maz" counts for four. We count "mar" twice because it's on both the right of one colon and the left of another. The lone colon ":" doesn't count for anything, because it has no adjacent non-whitespace character.
The count for the above string should therefore be 7.
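To pin the rule down before worrying about speed, here is a minimal sketch that walks the string once and, for each colon, checks the character on each side with str.isspace(). The name count_adjacent is hypothetical. It implements the rule exactly as stated so far, where any non-whitespace character (including another colon) counts. As a pure-Python loop it is likely slower than the regex in CPython, so treat it as a correctness reference rather than an optimisation.

```python
def count_adjacent(text):
    """Count colon sides whose neighbouring character is non-whitespace.

    Hypothetical helper: follows the rule as first stated, so another
    colon counts as a non-whitespace neighbour.
    """
    count = 0
    n = len(text)
    for i, ch in enumerate(text):
        if ch != ":":
            continue
        # Character immediately to the left of the colon, if there is one.
        if i > 0 and not text[i - 1].isspace():
            count += 1
        # Character immediately to the right of the colon, if there is one.
        if i + 1 < n and not text[i + 1].isspace():
            count += 1
    return count


print(count_adjacent("foo: bar:baz : moo:mar:maz"))  # 7
```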
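I can do this with a regex:
import re

str = "foo: bar:baz : moo:mar:maz"
left = len(re.findall(r"\S:", str))   # non-whitespace character just left of a colon
right = len(re.findall(r":\S", str))  # non-whitespace character just right of a colon
offset = left + right                 # 7 for this string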
But I want to do this without regex, as I'm running a script that needs to be as optimised as possible. Is there any way to do this using only string functions?
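Since speed is the deciding factor, it may help to measure candidates rather than assume which is faster. Below is a minimal timing sketch using the standard timeit module, assuming the question's two patterns are precompiled once with re.compile. The re module already caches recently used patterns, so precompiling may gain little on its own. The name regex_count is hypothetical; any string-method version can be timed the same way, ideally on input that resembles the real data.

```python
import re
import timeit

s = "foo: bar:baz : moo:mar:maz"

# The question's two patterns, compiled once instead of looked up per call.
LEFT = re.compile(r"\S:")   # non-whitespace character just left of a colon
RIGHT = re.compile(r":\S")  # non-whitespace character just right of a colon


def regex_count(text):
    """Count colon sides with a non-whitespace neighbour via two regex scans."""
    return len(LEFT.findall(text)) + len(RIGHT.findall(text))


# Time 100,000 calls on the sample string.
elapsed = timeit.timeit(lambda: regex_count(s), number=100_000)
print(f"regex_count: {elapsed:.3f} s for 100,000 calls")
```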
Here's one method I tried: split the string on spaces, then split each substring on colons, and whenever a substring contains a colon, add the number of non-empty pieces to the total.
spl = str.split(" ")
count = 0
print(spl)
for element in spl:
    subspl = element.split(':')
    print(subspl)
    if len(subspl) > 1:
        count += len([s for s in subspl if s != ''])
This almost works, but it fails on "moo:mar:maz". For that substring, the list comprehension [s for s in subspl if s != ''] returns ['moo', 'mar', 'maz'], which has only three elements, so it adds three to the total instead of four.
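One possible fix to the split-based attempt, offered as a sketch rather than a benchmarked answer: a non-empty piece at either end of a colon-split word touches one colon, while a piece between two colons touches two, so interior pieces should be weighted by 2. This version also calls text.split() with no argument, which splits on any run of whitespace instead of single spaces; that this matches what \S means for your input is an assumption. Because empty pieces contribute nothing, it also follows the EDIT's rule that a colon doesn't count as a neighbour. The name split_count is hypothetical, and whether it beats the regex would need to be measured.

```python
def split_count(text):
    """Count colon sides with a non-whitespace, non-colon neighbour,
    using only str.split (hypothetical helper name)."""
    count = 0
    for word in text.split():           # split on any run of whitespace
        pieces = word.split(":")
        if len(pieces) < 2:             # word contains no colon
            continue
        last = len(pieces) - 1
        for j, piece in enumerate(pieces):
            if not piece:               # empty piece: word edge or doubled colon
                continue
            # End pieces touch one colon; interior pieces touch two.
            count += 1 if j in (0, last) else 2
    return count


print(split_count("foo: bar:baz : moo:mar:maz"))  # 7
print(split_count("foo::bar"))                     # 2
```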
Is there a way to do this using only string methods, or any other way that is faster than regexes?
EDIT: Someone pointed out an edge case I hadn't considered. If the string is "foo::bar", "foo::::bar" or "foo: bar:", I want the code to count 2 in all cases. Another colon doesn't count as an adjacent character, so ":::", "::" and ":::::::" should all count for 0. I only want to record the number of times a non-colon, non-whitespace character is immediately adjacent to a colon.
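For this stricter rule, one option (if regexes stay on the table) is to narrow the question's patterns so the neighbour must be neither whitespace nor a colon, replacing \S with the character class [^\s:]. This is a sketch; regex_count_strict is a hypothetical name.

```python
import re

# [^\s:] matches one character that is neither whitespace nor a colon.
LEFT = re.compile(r"[^\s:]:")    # valid neighbour immediately left of a colon
RIGHT = re.compile(r":[^\s:]")   # valid neighbour immediately right of a colon


def regex_count_strict(text):
    """Count colon sides whose neighbour is not whitespace and not a colon."""
    return len(LEFT.findall(text)) + len(RIGHT.findall(text))


samples = ["foo: bar:baz : moo:mar:maz", "foo::bar", "foo::::bar",
           "foo: bar:", ":::", "::", ":::::::"]
for sample in samples:
    print(repr(sample), regex_count_strict(sample))
```

On these samples it prints 7 for the original string, 2 for each of the three EDIT examples, and 0 for the all-colon strings. findall returns non-overlapping matches, but no match is lost here: a left match ends on a colon, which can't start another left match, and a right match ends on a non-colon, which can't start another right match.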