I am looking for a way to get all of the letters in a string before a : but I have no idea on where to start. Would I use regex? If so how?
string = "Username: How are you today?"
Can someone show me a example on what I could do?
I am looking for a way to get all of the letters in a string before a : but I have no idea on where to start. Would I use regex? If so how?
string = "Username: How are you today?"
Can someone show me a example on what I could do?
Just use the split
function. It returns a list, so you can keep the first element:
>>> s1.split(':')
['Username', ' How are you today?']
>>> s1.split(':')[0]
'Username'
Using index
:
>>> string = "Username: How are you today?"
>>> string[:string.index(":")]
'Username'
The index will give you the position of :
in string, then you can slice it.
If you want to use regex:
>>> import re
>>> re.match("(.*?):",string).group()
'Username'
match
matches from the start of the string.
you can also use itertools.takewhile
>>> import itertools
>>> "".join(itertools.takewhile(lambda x: x!=":", string))
'Username'
You don't need regex
for this
>>> s = "Username: How are you today?"
You can use the split
method to split the string on the ':'
character
>>> s.split(':')
['Username', ' How are you today?']
And slice out element [0]
to get the first part of the string
>>> s.split(':')[0]
'Username'
I have benchmarked these various technics under Python 3.7.0 (IPython).
c
is known): pre-compiled regex.s.partition(c)[0]
.c
may not be in s
): partition, split.import string, random, re
SYMBOLS = string.ascii_uppercase + string.digits
SIZE = 100
def create_test_set(string_length):
for _ in range(SIZE):
random_string = ''.join(random.choices(SYMBOLS, k=string_length))
yield (random.choice(random_string), random_string)
for string_length in (2**4, 2**8, 2**16, 2**32):
print("\nString length:", string_length)
print(" regex (compiled):", end=" ")
test_set_for_regex = ((re.compile("(.*?)" + c).match, s) for (c, s) in test_set)
%timeit [re_match(s).group() for (re_match, s) in test_set_for_regex]
test_set = list(create_test_set(16))
print(" partition: ", end=" ")
%timeit [s.partition(c)[0] for (c, s) in test_set]
print(" index: ", end=" ")
%timeit [s[:s.index(c)] for (c, s) in test_set]
print(" split (limited): ", end=" ")
%timeit [s.split(c, 1)[0] for (c, s) in test_set]
print(" split: ", end=" ")
%timeit [s.split(c)[0] for (c, s) in test_set]
print(" regex: ", end=" ")
%timeit [re.match("(.*?)" + c, s).group() for (c, s) in test_set]
String length: 16
regex (compiled): 156 ns ± 4.41 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
partition: 19.3 µs ± 430 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
index: 26.1 µs ± 341 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
split (limited): 26.8 µs ± 1.26 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
split: 26.3 µs ± 835 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
regex: 128 µs ± 4.02 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
String length: 256
regex (compiled): 167 ns ± 2.7 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
partition: 20.9 µs ± 694 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
index: 28.6 µs ± 2.73 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
split (limited): 27.4 µs ± 979 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
split: 31.5 µs ± 4.86 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
regex: 148 µs ± 7.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
String length: 65536
regex (compiled): 173 ns ± 3.95 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
partition: 20.9 µs ± 613 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
index: 27.7 µs ± 515 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
split (limited): 27.2 µs ± 796 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
split: 26.5 µs ± 377 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
regex: 128 µs ± 1.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
String length: 4294967296
regex (compiled): 165 ns ± 1.2 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
partition: 19.9 µs ± 144 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
index: 27.7 µs ± 571 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
split (limited): 26.1 µs ± 472 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
split: 28.1 µs ± 1.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
regex: 137 µs ± 6.53 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
partition() may be better then split() for this purpose as it has the better predicable results for situations you have no delimiter or more delimiters.
To solve this using RegEx, you can use the Negative Lookahead/Negative Lookbehind approach.
For example, the code below for Python:
import re
string = "Username: How are you today?"
regex='(\S*)[:]'
data=re.findall(regex, string)
print(data)