How to get the first word in the string

Question

The text is:

WYATT    - Ranked # 855 with    0.006   %
XAVIER   - Ranked # 587 with    0.013   %
YONG     - Ranked # 921 with    0.006   %
YOUNG    - Ranked # 807 with    0.007   %

I want to get only:

WYATT
XAVIER
YONG
YOUNG

I tried:

(.*)?[ ]

But it gives me:

WYATT    - Ranked

score 188 · Accepted Answer · answered Dec 06 '12 at 18:41

Regex is unnecessary for this. Just use `some_string.split(' ', 1)[0]` or `some_string.partition(' ')[0]`.
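
A minimal sketch of the two calls, assuming `some_string` holds one line of the sample data from the question:

```python
# One line from the question's sample data.
some_string = "WYATT    - Ranked # 855 with    0.006   %"

# split(' ', 1) cuts at the first space only, giving [head, tail].
first_by_split = some_string.split(' ', 1)[0]

# partition(' ') returns a 3-tuple: (head, separator, tail).
first_by_partition = some_string.partition(' ')[0]

print(first_by_split)      # WYATT
print(first_by_partition)  # WYATT
```

Both forms return the whole string when it contains no space, and an empty string when the line starts with a space.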

answered Dec 06 '12 at 18:41

Silas Ray

1

Not if the words are separated by other characters (e.g. tabs). – orome Dec 31 '13 at 13:51
1

As long as they are separated by the same character, it will work fine. Just switch to `'\t'`. True, it won't work if you have multiple delimiters, though even with 2 or 3 delimiters it wouldn't be very difficult to use `split` or `partition` instead of regex. – Silas Ray Dec 31 '13 at 14:00
2

`some_string.split(None, 1)[0]` will work if more than one space separates the first word. – duanev Jun 16 '16 at 18:07
1

Imagine you really want the first word without assuming it's the first item in the split array. Imagine my_string = "1 2 3 4 <> coolest". I have regexes for many things but not one that, given that string, would return "coolest". I don't think suggesting split makes sense, since "return first word" says nothing about where that word may be in the list of words. – Rich Sadowsky Feb 28 '18 at 06:00
3

You can get the leftover string as well with `firstword, leftoverstring = some_string.split(' ', 1)` – deanresin Apr 01 '19 at 03:26

score 33 · Answer 2 · edited Feb 09 '21 at 08:57

If you want to feel especially sly, you can write it like this:

(firstWord, rest) = yourLine.split(maxsplit=1)

This is supposed to bring the best from both worlds:

optimality tweak with maxsplit while splitting with any whitespace
improved reliability and readability, as argued by the author of the technique.

I kind of fell in love with this solution and its general unpacking capability, so I had to share it.
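
A short sketch of the unpacking form on one sample line. The snake_case names are used here only for illustration, and the single-word case is shown because two-name unpacking assumes the line has at least two fields:

```python
your_line = "XAVIER   - Ranked # 587 with    0.013   %"

# No separator argument: split on any whitespace run; maxsplit=1 stops after one cut.
first_word, rest = your_line.split(maxsplit=1)
print(first_word)  # XAVIER
print(rest)        # - Ranked # 587 with    0.013   %

# A line with a single word yields a one-item list, so two-name unpacking fails.
try:
    first_word, rest = "XAVIER".split(maxsplit=1)
except ValueError as exc:
    print("cannot unpack:", exc)
```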

edited Feb 09 '21 at 08:57

Neuron

answered Oct 18 '16 at 12:58

Huge

2

In Python 2 you may not be able to use the keyword argument, so you might want to write `firstWord, rest = yourLine.split(None, 1)`, losing some readability though. – Huge Mar 14 '17 at 10:57
1

I like this, it's concise. And if you don't want the rest of the line, you can use `(firstWord, *_) = yourLine.split(maxsplit=1)`. Use `*_` instead of `_` because `split()` returns a list whose length depends on the input and on the `maxsplit` parameter, and this will future-proof you. – Huw Walters May 25 '19 at 07:13
2

@HuwWalters I don't see why you would protect yourself with `*`: when `maxsplit=1` is used, there is a limited number of results. – Huge Jun 12 '19 at 08:50
4

Because it saves you from coding errors. If you change the `maxsplit` value but fail to add an extra tuple element to unpack the extra value, as in `(firstWord, rest) = yourLine.split(maxsplit=2)`, you get `ValueError: too many values to unpack`. An added bonus is that you don't create an unused variable `rest`. – Huw Walters Jun 13 '19 at 12:39
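
The starred form discussed in these comments can be sketched like this (Python 3 only). The last case is an addition for illustration: starred unpacking still needs at least one item, so an empty or all-whitespace line fails:

```python
your_line = "YONG     - Ranked # 921 with    0.006   %"

# The starred target absorbs whatever follows the first item, however many there are.
first_word, *_ = your_line.split(maxsplit=1)
print(first_word)  # YONG

# Still fine if maxsplit changes later, or if the line has only one word.
first_word_2, *_ = your_line.split(maxsplit=2)
only_word, *_ = "YONG".split(maxsplit=1)
print(first_word_2, only_word)  # YONG YONG

# An all-whitespace line splits to [], which cannot fill the required first name.
try:
    first_word, *_ = "   ".split(maxsplit=1)
except ValueError as exc:
    print("cannot unpack:", exc)
```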

score 15 · Answer 3 · answered Jan 12 '16 at 13:52

You should do something like:

print(line.split()[0])

answered Jan 12 '16 at 13:52

Nado

6

I agree. But small optimization tip: `print(line.split(' ', 1)[0])`. This limits the split to the first word. – Ricardo Magalhães Cruz Aug 18 '16 at 12:59
what does the "1" do here? – algorythms Jan 31 '19 at 19:48
1

@algorythms Short circuits after finding the first split character, so you don't traverse the tail of the string. – Silas Ray Mar 11 '19 at 19:13
@algorythms Here `1` is the `maxsplit` argument. There are some examples in the documentation to show how it works https://python-reference.readthedocs.io/en/latest/docs/str/split.html – nish-ant Jul 06 '23 at 08:05
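
To make the effect of the `1` concrete, here is a small sketch on a shortened version of one sample line (shortened only to keep the output readable):

```python
line = "YOUNG  - Ranked # 807"

# No limit: every whitespace-separated field becomes a list item.
unlimited = line.split()
print(unlimited)       # ['YOUNG', '-', 'Ranked', '#', '807']

# maxsplit=1 with sep=None: at most one cut, so at most two items.
one_cut = line.split(None, 1)
print(one_cut)         # ['YOUNG', '- Ranked # 807']

# maxsplit=1 with an explicit ' ': every single space is a delimiter,
# so the tail keeps the rest of the run of spaces.
one_cut_space = line.split(' ', 1)
print(one_cut_space)   # ['YOUNG', ' - Ranked # 807']
```

In all three cases item `[0]` is the same for this input; the forms differ in how much of the string is scanned and in how leading whitespace or tabs are treated.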

score 11 · Answer 4 · edited Dec 06 '12 at 18:46

Use this regex

^\w+

\w+ matches one or more word characters.

\w is similar to [a-zA-Z0-9_]

^ matches the start of the string

About Your Regex

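Your regex `(.*)?[ ]` should be `^(.*?)[ ]`, or `^(.*?)(?=[ ])` if you don't want the space in the match. In your version the `?` follows the group and makes the whole group optional; it has to follow `.*` inside the group to make the quantifier lazy.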
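
One way to apply these patterns in Python is with the standard `re` module. This is a sketch on the question's sample data, not the only way to call them:

```python
import re

text = """WYATT    - Ranked # 855 with    0.006   %
XAVIER   - Ranked # 587 with    0.013   %
YONG     - Ranked # 921 with    0.006   %
YOUNG    - Ranked # 807 with    0.007   %"""

# Single line: re.match only matches at the start of the string, so ^ is implicit.
match = re.match(r"\w+", "WYATT    - Ranked # 855")
first_word = match.group(0) if match else None
print(first_word)  # WYATT

# Whole block: with re.MULTILINE, ^ also matches right after each newline.
first_words = re.findall(r"^\w+", text, flags=re.MULTILINE)
print(first_words)  # ['WYATT', 'XAVIER', 'YONG', 'YOUNG']

# The corrected lazy pattern, applied to one line; group 1 excludes the space.
lazy = re.match(r"^(.*?)[ ]", "WYATT    - Ranked # 855")
lazy_word = lazy.group(1) if lazy else None
print(lazy_word)  # WYATT
```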

edited Dec 06 '12 at 18:46

answered Dec 06 '12 at 18:39

Anirudha

1

You don't explain what functions to call and how when using this regex. – Doron Behar Sep 21 '22 at 14:21

score 7 · Answer 5 · answered Dec 06 '12 at 18:47

You don't need a regex: `string[:string.find(' ')]`
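
A guarded sketch of the same slice. `str.find` returns `-1` when the separator is absent, so the helper below (a hypothetical name) falls back to the whole string in that case:

```python
def first_word_by_find(string):
    """Hypothetical helper wrapping the find-and-slice approach."""
    end = string.find(' ')
    # find returns -1 when there is no space; slicing with -1 would drop the
    # last character, so return the whole string instead.
    return string if end == -1 else string[:end]


print(first_word_by_find("WYATT    - Ranked # 855"))  # WYATT
print(first_word_by_find("WYATT"))                    # WYATT
print("WYATT"[:"WYATT".find(' ')])                    # WYAT (unguarded slice)
```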

answered Dec 06 '12 at 18:47

Ricardo Alvaro Lohmann

1

This is a bit more esoteric than `split` or `partition`, I think. Do you get meaningful performance or memory gains this way? You'd have to essentially iterate to the first instance of the token twice using this, but on the flip side, you wouldn't end up with the new tail string that you just throw away... – Silas Ray Dec 06 '12 at 18:54
@sr2222 Yes, it has to iterate twice, but not through the whole string. – Ricardo Alvaro Lohmann Dec 06 '12 at 18:56
Neither does `split` with a token limit or `partition`. – Silas Ray Dec 06 '12 at 19:00
1

This approach is a nice optimization but it does not work well if the OP wants it to work when the first word is the entire string. If no spaces are found, `string.find` returns `-1`, removing the last character. – Ricardo Magalhães Cruz Aug 18 '16 at 12:59

score 2 · Answer 6 · answered Dec 06 '12 at 18:42

You don't need regex to split a string on whitespace:

In [1]: text = '''WYATT    - Ranked # 855 with    0.006   %
   ...: XAVIER   - Ranked # 587 with    0.013   %
   ...: YONG     - Ranked # 921 with    0.006   %
   ...: YOUNG    - Ranked # 807 with    0.007   %'''

In [2]: print('\n'.join(line.split()[0] for line in text.split('\n')))
WYATT
XAVIER
YONG
YOUNG
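
The same idea can be sketched as a small Python 3 function. `first_words` is a hypothetical name, and skipping blank lines is an assumption about the desired behavior, since `line.split()[0]` raises `IndexError` on an empty line:

```python
def first_words(text):
    """Hypothetical helper: first whitespace-delimited token of each non-blank line."""
    words = []
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if parts:  # skip empty or whitespace-only lines
            words.append(parts[0])
    return words


sample = "WYATT - Ranked # 855\n\nXAVIER - Ranked # 587\n"
print("\n".join(first_words(sample)))
```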
