
The text is:

WYATT    - Ranked # 855 with    0.006   %
XAVIER   - Ranked # 587 with    0.013   %
YONG     - Ranked # 921 with    0.006   %
YOUNG    - Ranked # 807 with    0.007   %

I want to get only the first word of each line:

WYATT
XAVIER
YONG
YOUNG

I tried:

(.*)?[ ]

But it gives me:

WYATT    - Ranked

Nick
Vor

6 Answers


Regex is unnecessary for this. Just use some_string.split(' ', 1)[0] or some_string.partition(' ')[0].
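A minimal sketch applying both suggestions to the sample lines from the question (the list `lines` is simply the question's text copied in):

```python
lines = [
    "WYATT    - Ranked # 855 with    0.006   %",
    "XAVIER   - Ranked # 587 with    0.013   %",
    "YONG     - Ranked # 921 with    0.006   %",
    "YOUNG    - Ranked # 807 with    0.007   %",
]

# split(' ', 1) cuts at the first space only; [0] is everything before it
by_split = [line.split(' ', 1)[0] for line in lines]

# partition(' ') returns (head, separator, tail); head is the text before the first space
by_partition = [line.partition(' ')[0] for line in lines]

print(by_split)      # ['WYATT', 'XAVIER', 'YONG', 'YOUNG']
print(by_partition)  # ['WYATT', 'XAVIER', 'YONG', 'YOUNG']
```

Neither call raises on a line without a space: both simply return the whole line as the first item.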

Silas Ray
  • Not if the words are separated by other characters (e.g. tabs). – orome Dec 31 '13 at 13:51
  • As long as they are separated by the same character, it will work fine; just switch to `'\t'`. True, it won't work if you have multiple delimiters, though even with two or three delimiters it wouldn't be very difficult to use `split` or `partition` instead of regex. – Silas Ray Dec 31 '13 at 14:00
  • `some_string.split(None, 1)[0]` will work if more than one space separates the first word. – duanev Jun 16 '16 at 18:07
  • Imagine you really want the first word without assuming it's the first item in the split array. Imagine my_string = "1 2 3 4 <> coolest". I have regexes for many things but not one that, given that string, would return "coolest". I don't think suggesting split makes sense, since "return first word" says nothing about where that word may sit in the list of words. – Rich Sadowsky Feb 28 '18 at 06:00
  • You can get the leftover string as well with `firstword, leftoverstring = some_string.split(' ', 1)` – deanresin Apr 01 '19 at 03:26
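To illustrate the comments above, here is a short sketch comparing an explicit `' '` separator with `split(None, 1)`, which splits on any run of whitespace (tabs included) and ignores leading whitespace, together with the two-name unpacking. The sample string is made up for the illustration.

```python
line = "  WYATT\t- Ranked # 855"   # hypothetical line: leading spaces, then a tab

# With an explicit ' ' separator, the leading space yields an empty first field
print(repr(line.split(' ', 1)[0]))   # ''

# sep=None splits on runs of any whitespace and skips leading whitespace
print(repr(line.split(None, 1)[0]))  # 'WYATT'

# Unpack the first word and the leftover string in one step
first, rest = line.split(None, 1)
print(first, '|', rest)              # WYATT | - Ranked # 855

# Caveat: a line with no separator gives a one-element list,
# so two-name unpacking raises ValueError
try:
    only, leftover = "YONG".split(' ', 1)
except ValueError as exc:
    print('ValueError:', exc)
```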

If you want to feel especially sly, you can write it like this:

(firstWord, rest) = yourLine.split(maxsplit=1)

This is supposed to bring the best of both worlds:

I kind of fell in love with this solution and its general unpacking capability, so I had to share it.
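A sketch of this unpacking idiom run over the question's sample text. It needs Python 3, since `str.split` does not accept `maxsplit` as a keyword in Python 2:

```python
text = """WYATT    - Ranked # 855 with    0.006   %
XAVIER   - Ranked # 587 with    0.013   %
YONG     - Ranked # 921 with    0.006   %
YOUNG    - Ranked # 807 with    0.007   %"""

names = []
for yourLine in text.splitlines():
    # maxsplit=1: split once on the first run of whitespace -> [first word, remainder]
    (firstWord, rest) = yourLine.split(maxsplit=1)
    names.append(firstWord)

print(names)  # ['WYATT', 'XAVIER', 'YONG', 'YOUNG']
```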

Neuron
Huge
  • In Python 2 you may not be able to use the keyword argument, so you might write `firstWord, rest = yourLine.split(None, 1)` instead, losing some readability. – Huge Mar 14 '17 at 10:57
  • I like this; it's concise. And if you don't want the rest of the line, you can use `(firstWord, *_) = yourLine.split(maxsplit=1)`. Use `*_` instead of `_` because `split()` returns a variable number of values, depending on the `maxsplit` parameter, and this will future-proof you. – Huw Walters May 25 '19 at 07:13
  • @HuwWalters I don't see the point of protecting yourself with `*` when `maxsplit=1` is used; the number of results is limited. – Huge Jun 12 '19 at 08:50
  • Because it saves you from coding errors. If you change the `maxsplit` value but fail to add an extra tuple element to unpack the extra value, as in `(firstWord, rest) = yourLine.split(maxsplit=2)`, you get `ValueError: too many values to unpack`. An added bonus is that you don't create an unused variable `rest`. – Huw Walters Jun 13 '19 at 12:39
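A small sketch of the `*_` variant discussed in these comments. One more case where it helps, even with `maxsplit=1`: a one-word line, where `split` returns a single item. `first_word` is a hypothetical helper name used only for this illustration.

```python
def first_word(line):
    # Star-unpacking accepts however many items split() returns,
    # so it works for one-word lines and for any maxsplit value
    (firstWord, *_) = line.split(maxsplit=1)
    return firstWord

print(first_word("WYATT    - Ranked # 855 with    0.006   %"))  # WYATT
print(first_word("YONG"))                                      # YONG

# For contrast: fixed two-name unpacking fails on a one-word line
try:
    (firstWord, rest) = "YONG".split(maxsplit=1)
except ValueError as exc:
    print("ValueError:", exc)
```

An empty or all-whitespace line still raises `ValueError`, because `split()` then returns an empty list.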

You should do something like:

print(line.split()[0])
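A minimal Python 3 sketch of that approach applied line by line. It assumes the input might contain blank lines (the question's sample has none). On such a line `split()[0]` would raise `IndexError`, so the sketch skips them:

```python
text = """WYATT    - Ranked # 855 with    0.006   %
XAVIER   - Ranked # 587 with    0.013   %

YONG     - Ranked # 921 with    0.006   %
YOUNG    - Ranked # 807 with    0.007   %"""

first_words = []
for line in text.splitlines():
    words = line.split()   # split on any whitespace; [] for a blank line
    if words:              # skip blank lines instead of hitting IndexError
        first_words.append(words[0])

print('\n'.join(first_words))
```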
Nado

Use this regex:

^\w+

\w+ matches one or more word characters.

\w is similar to [a-zA-Z0-9_]

^ anchors the match at the start of the string.
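When the whole block is a single string, `^` only matches at its very beginning unless multiline mode is on. Here is a sketch using Python's `re` module on the question's text:

```python
import re

text = """WYATT    - Ranked # 855 with    0.006   %
XAVIER   - Ranked # 587 with    0.013   %
YONG     - Ranked # 921 with    0.006   %
YOUNG    - Ranked # 807 with    0.007   %"""

# re.MULTILINE makes ^ match at the start of every line, not only of the string
names = re.findall(r'^\w+', text, re.MULTILINE)
print(names)  # ['WYATT', 'XAVIER', 'YONG', 'YOUNG']

# Without the flag, only the first line's word is found
print(re.findall(r'^\w+', text))  # ['WYATT']
```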


About Your Regex

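Your regex (.*)?[ ] should be ^(.*?)[ ], or ^(.*?)(?=[ ]) if you don't want the space. (.*) is greedy, so it runs on to the last space it can match; .*? is lazy and stops at the first one.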
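A quick sketch comparing the three patterns on the first sample line with `re.match`:

```python
import re

line = "WYATT    - Ranked # 855 with    0.006   %"

# Greedy: (.*) grabs as much as possible, backing off only to the last space
greedy = re.match(r'(.*)?[ ]', line).group(1)

# Lazy: (.*?) stops at the first space; the space is consumed but not captured
lazy = re.match(r'^(.*?)[ ]', line).group(1)

# Lookahead: the space is checked but not consumed, so the whole match is the word
lookahead = re.match(r'^(.*?)(?=[ ])', line).group(0)

print(repr(greedy))     # most of the line, not just the name
print(lazy, lookahead)  # WYATT WYATT
```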

Anirudha

You don't need a regex: string[:string.find(' ')]

Ricardo Alvaro Lohmann
  • This is a bit more esoteric than `split` or `partition`, I think. Do you get meaningful performance or memory gains this way? You'd have to essentially iterate to the first instance of the token twice using this, but on the flip side, you wouldn't end up with the new tail string that you just throw away... – Silas Ray Dec 06 '12 at 18:54
  • @sr2222 Yes, it has to iterate twice, but not through the whole string. – Ricardo Alvaro Lohmann Dec 06 '12 at 18:56
  • Neither does `split` with a token limit or `partition`. – Silas Ray Dec 06 '12 at 19:00
  • This approach is a nice optimization, but it does not work if the OP needs it to handle a first word that is the entire string: if no space is found, `string.find` returns `-1`, and the slice drops the last character. – Ricardo Magalhães Cruz Aug 18 '16 at 12:59
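A sketch of the `-1` pitfall described in the last comment, plus one way to guard against it. `first_word` is a hypothetical helper name for this illustration only.

```python
def first_word(s):
    # Slice up to the first space, but handle find() returning -1 (no space at all)
    i = s.find(' ')
    return s if i == -1 else s[:i]

print("YONG"[:"YONG".find(' ')])             # YON  (find() gave -1, last char dropped)
print(first_word("YONG"))                    # YONG
print(first_word("WYATT    - Ranked # 855")) # WYATT
```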

You don't need regex to split a string on whitespace:

In [1]: text = '''WYATT    - Ranked # 855 with    0.006   %
   ...: XAVIER   - Ranked # 587 with    0.013   %
   ...: YONG     - Ranked # 921 with    0.006   %
   ...: YOUNG    - Ranked # 807 with    0.007   %'''

In [2]: print('\n'.join(line.split()[0] for line in text.split('\n')))
WYATT
XAVIER
YONG
YOUNG
Lev Levitsky