0

I have to write a single function that should return the first word in the following strings:

("Hello world") -> return "Hello"
(" a word ") -> return "a"
("don't touch it") -> return "don't"
("greetings, friends") -> return "greetings"
("... and so on ...") -> return "and"
("hi") -> return "hi"

Each call has to return the first word. As you can see, some strings start with whitespace, contain apostrophes, or have a first word followed by a comma.

I've used the following options:

return text.split()[0]
return re.split(r'\w*', text)[0]

Both fail on some of the strings. Can anyone help?
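For reference, here is a quick check of what the first attempt returns on the six inputs. The names `samples` and `results` are chosen here purely for illustration. It suggests that `split()` copes with the leading whitespace but leaves punctuation attached:

```python
# The six sample inputs from the question ("samples" is an illustrative name).
samples = [
    "Hello world",
    " a word ",
    "don't touch it",
    "greetings, friends",
    "... and so on ...",
    "hi",
]

# str.split() with no arguments splits on runs of whitespace and ignores
# leading/trailing whitespace, but it knows nothing about punctuation.
results = [text.split()[0] for text in samples]
print(results)
# ['Hello', 'a', "don't", 'greetings,', '...', 'hi']
```

So the fourth and fifth inputs are the ones that need extra handling.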

T.Python
  • `re.search(r'\w+', text).group()`? – cs95 Jan 04 '18 at 10:58
  • @cᴏʟᴅsᴘᴇᴇᴅ will return `don` instead of `don't` ;) Try `re.search('[\w\']+', s).group()` – DeepSpace Jan 04 '18 at 10:58
  • Try [`r""""[^"\w]*([\w'-]+)"""`](https://regex101.com/r/up15ZL/1) – Wiktor Stribiżew Jan 04 '18 at 11:00
  • @DeepSpace The annoying thing about this question is the arbitrary restrictions with what is to be considered part of a word and what isn't. – cs95 Jan 04 '18 at 11:00
  • One could use `[\w']+` to find all word-constituents and apostrophes but that would quickly lead to a problem with an input like `"'No!' he shouted"` (`'No` is probably not wanted). – Alfe Jan 04 '18 at 11:02
  • Possible duplicate of https://stackoverflow.com/questions/13750265/how-to-get-the-first-word-in-the-string, answered here. –  Jan 04 '18 at 11:05
  • @DNinja21 Nope, that's not it. – cs95 Jan 04 '18 at 11:06
  • What about `1 plus 2 gives 3`? Do you want `1` or `plus`? – Toto Jan 04 '18 at 13:07
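The trade-off discussed in the comments can be seen with a small sketch. The variable names are illustrative only; the patterns and inputs are the ones quoted in the comments:

```python
import re

# \w+ stops at the apostrophe, so the contraction is cut short.
word_only = re.search(r"\w+", "don't touch it").group()

# Adding the apostrophe to the character class keeps "don't" whole...
with_apostrophe = re.search(r"[\w']+", "don't touch it").group()

# ...but it also picks up a leading single quote that is really punctuation.
quoted = re.search(r"[\w']+", "'No!' he shouted").group()

print(word_only)        # don
print(with_apostrophe)  # don't
print(quoted)           # 'No
```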

6 Answers

2

Try the code below. I tested it with all your inputs and it works fine.

import re

text = ["Hello world", " a word ", "don't touch it", "greetings, friends", "... and so on ...", "hi"]
# A word must start and end with a word character; apostrophes are allowed in between
rgx = re.compile(r"(\w[\w']*\w|\w)")
for i in text:
    out = rgx.findall(i)
    print(out[0])

Output:

Hello
a
don't
greetings
and
hi
Abhijit
1

It is tricky to distinguish apostrophes which are supposed to be part of a word and single quotes which are punctuation for the syntax. But since your input examples do not show single quotes, I can go with this:

re.match(r'\W*(\w[^,. !?"]*)', text).groups()[0]

For all your examples, this works. It won't work for atypical stuff like "'tis all in vain!", though. It assumes that words end on commas, dots, spaces, bangs, question marks, and double quotes. This list can be extended on demand (in the brackets).
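As a minimal sketch, the same pattern could be wrapped in a function. The name `first_word_by_terminators` and the decision to return an empty string when no word is found are assumptions made here, not part of the original answer:

```python
import re

# Same pattern as above: skip leading non-word characters, then capture from
# the first word character up to (not including) the next terminator.
FIRST_WORD = re.compile(r'\W*(\w[^,. !?"]*)')


def first_word_by_terminators(text):
    """Return the first word, or "" if the text has no word character.

    The empty-string fallback is an assumption for this sketch; the
    one-liner above would raise AttributeError in that case.
    """
    match = FIRST_WORD.match(text)
    return match.group(1) if match else ""


print(first_word_by_terminators("greetings, friends"))  # greetings
print(first_word_by_terminators("... and so on ..."))   # and
print(first_word_by_terminators("'tis all in vain!"))   # tis (leading apostrophe lost)
```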

Alfe
1

A non-regex solution: strip leading punctuation/whitespace characters, split the string to get the first word, then remove trailing punctuation/whitespace:

from string import punctuation, whitespace

def first_word(s):
    to_strip = punctuation + whitespace
    return s.lstrip(to_strip).split(' ', 1)[0].rstrip(to_strip)

tests = [
"Hello world",
"a word",
"don't touch it",
"greetings, friends",
"... and so on ...",
"hi"]

for test in tests:
    print('#{}#'.format(first_word(test)))

Outputs:

#Hello#
#a#
#don't#
#greetings#
#and#
#hi#
Chris_Rands
1

Try this one:

>>> import re
>>> def pm(s):
...     p = r"[a-zA-Z][\w']*"
...     m = re.search(p, s)
...     print(m.group(0))
... 

test result:

>>> pm("don't touch it")
don't
>>> pm("Hello w")
Hello
>>> pm("greatings, friends")
greatings
>>> pm("... and so on...")
and
>>> pm("hi")
hi
Shen Yudong
0

You can try something like this:

import re

pattern = r"[a-zA-Z']+"

def first_word(text):
    # Return the first run of letters/apostrophes that does not start with an apostrophe
    for candidate in re.findall(pattern, text):
        if candidate[0].isalnum():
            return candidate


print(first_word("don't touch it"))

output:

don't
0

I've done this by stopping at the first occurrence of white space when collecting the first word. Something like this:

stringVariable = "Hello world"  # whatever sentence you want
firstWord = ""
stringVariableLength = len(stringVariable)
for i in range(0, stringVariableLength):
    if stringVariable[i] != " ":
        firstWord = firstWord + stringVariable[i]
    else:
        break

This code walks through the string you want the first word of and adds each character to a new variable called firstWord until it reaches the first occurrence of white space. I'm not exactly sure how you would put that into a function, as I'm pretty new to this whole thing, but I'm sure it could be done!
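One way the loop might be wrapped in a function is sketched below. The name `first_word_until_space` is hypothetical, and the `lstrip()` call is an added assumption so that an input such as " a word " does not stop immediately on its leading space:

```python
def first_word_until_space(sentence):
    """Collect characters up to the first space (hypothetical helper name)."""
    first_word = ""
    # Assumption: drop leading whitespace first, otherwise " a word "
    # would hit a space on the very first character and return "".
    for ch in sentence.lstrip():
        if ch == " ":
            break
        first_word += ch
    return first_word


print(first_word_until_space("Hello world"))         # Hello
print(first_word_until_space(" a word "))            # a
print(first_word_until_space("greetings, friends"))  # greetings,
```

Note that this approach only looks at spaces, so punctuation attached to the word (the comma in the last call) is kept.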

David Buck