1

How would I grab the first word after '\id ' in the string?

string:

'\id hello some random text that can be anything'

python

for line in lines_in:
    if line.startswith('\id '):
        book = line.replace('\id ', '').lower().rstrip()

What I am getting:

book = 'hello some random text that can be anything'

What I want:

book = 'hello'
jamylak
user1442957

6 Answers

11

One option:

words = line.split()
try:
    word = words[words.index(r"\id") + 1]
except ValueError:
    pass    # no whitespace-delimited "\id" in the string
except IndexError:
    pass    # "\id" at the end of the string
Sven Marnach
  • I'd suggest a default for word by making the except into something like `except (ValueError, IndexError): word = ''` – Ryan Haining Jul 13 '12 at 14:31
  • @xhainingx: I don't know what the OP wants to do with the different error conditions, so I just pointed them out – Sven Marnach Jul 13 '12 at 14:40
  • Yeah I wasn't correcting you, just suggesting a possible way to handle it, since this doesn't seem like the kind of question you'd see from someone well-versed in python – Ryan Haining Jul 13 '12 at 15:22
  • I like this more since it's *better to ask for forgiveness than permission* – jamylak Apr 06 '13 at 05:36
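Following Ryan Haining's comment, here is a minimal sketch that wraps the split/index approach in a function with a default value. The name `word_after` and the empty-string default are illustrative assumptions, not part of the original answer.

```python
def word_after(line, marker=r"\id", default=""):
    """Return the whitespace-delimited word that follows `marker` in `line`.

    Hypothetical helper: returns `default` when `marker` is not a separate
    word in `line`, or when it is the last word.
    """
    words = line.split()
    try:
        return words[words.index(marker) + 1]
    except (ValueError, IndexError):
        # ValueError: marker not found; IndexError: marker is the last word
        return default


print(word_after(r"\id hello some random text that can be anything"))  # hello
print(repr(word_after("no marker here")))        # ''
print(repr(word_after(r"text ends with \id")))   # ''
```

Catching both exceptions together is only appropriate if you want to treat "no marker" and "nothing after the marker" the same way; as Sven Marnach notes, you may want to handle them separately.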
10
>>> import re
>>> text = r'\id hello some random text that can be anything'
>>> match = re.search(r'\\id (\w+)', text)
>>> if match:
...     print(match.group(1))
...
hello

A more complete version, which allows any amount of whitespace (including none) after '\id':

re.search(r'\\id\s*(\w+)', text)
jamylak
  • @jamylak -- Apparently we were thinking on the same lines. I would suggest you change the regex to `r'\\id\s*(\w+)'` in order to capture multiple (or no) whitespace. – mgilson Jul 13 '12 at 14:36
  • @mgilson the OP said it works like this but that is your solution anyway. I would upvote it although I ran out of votes for today. – jamylak Jul 13 '12 at 14:38
  • @jamylak I was thinking about deleting the regex part of my solution in lieu of yours -- you beat me to it anyway and since yours has more upvotes (and an accept :^p ) it'll be more visible for the community. – mgilson Jul 13 '12 at 14:42
  • @mgilson your regex is a more complete version of mine and you should get the accepted answer instead, although really SvenMarnach should get the accepted answer since it is non-regex. – jamylak Jul 13 '12 at 14:45
  • @mgilson I've never done this before but can I change this to community wiki and add your solution in? – jamylak Jul 13 '12 at 14:46
  • @jamylak -- You can do whatever you want with my solution (including adding it to your post without making it a community wiki). It's all in the public domain anyway. :) – mgilson Jul 13 '12 at 14:47
  • @mgilson Since your post was more complete I didn't feel like taking any reputation for it but since this was accepted like you said it will get the most attention so I thought it would be a good idea to make it community wiki. – jamylak Jul 13 '12 at 14:55
  • Isn't regex a bit too much for this? :P – Claudio Jul 13 '12 at 14:55
  • @Claudio The solution is pretty small but as I said before, unless it's needed it's better not to use it. Had to provide the alternative though, just to put it out there, and the post was tagged regex so the OP probably wanted to know how to do it with regex. – jamylak Jul 13 '12 at 14:59
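To make the difference between the two patterns concrete, here is a small sketch that runs both on a few inputs. The helper `first_word` and the sample strings are hypothetical, chosen only for illustration.

```python
import re

# The two patterns from the answer: exactly one space vs. any whitespace (or none)
ONE_SPACE = re.compile(r'\\id (\w+)')
ANY_SPACE = re.compile(r'\\id\s*(\w+)')


def first_word(pattern, text):
    """Return the captured word, or None when the pattern does not match."""
    match = pattern.search(text)
    return match.group(1) if match else None


for text in [r'\id hello some text', r'\id   hello', r'\idhello']:
    print(repr(text), first_word(ONE_SPACE, text), first_word(ANY_SPACE, text))
```

The one-space pattern finds nothing for the last two inputs, while the `\s*` pattern returns `hello` for all three. Be aware that because `\s*` also matches zero characters, the looser pattern will return `entity` for a string like `\identity foo`; whether that matters depends on your input.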
1

You don't need regex for this; you can do:

book.split(' ')[0]

But there are many other ways to achieve this.

iblazevic
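Here is a sketch of how `split(' ')[0]` slots into the loop from the question. The function name `first_words` and the sample input are hypothetical; the cleanup steps are the ones the question already uses.

```python
def first_words(lines_in):
    """Apply the question's cleanup, then keep only the first word (hypothetical helper)."""
    books = []
    for line in lines_in:
        if line.startswith(r'\id '):
            # Same steps as in the question, plus split(' ')[0] from this answer
            book = line.replace(r'\id ', '').lower().rstrip().split(' ')[0]
            books.append(book)
    return books


# Hypothetical sample input standing in for the question's `lines_in`
sample = [r"\id HELLO some random text that can be anything", "no marker here"]
print(first_words(sample))  # ['hello']
```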
1

If there isn't necessarily a space between "\id" and the word, a regex will do fine (if the space is guaranteed, use the split solution instead):

import re
match = re.search(r'\\id\s*(\w+)', yourstring)
if match:
    print(match.group(1))

Or another way (without regex):

head, sep, tail = yourstring.partition(r'\id')
first_word = tail.split()[0]  # tail starts right after '\id', so the word is element 0
mgilson
  • If there is only one `id`, you should use `str.partition` instead – jamylak Jul 13 '12 at 14:51
  • @jamylak -- changed. Is there a reason to promote partition instead of `split`? I suppose it helps with unpacking since you know exactly what you're going to get, but the same could be said for `.split('\id',1)`. Is partition faster? – mgilson Jul 13 '12 at 14:57
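A minimal sketch of the partition approach as a reusable function. The name `word_after_marker` and the choice to return `None` when there is no word are assumptions for illustration. Note that the word is element 0 of `tail.split()`, because `tail` begins immediately after the marker.

```python
def word_after_marker(text, marker=r'\id'):
    """Return the first word after the first occurrence of `marker`, or None.

    Hypothetical helper built on str.partition.
    """
    _head, sep, tail = text.partition(marker)
    if not sep:
        return None  # marker not present in text
    words = tail.split()
    return words[0] if words else None  # None if nothing follows the marker


print(word_after_marker(r'\id hello some random text that can be anything'))  # hello
```

On mgilson's question about `partition` vs `split`: `partition` always returns a 3-tuple, so the unpacking never fails even when the marker is missing (`sep` is then an empty string), whereas `split(marker, 1)` returns a one-element list in that case.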
0

Try calling str.split(' ') on your string book, which splits on spaces and gives you a list of words. Then take the first element of that list.

So: book = book.split(' ')[0]

thegrinner
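One detail worth knowing about `split(' ')` versus `split()` with no argument: splitting on a literal single space keeps empty strings, so a leading or doubled space changes what `[0]` returns. The value of `book` below is a hypothetical example, as if the source line had two spaces after `\id`.

```python
# Hypothetical value of `book` when the source line had two spaces after "\id"
book = " hello some random text"

# Splitting on a literal single space keeps empty strings between separators
print(book.split(' '))      # ['', 'hello', 'some', 'random', 'text']
print(repr(book.split(' ')[0]))  # ''

# With no argument, split() treats runs of whitespace as one separator
# and ignores leading/trailing whitespace
print(book.split()[0])      # hello
```

If your input might contain irregular spacing, `split()` is usually the safer choice.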
0

Since you've already checked that the line starts with "\id ", just split the string to get a list of words. The word you want is the next one after "\id", element #1:

>>> line = r"\id hello some random text that can be anything"
>>> line.split()
['\\id', 'hello', 'some', 'random', 'text', 'that', 'can', 'be', 'anything']
    #0      #1  ...

Your code then becomes:

for line in lines_in:
    if line.startswith(r'\id '):
        book = line.split()[1]
Claudio
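As a sketch, the loop above can be hardened against a line that consists of just "\id " (where `line.split()[1]` would raise `IndexError`). The function name `book_ids` and the sample lines are hypothetical; passing `maxsplit=2` just avoids splitting the rest of the line unnecessarily.

```python
def book_ids(lines_in):
    """Collect the word that follows the marker on each marker line.

    Hypothetical helper; `lines_in` is any iterable of strings.
    """
    books = []
    for line in lines_in:
        if line.startswith(r'\id '):
            parts = line.split(maxsplit=2)  # at most [marker, word, rest]
            if len(parts) > 1:  # skip lines with nothing after the marker
                books.append(parts[1])
    return books


print(book_ids([r"\id hello some random text", r"\id ", "plain line"]))  # ['hello']
```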