
How can I get only the first occurrence of a word in a line of text from a file, and ignore the later repeats of that same word?

I searched for this but did not find anything close enough to what I have in mind.

Let's say I have this line in a text file:

Group: 12 Cat: 100 Hen: 12 Cat: 200 Time: 328392 Cat: 123

I want only the value of the first Cat, which is 100. I know I can use `split` and an `if` statement to get it, like this:

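```python
if "Cat: 100" in line:
    # Take the text between the first "Cat: " and "Hen: ", dropping the trailing space
    value = line.split("Cat: ")[1].split("Hen: ")[0].strip()
```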

which gives me the result `100`.

But what if I do not know the condition in advance? For example, I do not know whether the value of the first Cat is 100, 1242, or something else.

Does anyone have a suggestion or solution? Thank you for your help.
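The same `split` idea works without knowing the value in advance: `str.split` with `maxsplit=1` splits only at the first `"Cat: "`, and the value is the first whitespace-separated token after it. This is a minimal sketch, assuming values contain no spaces (as in the example line); `first_value` is a hypothetical helper name.

```python
def first_value(line, key):
    """Return the value after the first "key: " in line, or None if key is absent.

    Assumes values contain no whitespace, as in the example line.
    """
    marker = key + ": "
    if marker not in line:
        return None
    # maxsplit=1 splits only at the first occurrence, so later "Cat: " entries are ignored
    rest = line.split(marker, 1)[1]
    # The value is the first whitespace-separated token after the marker
    tokens = rest.split()
    return tokens[0] if tokens else None


line = "Group: 12 Cat: 100 Hen: 12 Cat: 200 Time: 328392 Cat: 123"
print(first_value(line, "Cat"))  # 100
print(first_value(line, "Dog"))  # None
```

Note that this is plain substring matching, so a longer key ending in the same text (say a hypothetical `BobCat: 5`) would also match `"Cat: "`.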

Ling
  • It is very easy with a regex. In your case, `re.search(r'Cat: (\d+)', line).group(1)` (first, check if there is a match, then access `group(1)`). – Wiktor Stribiżew Sep 05 '17 at 09:07
  • What is the purpose of `group(1)`? Is it for the first word of `Cat`? – Ling Sep 05 '17 at 09:10
  • It is the value that is captured with the first set of unescaped parentheses in the pattern, i.e. `(\d+)` here. – Wiktor Stribiżew Sep 05 '17 at 09:16
  • When you search with a regex and enclose part of the pattern in parentheses (like `(\d+)` in the example above), you can access different portions of the match object using `group(n)`. `group(0)` is always the entire matched substring (`Cat: 100`), and `group(1)` is the text matched by the first parenthesized group (`100`). With multiple groups, `group(2)`, `group(3)`, and so on refer to the following groups in order. See https://docs.python.org/3.5/library/re.html#match-objects for reference. – Russell Teapot Sep 05 '17 at 09:21
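The regex approach from the comments can be sketched as follows, including the match check that the first comment recommends. `re.search` returns only the first match (or `None`), so later `Cat:` entries are ignored automatically. `CAT_PATTERN` and `first_cat` are hypothetical names used for illustration.

```python
import re

# \d+ matches one or more digits; the parentheses make them capture group 1
CAT_PATTERN = re.compile(r"Cat: (\d+)")


def first_cat(line):
    """Return the digits after the first "Cat: " in line, or None if absent."""
    match = CAT_PATTERN.search(line)  # search() stops at the first match
    if match is None:
        return None
    return match.group(1)


line = "Group: 12 Cat: 100 Hen: 12 Cat: 200 Time: 328392 Cat: 123"

# group(0) is the whole matched substring, group(1) the first captured group
match = CAT_PATTERN.search(line)
print(match.group(0))  # Cat: 100
print(match.group(1))  # 100

print(first_cat(line))       # 100
print(first_cat("Hen: 12"))  # None
```

To apply it to a text file, call `first_cat` on each line while iterating over the open file object.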
