How to loop using .split() function on a text file python

Question

I have an HTML file with different team names written throughout it, and I just want to grab the team names. They always occur after certain text and end before certain text, so I've used the split function to find them. I'm a beginner, and I'm sure I'm making this harder than it is. `data` is the contents of the file.

teams = data.split('team-away">')[1].split("</sp")[0]
for team in teams:
    print team

This prints each individual character of the first team it finds. For example, if teams = San Francisco 49ers, it prints "S", then "a", etc. What I need is for it to print "San Francisco 49ers", then on the next line the next team, "Carolina Panthers", etc.

Thank you!

[Déjà vu](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) — shx2, Nov 18 '13 at 06:02
Yeah, I wondered how long before someone links to "Tony the Pony". The OP does state "I have a html file...The team names always occur after certain text and end before certain text" Assuming @sdeep27 is correctly describing the problem (and who would know but himself), then plain text search works 100% (but of course, not best practice). — Paul Draper, Nov 18 '13 at 07:15

Paul Draper · Answer 1 · 2013-11-18T07:11:31.763

2

"I'm a beginner, and I'm sure I'm making this harder than it is."

Well, kind of.

import re
# Non-greedy (.*?) stops at the first "</sp", so two teams on one line are not merged.
teams = re.findall(r'team-away">(.*?)</sp', data)

(with credit to Kurtis, for a simpler regular expression than I originally had)
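
For completeness, the loop in the question can also be made to work with `split` alone. The original code indexes `[1]` right away, so `teams` ends up being a single string, and iterating over a string yields one character at a time. A minimal sketch that keeps every chunk instead, assuming the markup looks like the hypothetical `data` sample below (the real file may differ):

```python
# Hypothetical sample standing in for the asker's `data`; the real markup may differ.
data = (
    '<span class="team-away">San Francisco 49ers</span>\n'
    '<span class="team-away">Carolina Panthers</span>\n'
)

# split() returns [text before the first marker, chunk 1, chunk 2, ...],
# so skip element 0 and cut each remaining chunk at the closing marker.
chunks = data.split('team-away">')[1:]
teams = [chunk.split("</sp")[0] for chunk in chunks]

for team in teams:
    print(team)
```

Here `teams` is a list of strings, so the `for` loop prints one team name per line. `print(team)` with a single argument behaves the same in Python 2 and Python 3.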

Though an actual HTML parser would be best practice.
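
As a rough sketch of what the parser approach could look like, the standard library's `html.parser` module can collect the text of the matching elements. This assumes Python 3, markup of the form `<span class="team-away">...</span>` (inferred from the search strings in the question, not confirmed by it), and no nested tags inside the team element. `TeamParser` and `sample` are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser


class TeamParser(HTMLParser):
    """Collects the text of every element whose class list includes "team-away"."""

    def __init__(self):
        super().__init__()
        self.teams = []
        self._inside = False  # True while inside a matching element

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if "team-away" in classes:
            self._inside = True
            self.teams.append("")

    def handle_endtag(self, tag):
        # Assumption: team elements contain no nested tags,
        # so any closing tag ends the current team name.
        self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.teams[-1] += data


# Hypothetical sample; the real file may be structured differently.
sample = (
    '<div><span class="team-away">San Francisco 49ers</span> at '
    '<span class="team-home">Carolina Panthers</span></div>'
    '<span class="team-away">Denver Broncos</span>'
)

parser = TeamParser()
parser.feed(sample)
parser.close()
print(parser.teams)
```

Unlike a plain text search, this matches on the parsed `class` attribute, so it keeps working if the attribute order or quoting in the tag changes.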

edited Nov 18 '13 at 07:11

answered Nov 18 '13 at 05:56


Out of curiosity, why is this regex superior to re.findall('team-away">(.*?) – Curt Nov 18 '13 at 07:05

@Kurtis, you are right. I had assumed that `findall` would match the *entire* regex, so I used lookbehind and lookahead. But if exactly one capture group exists, the returned match is only that group. Improving answer now. – Paul Draper Nov 18 '13 at 07:10
so I went and looked over some re to understand your answer, and this does tackle it, thank you. Quick question though, as I only need the letters returned and not numbers, I tried (\w*) instead of (.*) and it returned an empty list. Do you know why this could be, or is \w the wrong expression to use? – sdeep27 Nov 18 '13 at 08:07
I'm not sure exactly what you mean. Perhaps `([A-Za-z]*).*`? – Paul Draper Nov 18 '13 at 16:29
isn't \w the regex identifier for all letters? How come we use the format [A-Za-z] instead? – sdeep27 Nov 20 '13 at 18:59
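@sdeep27, `\w` matches any "word" character: letters, digits, and the underscore. It does not match spaces, so `(\w*)` cannot match a name like "San Francisco 49ers", which is why you got an empty list. – Paul Draper Nov 21 '13 at 08:17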
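
The difference between the character classes discussed in these comments can be checked on a small example. This is only an illustration: `sample` is a hypothetical line of markup, not taken from the asker's file.

```python
import re

# Hypothetical sample line; the real markup may differ.
sample = '<span class="team-away">San Francisco 49ers</span>'

# \w covers letters, digits and underscore, but not spaces, so the
# pattern cannot reach "</sp" across the spaces in the name.
word_only = re.findall(r'team-away">(\w*)</sp', sample)

# . matches any character except a newline, so the whole name is captured.
any_chars = re.findall(r'team-away">(.*?)</sp', sample)

# Letters and spaces only: the capture stops at the first digit.
letters_and_spaces = re.findall(r'team-away">([A-Za-z ]*)', sample)

print(word_only)
print(any_chars)
print(letters_and_spaces)
```

The last pattern leaves a trailing space in its capture (the one before "49ers"), which `str.strip()` would remove.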

score 0 · Answer 2 · answered Nov 18 '13 at 05:55

Don't reinvent the wheel! Look into BeautifulSoup; it'll do the job for you.

Steinar Lima
