
How can I remove all the empty rows from my printed text using BeautifulSoup and Python? I'm still new; I think what I'm talking about may be called whitespace?

Current output:

02:00 - 05:00 NHL: Columbus Blue Jackets at San Jose Sharks

 - Channel 60







02:30 - 04:30 NCAAB: Quinnipiac vs Fairfield

 - Channel 04







03:00 - 05:00 MLS: Portland Timbers at Los Angeles Galaxy

 - Channel 05

Desired Output:

02:00 - 05:00 NHL: Columbus Blue Jackets at San Jose Sharks - Channel 60
02:30 - 04:30 NCAAB: Quinnipiac vs Fairfield - Channel 04 
03:00 - 05:00 MLS: Portland Timbers at Los Angeles Galaxy - Channel 05

Code:

import urllib, urllib2, re, HTMLParser, os
from bs4 import BeautifulSoup

pg_source = ''
req = urllib2.Request('http://rushmore.tv/schedule')
req.add_header('User-Agent', 'Mozilla/5.0 (Windows NT 6.3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36')

try:
    response = urllib2.urlopen(req)
    pg_source = response.read().decode('utf-8' , 'ignore')
    response.close()
except:
    pass

content = []
soup = BeautifulSoup(pg_source)
content = BeautifulSoup(soup.find('ul', { 'id' : 'myUL' }).prettify())

print (content.text)

2 Answers


With a bit of list comprehension, plus .split(), .strip() and .join(), you can build that output like:

Code:

text = [l.strip() for l in content.text.split('\n') if l.strip()]
print('\n'.join(' '.join(l) for l in zip(text[::2], text[1::2])))
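To see what those two lines do, here is the same idea run on a hard-coded snippet of the output from the question (sample data only, so it runs without fetching the page):

sample = """02:00 - 05:00 NHL: Columbus Blue Jackets at San Jose Sharks

 - Channel 60


02:30 - 04:30 NCAAB: Quinnipiac vs Fairfield

 - Channel 04"""

# strip every line and drop the ones that are empty after stripping
text = [l.strip() for l in sample.split('\n') if l.strip()]
# pair each event line (even index) with the channel line that follows it (odd index)
print('\n'.join(' '.join(pair) for pair in zip(text[::2], text[1::2])))
# 02:00 - 05:00 NHL: Columbus Blue Jackets at San Jose Sharks - Channel 60
# 02:30 - 04:30 NCAAB: Quinnipiac vs Fairfield - Channel 04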

Test Code:

import urllib, urllib2, re, HTMLParser, os
from bs4 import BeautifulSoup

pg_source = ''
req = urllib2.Request('http://rushmore.tv/schedule')
req.add_header('User-Agent',
               'Mozilla/5.0 (Windows NT 6.3) AppleWebKit/537.36 '
               '(KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36')

try:
    response = urllib2.urlopen(req)
    pg_source = response.read().decode('utf-8', 'ignore')
    response.close()
except:
    pass

content = []
soup = BeautifulSoup(pg_source)
content = BeautifulSoup(soup.find('ul', {'id': 'myUL'}).prettify())

text = [l.strip() for l in content.text.split('\n') if l.strip()]
print('\n'.join(' '.join(l) for l in zip(text[::2], text[1::2])))

Results:

21:00 - 23:00 NCAAB:    Pepperdine vs Saint Mary's - Channel 03
21:30 - 00:00 AFL: Gold Coast vs. Geelong - Channel 47
22:00 - 00:00 A-League: Western Sydney Wanderers vs Perth Glory - BT Sport 1
22:45 - 03:00 Ski Classic: Mora - Channel 93
23:00 - 00:30 Freestyle Skiing WC: Ski Cross - Channel 106
  • That works for me, but is there an alternative to the print method? When I'm printing this I do not use the print function; I am using xbmc.gui, and to print using that I have to supply the string. – Daveabuk Mar 06 '18 at 09:46
  • `print` is but a function that takes a string. You can do anything you want with that string. – Stephen Rauch Mar 06 '18 at 13:30
  • But the print on this includes "('\n'.join(' '.join(l) for l in zip(text[::2], text[1::2])))". How could I change that to something simpler, like "print (string)"? I will not be using the print function, as I will be using this script on Kodi and need to supply a simple string. Sorry for my ignorance, I'm still new to Python. Thanks. – Daveabuk Mar 09 '18 at 13:07
  • Everything between the `()` of the print function builds a string. You can simply assign that to a variable, like: `mystring = "\n".joi....` – Stephen Rauch Mar 09 '18 at 13:09
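As the comments above note, everything inside the print() call just builds a single string, so it can be assigned to a variable and handed to whatever display call Kodi expects (a minimal sketch, reusing the text list built in the answer):

# build the formatted schedule once and keep it as a plain string
schedule_text = '\n'.join(' '.join(pair) for pair in zip(text[::2], text[1::2]))
# schedule_text can now be passed to a Kodi label or dialog instead of print()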

A very easy way to achieve the same results with less code is to use the requests module.

Here is the code.

import requests
from bs4 import BeautifulSoup

html = requests.get('http://rushmore.tv/schedule').text

soup = BeautifulSoup(html,'lxml')

ul = soup.find('ul', { 'id' : 'myUL' })

for content in ul.find_all('li'):
    print(content.text)

Just try this; it's working fine for me.

  • That's very kind of you, it would be perfect but Kodi does not use lxml (as this script will be used in Kodi). I'm reading about ElementTree. Do you know if I would be able to use that instead of lxml? Thank you. – Daveabuk Mar 04 '18 at 19:20
  • Ok, so I have used html.parser instead. I am now facing a problem on Kodi that I now need to fix. – Daveabuk Mar 04 '18 at 19:39
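Since the comments note that lxml is not available on Kodi, the same approach should also work with the parser that ships with the Python standard library; a minimal sketch of that variant, assuming only requests and BeautifulSoup are installed:

import requests
from bs4 import BeautifulSoup

html = requests.get('http://rushmore.tv/schedule').text

# 'html.parser' is part of the standard library, so no lxml install is needed
soup = BeautifulSoup(html, 'html.parser')

ul = soup.find('ul', {'id': 'myUL'})
for content in ul.find_all('li'):
    print(content.text)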