
I tried to extract a table using Python, but I cannot remove the \n characters despite using the replace, remove, rsplit and lsplit functions. Please help.

Here is my code:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import requests
import pandas as pd

url = "https://shared.websol.barchart.com/quotes/quote.php?page=quote&sym=ng&x=13&y=8&domain=if&display_ice=1&enabled_ice_exchanges=&tz=0&ed=0"

res = requests.get(url)

soup = BeautifulSoup(res.text, 'lxml')

soup.prettify()

Header = soup.findAll('tr', limit=2)[1].findAll('th')

column_headers = [th.getText() for th in soup.findAll('tr', limit=2)[1].findAll('th')]

print(column_headers)

data_rows = soup.findAll('tr')[2:]

i = range(len(data_rows))

for td in data_rows:
    row = td.get_text()
    print(row)

Here is my output (I copied only the first few lines):

['Contract', 'Last', 'Change', 'Open', 'High', 'Low', 'Volume', 'Prev. Stl.', 'Time', 'Links']
\n    Cash (NGY00)\n    2.890s\n    +0.020\n    0.000\n    2.890\n    2.890\n    0\n    2.870\n    05/25/18\n    Q / C / O\n  
\n    Jun \'18 (NGM18)\n    2.946\n    +0.007\n    2.946\n    2.968\n    2.908\n    2331\n    2.939\n    19:13\n    Q / C / O\n  
\n    Jul \'18 (NGN18)\n    2.974\n    +0.011\n    2.974\n    3.000\n    2.937\n    23859\n    2.963\n    19:37\n    Q / C / O\n  
\n    Aug \'18 (NGQ18)\n    2.989\n    +0.006\n    2.983\n    3.016\n    2.957\n    4434\n    2.983\n    18:25\n    Q / C / O\n  
\n    Sep \'18 (NGU18)\n    2.977\n    +0.010\n    2.970\n    2.998\n    2.942\n    2313\n    2.967\n    18:07\n    Q / C / O\n  
\n    Oct \'18 (NGV18)\n    2.975\n    +0.005\n    2.969\n    2.999\n    2.944\n    2259\n    2.970\n    19:01\n    Q / C / O\n  
\n    Nov \'18 (NGX18)\n    3.013\n    +0.005\n    3.007\n    3.034\n    2.983\n    1774\n    3.008\n    19:18\n    Q / C / O\n  
\n    Dec \'18 (NGZ18)\n    3.113\n    +0.007\n    3.106\n    3.131\n    3.082\n    1287\n    3.106\n    17:59\n    Q / C / O\n  
\n    Jan \'19 (NGF19)\n    3.198\n    +0.011\n    3.177\n    3.212\n    3.165\n    1737\n    3.187\n    17:51\n    Q / C / O\n  
\n    Feb \'19 (NGG19)\n    3.156\n    +0.008\n    3.137\n    3.170\n    3.126\n    776\n    3.148\n    17:39\n    Q / C / O\n  
\n    Mar \'19 (NGH19)\n    3.042\n    +0.002\n    3.042\n    3.063\n    3.017\n    2891\n    3.040\n    18:27\n    Q / C / O\n  
\n    Apr \'19 (NGJ19)\n    2.672\n    +0.018\n    2.662\n    2.676\n    2.648\n    2403\n    2.654\n    11:00\n    Q / C / O\n 
Austin
So I assume your variable called `row` is not really a row but a field. Did you try `print(row.strip())`? Note that strip() does not alter the string but returns a new string. – ypnos May 29 '18 at 11:31
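To illustrate the comment's point that `strip()` returns a new string, here is a minimal sketch using a shortened string modeled on the first row of the output (the value of `original` is an assumption for illustration):

```python
# A shortened row modeled on the question's output (made-up value).
original = "\n    Cash (NGY00)\n    2.890s\n  "

original.strip()               # returns a new string, which is discarded here
print(repr(original))          # the original still has all its newlines

stripped = original.strip()    # keep the result by assigning it
print(repr(stripped))          # 'Cash (NGY00)\n    2.890s'
```

Note that `strip()` only trims the ends: the newline between the two fields survives, so interior newlines need `replace()` or `split()`.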

2 Answers


I saved your output to a variable `res`, called `res.replace("\n", "")` on it, and it worked. Try calling this on each of your rows.

AndrejH
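The two answers match different things: `"\n"` is a single newline character, while `'\\n'` is a backslash followed by the letter `n`. The `\'18` in the question's output suggests the page returns escaped text containing literal backslash sequences rather than real newlines, though that is an inference from the output, not something confirmed here. Pasting that printed output into a Python string literal would turn each `\n` back into a real newline, which may be why `replace("\n", "")` worked on the saved copy. A minimal sketch of the difference, using made-up strings:

```python
# A real newline is one character; backslash + "n" is two characters.
real = "2.946\n+0.007"        # contains an actual newline character
literal = "2.946\\n+0.007"    # contains a backslash followed by "n"

print(len("\n"), len("\\n"))  # 1 2

# replace("\n", ...) only matches real newline characters...
print(real.replace("\n", " "))      # 2.946 +0.007
print(literal.replace("\n", " "))   # unchanged: 2.946\n+0.007

# ...while replace("\\n", ...) matches the two-character sequence.
print(literal.replace("\\n", " "))  # 2.946 +0.007
```

If `replace("\n", "")` leaves a row unchanged, printing `repr(row)` can tell the two cases apart: a literal backslash shows up doubled, as `\\n`, in the repr.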

Perhaps this is closer to what you're trying to accomplish:

from bs4 import BeautifulSoup
import requests

url = "https://shared.websol.barchart.com/quotes/quote.php?page=quote&sym=ng&x=13&y=8&domain=if&display_ice=1&enabled_ice_exchanges=&tz=0&ed=0"
res = requests.get(url)
soup = BeautifulSoup(res.text, 'lxml')

column_headers = [th.getText() for th in soup.findAll('tr', limit=2)[1].findAll('th')]
print(column_headers)

data_rows = soup.findAll('tr')[2:]
for td in data_rows:
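    # The page text appears to contain literal backslash-n sequences (two
    # characters) rather than real newlines, so '\\n' is matched here.
    row = td.get_text().replace('\\n', '').strip()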
    print(row)
Jonas Byström
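The `replace` call above joins each row's fields into one string. If the goal is a table, splitting on the same literal `\n` sequence keeps the fields separate so they can be paired with the column headers. A minimal sketch, assuming the row text has the shape shown in the question's output; `raw_row` and `split_fields` are hypothetical names for illustration:

```python
# Hypothetical row text modeled on the question's output. It assumes the page
# returns literal "\n" and "\'" escape sequences (an inference, not confirmed).
raw_row = r"\n    Jun \'18 (NGM18)\n    2.946\n    +0.007\n    2.946\n    2.968\n    2.908\n    2331\n    2.939\n    19:13\n    Q / C / O\n  "

column_headers = ['Contract', 'Last', 'Change', 'Open', 'High', 'Low',
                  'Volume', 'Prev. Stl.', 'Time', 'Links']


def split_fields(text):
    """Split row text on literal backslash-n sequences and clean each field."""
    fields = []
    for part in text.split("\\n"):               # split on the two characters \ and n
        part = part.strip().replace("\\'", "'")  # trim spaces, unescape \'
        if part:                                 # drop empty leading/trailing pieces
            fields.append(part)
    return fields


fields = split_fields(raw_row)
record = dict(zip(column_headers, fields))
print(record["Contract"], record["Last"], record["Volume"])  # Jun '18 (NGM18) 2.946 2331
```

A list of such field lists, one per row, could then be passed to `pandas.DataFrame(rows, columns=column_headers)`, since the question already imports pandas. If the text turns out to contain real newlines instead, splitting on `"\n"` (or using `str.splitlines()`) would take the place of the literal split.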