
I am trying to scrape the text from a website with BeautifulSoup and python-requests, but I only get [] as output.

from bs4 import BeautifulSoup
import requests

page = requests.get("https://www.adsbhub.org/stations.php")

soup = BeautifulSoup(page.content, "lxml")

table = soup.find_all('table', id="jqGridUsers")

print(table)

The code above finds the table whose values I need to scrape.

Output:

[<table id="jqGridUsers"></table>]

But when I try to extract the data by finding the `tr` rows in the table, I get an empty list.

What am I doing wrong?
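The empty result can be reproduced offline. The HTML the server sends contains only the empty `<table id="jqGridUsers"></table>` element printed above, so there are no `tr` rows for any parser to find. The rows are presumably added later by JavaScript in the browser; the `jqGrid` id suggests a jqGrid widget, but that is an assumption. This minimal sketch uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra packages; `RowCounter` and `count_rows` are hypothetical helpers that count `<tr>` start tags, roughly what `soup.find_all('tr')` would return.

```python
from html.parser import HTMLParser

# The markup the question's code printed: the table element exists in the
# served HTML, but it has no rows in it.
STATIC_HTML = '<table id="jqGridUsers"></table>'


class RowCounter(HTMLParser):
    """Count <tr> start tags in a chunk of HTML."""

    def __init__(self):
        super().__init__()
        self.rows = 0

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows += 1


def count_rows(html):
    parser = RowCounter()
    parser.feed(html)
    parser.close()
    return parser.rows


print(count_rows(STATIC_HTML))  # 0: nothing to scrape in the static HTML
print(count_rows("<table><tr><td>460</td></tr></table>"))  # 1
```

Whatever parser you use, the result is the same: the data has to come from somewhere other than this page's initial HTML.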

  • You are dealing with a `JS` website; BS4 cannot render JavaScript, so it will not help you here. What's your goal? Maybe check for an API, or, if there isn't one, use a real browser such as Selenium – αԋɱҽԃ αмєяιcαη May 24 '22 at 07:03
  • I want to scrape all the data from the table on the target URL, such as ID, station, user nickname, etc. @αԋɱҽԃαмєяιcαη – Lalit Joshi May 24 '22 at 07:05
  • call [API](https://www.adsbhub.org/stations_ctr.php?cmd=1&webkey=e2321319bb42e360a23413q29772a2b2a2&_search=false&nd=1653375896726&rows=3000&page=1&sidx=&sord=asc) directly – αԋɱҽԃ αмєяιcαη May 24 '22 at 07:06
  • Thanks for the answer. Can you suggest some documentation on this? I am new to it. – Lalit Joshi May 24 '22 at 07:10
  • Possible duplicate of https://stackoverflow.com/questions/8049520/web-scraping-javascript-page-with-python – tripleee May 24 '22 at 07:11
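The API URL from the comments already carries every query parameter the grid sends. A common way to find such a URL is the Network tab of the browser's developer tools. Rather than hand-copying the values, you can split the URL into a base URL and a `params` dict that `requests.get(base, params=params)` accepts. This is a minimal sketch; `split_api_url` is a hypothetical helper built on `urllib.parse`.

```python
from urllib.parse import parse_qsl, urlsplit

# The data URL linked in the comments above.
API_URL = (
    "https://www.adsbhub.org/stations_ctr.php?cmd=1"
    "&webkey=e2321319bb42e360a23413q29772a2b2a2&_search=false"
    "&nd=1653375896726&rows=3000&page=1&sidx=&sord=asc"
)


def split_api_url(url):
    """Return (base_url, params) so the params can be passed to requests.get."""
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}{parts.path}"
    # keep_blank_values=True keeps empty parameters such as "sidx="
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    return base, params


base, params = split_api_url(API_URL)
print(base)  # https://www.adsbhub.org/stations_ctr.php
print(params)
```

The resulting dict has the same keys and values as the `params` dict in the answer below.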

1 Answer


Note that the `nd` parameter is a Unix timestamp in milliseconds (1653375896726), so you can leave it out of the request to get up-to-date data.
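To see that `nd` is just a timestamp, you can convert the value from the URL back to a date. It lands a couple of minutes before the comment that posted the link. In jqGrid, `nd` is typically a cache-busting value, but treat that as an assumption about this site. The sketch below uses integer `timedelta` arithmetic so the conversion is exact. If you would rather send a current value than drop the key, `fresh_nd` (a hypothetical helper) builds one in the same millisecond format.

```python
import time
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ND_FROM_URL = 1653375896726  # milliseconds since the Unix epoch


def nd_to_datetime(nd_ms):
    """Convert a millisecond Unix timestamp to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=int(nd_ms))


def fresh_nd():
    """Return the current time as a millisecond Unix timestamp string."""
    return str(time.time_ns() // 1_000_000)


print(nd_to_datetime(ND_FROM_URL))  # 2022-05-24 07:04:56.726000+00:00
print(fresh_nd())
```

With this helper you could set `params["nd"] = fresh_nd()` before calling `requests.get`.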

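import json

import pandas as pd
import requests


def main(url):
    # Query parameters the page's grid sends to its data endpoint;
    # "nd" is a millisecond Unix timestamp.
    params = {
        "cmd": "1",
        "webkey": "e2321319bb42e360a23413q29772a2b2a2",
        "_search": "false",
        "nd": "1653375896726",
        "rows": "3000",
        "page": "1",
        "sidx": "",
        "sord": "asc"
    }
    r = requests.get(url, params=params)
    r.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    # Skip the first 9 characters of the body, which come before the JSON payload.
    data = json.loads(r.text[9:])
    # Each row's column values are stored in its "cell" list.
    target = [i['cell'] for i in data['rows']]
    df = pd.DataFrame(target)
    print(df)


main('https://www.adsbhub.org/stations_ctr.php')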
         0                    1               2  ...       7     8     9
0      460           PD3RFR|460  Radar Maarssen  ...  381434     1   460
1     2018            EDDN|2018     DerrChecker  ...  506720     1  2018
2      291       Flightlive|291      flightlive  ...  161816     1   291
3     3114      FachaRadar|3114           facha  ...   31995     1  3114
4     3056  Fly Italy Adsb|3056                  ...  328618     1  3056
...    ...                  ...             ...  ...     ...   ...   ...
2216  1829           India|1829          Sanket  ...    None  None  1829
2217  2771        N-Eugene|2771                  ...    None  None  2771
2218   791       RPI Geneva|791          Marclg  ...    None  None   791
2219  1617        T-EDDV66|1617                  ...    None  None  1617
2220  2533  Nijmegen-Radar|2533  Nijmegen-Radar  ...    None  None  2533

[2221 rows x 10 columns]
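The fixed `r.text[9:]` slice breaks if the length of whatever precedes the JSON ever changes. A slightly more defensive variant, sketched below under the assumption that the payload is a JSON object containing no `{` in the prefix, skips to the first `{`. It then uses `json.JSONDecoder().raw_decode` so trailing non-JSON text is tolerated too. `parse_grid_response` is a hypothetical helper, and the sample string is invented for illustration, not real server output.

```python
import json


def parse_grid_response(text):
    """Extract each row's "cell" list from a grid-style JSON response.

    Assumes the JSON object may be surrounded by non-JSON characters.
    """
    start = text.find("{")
    if start == -1:
        raise ValueError("no JSON object found in response")
    # raw_decode parses one JSON value and ignores anything after it
    data, _end = json.JSONDecoder().raw_decode(text[start:])
    return [row["cell"] for row in data.get("rows", [])]


# Hypothetical response body, for illustration only.
sample = 'XXXXXXXXX{"rows": [{"cell": ["460", "PD3RFR|460", "Radar Maarssen"]}]}'
print(parse_grid_response(sample))  # [['460', 'PD3RFR|460', 'Radar Maarssen']]
```

In the answer's `main`, you could replace the two parsing lines with `target = parse_grid_response(r.text)` and keep the `pd.DataFrame(target)` call as is.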