0

I want to extract all HTML elements from the Telegram Web site. I have tried every method I could find, e.g. GET and POST requests, jQuery's get(), and methods from Python and JavaScript.

But the result they return is incomplete, and part of the page is missing. How can I do this correctly?

This is a snippet that returns incomplete elements:

fetch("https://web.telegram.org/k/")
  .then(x => x.text())
  .then(y => console.log(y));
mkrieger1

3 Answers

0

Try it this way:

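// First install jsdom:
//   npm i jsdom

const jsdom = require("jsdom");
const { JSDOM } = jsdom;

fetch("https://web.telegram.org/k/")
    .then(x => x.text())
    .then(y => {
        // Parse the fetched HTML string into a DOM document.
        const { document } = (new JSDOM(y)).window;
        // Logs the Document object itself; use
        // document.documentElement.outerHTML to print the markup.
        console.log(document);
    });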

Check out the jsdom documentation: https://github.com/jsdom/jsdom

Vinod Liyanage
  • Thanks, sir, but this code doesn't work in VS Code. It jumps from line 2 to the end, and I don't see anything in the terminal. – KnowledgeLover Aug 28 '22 at 16:32
  • Actually, you can't send a GET request to access content from an origin other than the current one. – Vinod Liyanage Aug 28 '22 at 16:38
  • Check this: https://stackoverflow.com/questions/43871637/no-access-control-allow-origin-header-is-present-on-the-requested-resource-whe – Vinod Liyanage Aug 28 '22 at 16:38
  • const parser = new DOMParser(); ^ ReferenceError: DOMParser is not defined at C:\Users\Ali.Molaei\Desktop\javascript\firstscraper\scraperapi.js:26:24 at process.processTicksAndRejections (node:internal/process/task_queues:95:5) – KnowledgeLover Aug 28 '22 at 16:57
  • Are you using Node.js? Node does not have DOMParser, so you need to use a different DOM parser library for this. Does this solve your issue? https://stackoverflow.com/questions/11398419/trying-to-use-the-domparser-with-node-js – Vinod Liyanage Aug 28 '22 at 17:00
  • When I installed the Allow-CORS extension for Chrome, my code worked in the Chrome console and in VS Code, but the result was again incomplete. Your code also works in the console but not in VS Code. The main problem still persists. – KnowledgeLover Aug 28 '22 at 17:06
  • I've updated the answer. Check whether it helps with your problem. – Vinod Liyanage Aug 28 '22 at 17:13
  • (node:3784) ExperimentalWarning: The Fetch API is an experimental feature. This feature could change at any time (Use `node --trace-warnings ...` to show where the warning was created) Document { location: [Getter/Setter] } – KnowledgeLover Aug 28 '22 at 17:39
  • This is the output returned in VS Code. – KnowledgeLover Aug 28 '22 at 17:39
0

Did you try adding the header "Application-Type": "text/html"?
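
If the intent is to ask the server for HTML, the standard request header for that is `Accept`; I'm not aware of a standard `Application-Type` header. Below is a minimal sketch, assuming Node 18+ or a browser, where `Request` is a global. It only shows where a header goes on the request and, on its own, is unlikely to make the server return content that the page builds with JavaScript.

```javascript
// Build (but do not send) a request, to show where request headers go.
// Assumption: the header that was meant is the standard "Accept" header.
const request = new Request("https://web.telegram.org/k/", {
  headers: { Accept: "text/html" },
});

// Header names are case-insensitive when read back.
console.log(request.headers.get("accept"));

// To actually send it:
// fetch(request).then(res => res.text()).then(html => console.log(html));
```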

n1koloza
0

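I learned that for scraping Telegram Web we cannot use traditional JavaScript code or a simple Python library, because the page content is rendered by JavaScript after the initial HTML loads. In this case we must use a browser automation tool such as Selenium with WebDriver, and I'm working on it. Any better suggestions would be appreciated.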
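
A rough sketch of that approach, not a finished scraper. The helper names (`looksLikeJsShell`, `dumpRenderedHtml`) and the `readySelector` parameter are hypothetical, and the second function assumes that `selenium-webdriver` (`npm i selenium-webdriver`), Chrome, and a matching driver are installed. The first function is a crude check for whether fetched HTML is mostly an empty shell plus scripts; the second loads the page in a real browser and returns the DOM after the scripts have run.

```javascript
// Hypothetical helper: guess whether raw HTML is a JavaScript "shell",
// i.e. it has <script> tags but very little visible text of its own.
function looksLikeJsShell(html, minTextLength = 200) {
  const scriptCount = (html.match(/<script\b/gi) || []).length;
  const visibleText = html
    .replace(/<script\b[\s\S]*?<\/script>/gi, " ") // drop inline scripts
    .replace(/<style\b[\s\S]*?<\/style>/gi, " ")   // drop stylesheets
    .replace(/<[^>]+>/g, " ")                      // drop remaining tags
    .replace(/\s+/g, " ")
    .trim();
  return scriptCount > 0 && visibleText.length < minTextLength;
}

// Hypothetical helper: open the page in Chrome via Selenium and return the
// rendered DOM as a string. readySelector is a CSS selector you pick for an
// element that only exists once the app has rendered.
async function dumpRenderedHtml(url, readySelector, timeoutMs = 30000) {
  // Required lazily so the heuristic above works without the package.
  const { Builder, By, until } = require("selenium-webdriver");
  const driver = await new Builder().forBrowser("chrome").build();
  try {
    await driver.get(url);
    await driver.wait(until.elementLocated(By.css(readySelector)), timeoutMs);
    return await driver.executeScript(
      "return document.documentElement.outerHTML"
    );
  } finally {
    await driver.quit();
  }
}

// Example usage (the selector is a placeholder you must replace):
// dumpRenderedHtml("https://web.telegram.org/k/", ".your-selector")
//   .then(html => console.log(html));
```

The selector to wait for depends on the page's current markup, so it has to be found by inspecting the page in the browser's developer tools.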