
I want to scrape chat messages from a YouTube live chat. At first, I followed the approach shown in "https://www.youtube.com/watch?v=W2DS6wT6_48".

But the code does not work.

The error message is

all_comments = driver.find_element_by_id("all-comments")

...

selenium.common.exceptions.NoSuchElementException: Message: {"errorMessage":"Unable to find element with id 'all-comments'","request":{"headers":{"Accept":"application/json","Accept-Encoding":"identity","Connection":"close","Content-Length":"93","Content-Type":"application/json;charset=UTF-8","Host":"127.0.0.1:12695", "User-Agent":"Python-urllib/2.7"},"httpVersion":"1.1","method":"POST","post":"{\"using\": \"id\", \"sessionId\": \"e4b63b00-fe9c-11e6-a630-0fa086b5cd8d\", \"value\": \"all-comments\"}","url":"/element","urlParsed":{"anchor":"","query":"","file":"element","directory":"/","path":"/element","relative":"/element","port":"", "host":"","password":"","user":"","userInfo":"","authority":"","protocol":"","source":"/element","queryKey":{},"chunks":["element"]},"urlOriginal":"/session/e4b63b00-fe9c-11e6-a630-0fa086b5cd8d/element"}}

My understanding is that there is no element with the id "all-comments", so find_element_by_id failed.

I then tried several other ids and XPath expressions to get the chat messages.


But none of them returned any chat messages.

Am I doing something wrong?

What should I do to scrape the chat messages?

Donald Duck
Py K
  • I guess the chat part is loaded via JavaScript. If you download the HTML of that page without processing it in a browser, do these ids show up? Or just disable JavaScript in the browser and have a look. – Martin Krung Mar 02 '17 at 12:48
  • In the YouTube video, the Python code does nothing with JavaScript, but it works, at least in the video. I think the structure of the YouTube page may have changed, so I tried several ids, XPaths, and class names. – Py K Mar 02 '17 at 13:37
  • Currently, what I found is that the chat messages exist in – Py K Mar 02 '17 at 14:10
  • @FabianThommen I use the PhantomJS browser to get the page. I followed [http://stackoverflow.com/questions/32115673/how-to-disable-javascript-in-phantomjs-through-selenium-webdriver] to disable JavaScript. After that, the webdriver cannot find any element with id 'all-comment', 'comments', 'message', or 'live-chat-iframe'. – Py K Mar 03 '17 at 13:06
  • Without disabling JavaScript, I can get an element with id 'live-chat-iframe', but I cannot find any element with id 'comments' or 'message'. Is there any way to list all elements under the live-chat-iframe element without an id or class? – Py K Mar 03 '17 at 13:07

2 Answers

1

You cannot access the content of the iframe from the parent page. This is by design: an iframe is like a browser within the browser.

See https://en.wikipedia.org/wiki/Same-origin_policy

You have to read the src attribute from the iframe, load that page, and then filter it. This will work.
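
The first step above (reading the iframe's src) could be sketched as follows, using only the Python standard library rather than Selenium. The `live-chat-iframe` id is taken from the question's comments; the class `IframeSrcParser`, the helper `find_iframe_src`, and the sample HTML are hypothetical names made up for illustration. The sketch assumes the iframe tag is already present in the HTML string you pass in.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcParser(HTMLParser):
    """Remembers the src attribute of the first <iframe> with a given id."""

    def __init__(self, iframe_id):
        super().__init__()
        self.iframe_id = iframe_id
        self.src = None

    def handle_starttag(self, tag, attrs):
        # Ignore everything except the first matching iframe.
        if tag != "iframe" or self.src is not None:
            return
        attr_map = dict(attrs)
        if attr_map.get("id") == self.iframe_id:
            self.src = attr_map.get("src")


def find_iframe_src(html, iframe_id, base_url):
    """Return the absolute URL of the iframe's src, or None if not found."""
    parser = IframeSrcParser(iframe_id)
    parser.feed(html)
    if parser.src is None:
        return None
    # The src may be relative, so resolve it against the page URL.
    return urljoin(base_url, parser.src)


# Hypothetical sample markup, not YouTube's real page source.
sample_html = """
<html><body>
  <iframe id="live-chat-iframe" src="/live_chat?continuation=EXAMPLE"></iframe>
</body></html>
"""

chat_url = find_iframe_src(
    sample_html,
    "live-chat-iframe",
    "https://www.youtube.com/watch?v=SF7FUU7CThs",
)
print(chat_url)
```

The resulting URL could then be loaded on its own, for example with `driver.get(chat_url)`. Whether the chat messages are actually present in that page probably depends on its JavaScript having run.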

Martin Krung
  • Thanks for your help. From the YouTube livestream https://www.youtube.com/watch?v=SF7FUU7CThs, I can get the src for the live chat: https://www.youtube.com/live_chat?continuation=0ofMyAMkGiBDZzhLRFFvTFUwWTNSbFZWTjBOVWFITWdBUSUzRCUzRDAB. But I still can't reach the chat messages. The elements I can see in Chrome differ from what I get using Selenium. – Py K Mar 03 '17 at 19:12
  • I used 'find_elements_by_xpath('//*')' to get all elements in the chat page https://www.youtube.com/live_chat?continuation=0ofMyAMkGiBDZzhLRFFvTFUwWTNSbFZWTjBOVWFITWdBUSUzRCUzRDAB. There are only 14 elements, including head, body, style, and div. – Py K Mar 03 '17 at 19:21
  • http://imgur.com/jmLAM6S This is why I am confused. Using Selenium, I can get only 6 tagged elements under 'body'. There is no 'script', 'yt-live-app', or 'iframe'. – Py K Mar 04 '17 at 07:54
  • According to FirePath in Firefox and XPath Helper in Chrome, the XPath of the chat message is ".//*[@id='message']". But I cannot apply this XPath. – Py K Mar 04 '17 at 08:10
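
To illustrate what the XPath from the last comment selects, here is a minimal sketch that applies it with Python's standard `xml.etree.ElementTree` to a small, well-formed snippet. The markup is hypothetical and is not YouTube's real chat structure. ElementTree supports only a subset of XPath and needs well-formed XML, so this only shows what the expression matches once the elements exist in the document.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for rendered chat markup.
snippet = """
<div id="items">
  <div class="chat-item"><span id="author-name">alice</span><span id="message">hello</span></div>
  <div class="chat-item"><span id="author-name">bob</span><span id="message">hi there</span></div>
</div>
"""

root = ET.fromstring(snippet)

# Same expression as in the comment: every descendant whose id is 'message'.
messages = [el.text for el in root.findall(".//*[@id='message']")]
print(messages)
```

If the expression finds nothing in Selenium, one likely cause is that the matching elements are not in the document being searched at that moment, rather than the XPath itself being wrong.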
-2

I know this thread is a bit old, but I am able to scrape YouTube live chat feeds using CasperJS. It's a work in progress, but you can get the gist here:

https://github.com/archae0pteryx/yt-live-chat-scraper

rimraf