
I am working on a mini-project that applies clustering/retrieval ML methods to YouTube video data, with the goal of returning specific videos from a large dataset of videos.

I am having trouble figuring out how to get a dataset of YouTube videos in the first place. The end goal is something like:

video_id | video_title | category | likes | dislikes | views | words_comment |

for a bunch of videos (maybe ~10,000 rows?) in CSV format, so that I can apply Python machine learning algorithms to it.

What's the best way of going about this? I've tried the YouTube API, but I am not familiar with how it works and I am stuck on errors. Is scraping directly from the YouTube website easier?

Thanks!

rbae
  • If you want to scrape, [this](https://stackoverflow.com/questions/44974870/obtaining-the-number-of-comments-of-a-list-of-youtube-videos/44979655#44979655) might give you some idea. Or if you want to use YouTube API v3, go through [this](https://stackoverflow.com/questions/30506031/get-youtube-video-info-with-new-apis-v3). Is scraping YouTube even legal? Take a look at [this](https://stackoverflow.com/questions/30587390/what-is-the-legality-of-scraping-youtube-data). – arif Jul 08 '17 at 22:15
  • Where do you find the YouTube API key? Check [this one](https://stackoverflow.com/questions/44399219/where-to-find-the-youtube-api-key) out. – arif Jul 08 '17 at 22:19

1 Answer

You can try scraping if you think it is easier, but if you are not familiar with API calls, you probably won't find scraping any easier. I would research working with APIs. They are a little funky at first, but they are not very hard to use once you get the hang of them.
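To give a feel for what an API call looks like, here is a minimal sketch that builds a request for the YouTube Data API v3 `videos` endpoint using only the standard library. The function names `build_videos_url` and `fetch_videos` are made up for illustration, and the API key is a placeholder you would have to create yourself. Treat it as a starting point under those assumptions, not a complete client (no paging, quota handling, or error handling).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public endpoint of the YouTube Data API v3 for video metadata.
API_URL = "https://www.googleapis.com/youtube/v3/videos"


def build_videos_url(video_ids, api_key):
    """Build the request URL for a batch of video IDs (hypothetical helper)."""
    params = {
        # Ask for the title/category part and the view/like counts part.
        "part": "snippet,statistics",
        # Several IDs can be sent in one call as a comma-separated list.
        "id": ",".join(video_ids),
        "key": api_key,
    }
    return API_URL + "?" + urlencode(params)


def fetch_videos(video_ids, api_key):
    """Send the request and decode the JSON body into Python objects.

    Needs network access and a valid key, so it is not called here.
    """
    with urlopen(build_videos_url(video_ids, api_key)) as resp:
        return json.load(resp)
```

Usage would look something like `fetch_videos(["<video id>"], "<your API key>")`; most of the errors people hit at this stage come from a missing or restricted key.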

There is a YouTube channel called thenewboston, and I believe he has some API material. General Python videos will help too, since API responses are formatted in data types similar to Python's.
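To illustrate that last point: a decoded JSON response is just nested dicts and lists, so turning it into CSV rows is ordinary Python. The sketch below flattens items shaped like the `videos` response of the YouTube Data API v3 into the columns from the question. The helper names and the sample item are invented for illustration, counts that the API leaves out are written as empty cells, and comment text is omitted because it comes from a separate endpoint.

```python
import csv
import io

# Column names taken from the question (comment text is left out here).
FIELDS = ["video_id", "video_title", "category", "likes", "dislikes", "views"]


def to_int(value):
    """Counts arrive as strings; a missing count becomes None."""
    return int(value) if value is not None else None


def item_to_row(item):
    """Flatten one response item (nested dicts) into a flat row dict."""
    snippet = item.get("snippet", {})
    stats = item.get("statistics", {})
    return {
        "video_id": item.get("id"),
        "video_title": snippet.get("title"),
        "category": snippet.get("categoryId"),
        "likes": to_int(stats.get("likeCount")),
        "dislikes": to_int(stats.get("dislikeCount")),
        "views": to_int(stats.get("viewCount")),
    }


def write_csv(items, fp):
    """Write a header plus one row per item to an open text file object."""
    writer = csv.DictWriter(fp, fieldnames=FIELDS)
    writer.writeheader()
    for item in items:
        writer.writerow(item_to_row(item))


# Made-up sample item, shaped like an entry of the response's "items" list.
sample_items = [
    {
        "id": "abc123",
        "snippet": {"title": "Example video", "categoryId": "27"},
        "statistics": {"viewCount": "1500", "likeCount": "40"},
    }
]

buffer = io.StringIO()
write_csv(sample_items, buffer)
csv_text = buffer.getvalue()
```

To write a real file, pass `open("videos.csv", "w", newline="")` instead of the `StringIO` buffer; the resulting CSV can then be loaded by whatever ML tooling you prefer.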

Joe
  • After wrestling with the YouTube API some more, I think I more or less got the hang of it. – rbae Jul 09 '17 at 19:04