I am doing my university project in Python, and I am new to Python. I have been given the project below:

Build a classifier that predicts whether a restaurant review is positive or negative, based only on the text. Use reviews from TripAdvisor. Winning team gets a bonus.

I have already extracted the data from TripAdvisor, but can someone please help me with how to classify the reviews? I did not understand much in class, so can someone please recommend a good video tutorial where I can learn this kind of classification?

Thanks in advance Rob

  • You will probably need to use [NLTK Sentiment Analysis package](http://text-processing.com/demo/sentiment/). In particular, read [the first linked article](http://streamhacker.com/2010/05/10/text-classification-sentiment-analysis-naive-bayes-classifier/), as it explains the process step by step. If you still don't understand the theory after that, you should probably talk to your teacher. If you get stuck on the implementation, come back here with a specific question. As it stands, the question will be closed either for asking for recommendations or for being too broad. – Amadan Oct 16 '14 at 00:46
  • related: [Sentiment analysis for Twitter in Python](http://stackoverflow.com/q/573768/4279) – jfs Oct 16 '14 at 02:08
  • related: [Twitter sentiment analysis technics](http://stackoverflow.com/q/13713817/4279) – jfs Oct 16 '14 at 02:09
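Amadan's comment points at NLTK's Naive Bayes classifier. To show what such a classifier actually does with review text, here is a minimal from-scratch sketch of multinomial Naive Bayes with add-one smoothing in plain Python. It is not NLTK's API (NLTK's `NaiveBayesClassifier` works on feature dictionaries instead), and the four training reviews and the `"pos"`/`"neg"` labels are made up for illustration; in the project you would train on labelled TripAdvisor reviews and measure accuracy on reviews held out from training.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)  # documents per label (for the prior)
        self.word_counts = {label: Counter() for label in self.label_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {label: sum(c.values()) for label, c in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label_docs in self.label_counts.items():
            # log P(label) + sum over words of log P(word | label)
            score = math.log(n_label_docs / n_docs)
            denominator = self.totals[label] + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (made up for illustration only).
train_texts = [
    "We love this deli, great food and friendly staff",
    "Amazing pastrami, best sandwich in town",
    "Terrible service and cold food",
    "Rude waiter, awful experience, never again",
]
train_labels = ["pos", "pos", "neg", "neg"]

clf = NaiveBayes().fit(train_texts, train_labels)
print(clf.predict("great sandwich and friendly staff"))  # -> pos
print(clf.predict("awful cold food"))                    # -> neg
```

With NLTK itself, the equivalent is to build `(featureset, label)` pairs, where a featureset is a dict such as `{word: True for word in tokenize(text)}`, pass the list to `nltk.NaiveBayesClassifier.train`, and call `.classify(featureset)` on new reviews.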

I see the following steps:

  1. Fetch data from TripAdvisor
  2. Analyse the data: extract the name of the restaurant and run NLTK Naive Bayes classification on the test reviews.

This can be done in many ways, but it is pretty hard if you are not an experienced coder, so I hope you are a fast learner. Go get Scrapy; it will be your tool of choice for an assignment like this. It will take some time to learn, but Scrapy has very good docs and tutorials.

Scrapy can also help you process the data (HTML): you have to extract the name of the restaurant and run NLTK Naive Bayes classification on the reviews.
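The extraction step can be illustrated without Scrapy. Below is a minimal sketch using the standard library's `html.parser`, assuming hypothetical markup where the restaurant name is the first `<h1>` and each review sits in an element whose class list contains `review-text`. TripAdvisor's real markup will differ, so the class name and the `<h1>` rule are placeholders; in Scrapy you would express the same selection with CSS or XPath selectors.

```python
from html.parser import HTMLParser


def _squash(text):
    """Collapse runs of whitespace into single spaces."""
    return " ".join(text.split())


class ReviewExtractor(HTMLParser):
    """Collect review texts and a restaurant name from one HTML page.

    Hypothetical markup: the name is the first <h1>, and each review is an
    element whose class list contains `review_class`. Adjust to the real page.
    """

    def __init__(self, review_class="review-text"):
        super().__init__()
        self.review_class = review_class
        self.restaurant_name = None
        self.reviews = []
        self._in_h1 = False
        self._name_parts = []
        self._review_tag = None  # tag name of the review element we are inside
        self._depth = 0          # nesting depth of that same tag name
        self._review_parts = []

    def handle_starttag(self, tag, attrs):
        if self._review_tag is not None:
            if tag == self._review_tag:
                self._depth += 1
            elif tag == "br":
                self._review_parts.append(" ")  # keep line breaks as spaces
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.review_class in classes:
            self._review_tag, self._depth, self._review_parts = tag, 1, []
        elif tag == "h1" and self.restaurant_name is None:
            self._in_h1 = True

    def handle_endtag(self, tag):
        if self._review_tag is not None:
            if tag == self._review_tag:
                self._depth -= 1
                if self._depth == 0:  # left the review element: store its text
                    self.reviews.append(_squash("".join(self._review_parts)))
                    self._review_tag = None
        elif tag == "h1" and self._in_h1:
            self._in_h1 = False
            self.restaurant_name = _squash("".join(self._name_parts))

    def handle_data(self, data):
        if self._review_tag is not None:
            self._review_parts.append(data)
        elif self._in_h1:
            self._name_parts.append(data)


SAMPLE_HTML = """
<html><body>
  <h1>Example Deli</h1>
  <div class="review">
    <p class="review-text">We LOVE this deli, <b>both</b> locations!</p>
  </div>
  <div class="review">
    <p class="review-text">Cold food and<br>rude staff.</p>
  </div>
</body></html>
"""

parser = ReviewExtractor()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.restaurant_name)  # -> Example Deli
print(parser.reviews)
# -> ['We LOVE this deli, both locations!', 'Cold food and rude staff.']
```

The sample page and the name "Example Deli" are invented; only the parsing technique carries over.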

Last but not least, you have to use a Scrapy pipeline to save the data. I suggest SQLite for your project.
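A minimal sketch of such a pipeline, using only the standard-library `sqlite3` module. Scrapy calls `open_spider`, `process_item` and `close_spider` on each enabled item pipeline, so the class itself needs no Scrapy import. The `reviews.db` file name, the table layout and the item keys (`restaurant`, `rating`, `review`) are hypothetical choices for illustration.

```python
import sqlite3


class SQLiteReviewPipeline:
    """Scrapy-style item pipeline that stores each scraped review in SQLite.

    The table layout and item keys are hypothetical; match them to your items.
    """

    def __init__(self, db_path="reviews.db"):
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            """CREATE TABLE IF NOT EXISTS reviews (
                   id INTEGER PRIMARY KEY AUTOINCREMENT,
                   restaurant TEXT,
                   rating INTEGER,
                   review TEXT NOT NULL
               )"""
        )

    def process_item(self, item, spider):
        # Parameterised query: review text is never formatted into the SQL string.
        self.conn.execute(
            "INSERT INTO reviews (restaurant, rating, review) VALUES (?, ?, ?)",
            (item.get("restaurant"), item.get("rating"), item["review"]),
        )
        return item  # hand the item on to any later pipeline

    def close_spider(self, spider):
        self.conn.commit()  # rows are only committed once, at the end of the crawl
        self.conn.close()
```

To enable it, add the class to `ITEM_PIPELINES` in the project's `settings.py`, e.g. `{"myproject.pipelines.SQLiteReviewPipeline": 300}` (the module path is hypothetical). Committing only in `close_spider` keeps inserts fast but loses uncommitted rows if the crawl crashes; commit more often if that matters.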

Feel free to ask questions if you need to, but make them count. We cannot do your project for you, but we can certainly point you in the right direction and help you with some of the coding issues. Try your best before asking, though: we hate lazy people who don't try for themselves and research before asking ;)

Best of luck with your project and welcome to Stackoverflow.

brunsgaard
  • Thanks for the prompt reply. I have already extracted the data (reviews and ratings) into a text file. Now I have to classify the data, so how do you recommend I proceed? – user3930701 Oct 16 '14 at 00:56
  • I have to classify the data into positive and negative, so I have to report how many positive and how many negative reviews there are. – user3930701 Oct 16 '14 at 00:58
  • Okay, in that case NLTK Naive Bayes classification is not needed; you can extract the data directly from the HTML. – brunsgaard Oct 16 '14 at 00:59
  • @user3930701, how much time do you have to finish this project? Crawling the site (i.e. the data collection) is not a small task. – brunsgaard Oct 16 '14 at 00:59
  • I have already extracted the data. What should I do about classification? As for time, I have 3 weeks for this project. My professor taught us kNN classification, but I don't really understand it, so I am looking to start something from scratch. – user3930701 Oct 16 '14 at 01:04
  • Okay, you should write in your post that you have the data and want resources for kNN classification. There must be hundreds of explanations of kNN on the web; it is not that hard and pretty basic. Why not start here :) https://www.youtube.com/watch?v=4ObVzTuFivY or this one: https://www.youtube.com/watch?v=09mb78oiPkA Would you please upvote my answer if you like it ;) – brunsgaard Oct 16 '14 at 01:08
  • Sorry, I cannot upvote you because I need 15 reputation to vote up; I am new to Stack Overflow. I don't need to understand the kNN algorithm, since he has given us the code for it, but he said we are free to experiment, so I don't want to use kNN. Would anything else be fine? – user3930701 Oct 16 '14 at 01:14
  • I would just start by counting upvotes vs. downvotes. Can you post the dataset you have anywhere? – brunsgaard Oct 16 '14 at 01:25
  • I cannot just count the upvotes and downvotes, as I have to do it based on positive and negative words; I have a list of positive and negative words too. I have more than 80,000 reviews, but here is just a glimpse of the data: Rating and review 5 We LOVE this deli...both locations! It is the very first and very last place we hit when we are in NYC. If we weren't watching our waistlines we would eat every meal there! Our kids love it... – user3930701 Oct 16 '14 at 01:33
  • Put it in a gist :) and post a link. – brunsgaard Oct 16 '14 at 01:33
  • Can you view this file? https://gist.github.com/shanewatsonweb/265aa14e38165e046069 – user3930701 Oct 16 '14 at 01:43
  • I uploaded three files: reviews, positive and negative. – user3930701 Oct 16 '14 at 01:49
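The requirement in these comments is to label each review using lists of positive and negative words and then report how many reviews are positive and how many are negative. A minimal lexicon-counting sketch, assuming each line of the review file is a star rating, a tab, and the review text, and that each word list has one word per line (the actual layout of the gist files is not shown here, so the parsing is an assumption). The rating-to-label mapping (4-5 positive, 1-2 negative) is a hypothetical choice, used only to check how often the word counts agree with the stars.

```python
import re
from collections import Counter

WORD_RE = re.compile(r"[a-z']+")


def load_words(lines):
    """Build a lower-cased word set, skipping blank lines and ';' comment lines."""
    return {line.strip().lower() for line in lines
            if line.strip() and not line.lstrip().startswith(";")}


def lexicon_label(text, positive, negative):
    """Label a review by (# positive words) - (# negative words)."""
    tokens = WORD_RE.findall(text.lower())
    score = sum(t in positive for t in tokens) - sum(t in negative for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def rating_label(rating):
    """Hypothetical star-rating mapping, used only to sanity-check the lexicon."""
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    return "neutral"


def parse_review_line(line):
    """Split an assumed 'rating<TAB>review text' line into (int, str)."""
    rating, _, text = line.partition("\t")
    return int(rating), text.strip()


# Tiny made-up stand-ins for the real word lists and review file.
positive = load_words(["love", "great", "friendly", "delicious"])
negative = load_words(["rude", "cold", "awful", "terrible"])
review_lines = [
    "5\tWe LOVE this deli...both locations! Great food.",
    "1\tRude staff and cold, awful soup.",
    "3\tWe ate there on Tuesday.",
]

counts = Counter()
agree = 0
for line in review_lines:
    rating, text = parse_review_line(line)
    label = lexicon_label(text, positive, negative)
    counts[label] += 1
    if label == rating_label(rating):
        agree += 1

print(f"positive: {counts['positive']}, negative: {counts['negative']}, "
      f"neutral: {counts['neutral']}")  # -> positive: 1, negative: 1, neutral: 1
print(f"{agree}/{len(review_lines)} labels agree with the star rating")  # -> 3/3 ...
```

With real files, pass the open file object to `load_words`, e.g. `with open("positive-words.txt", encoding="utf-8") as f: positive = load_words(f)` (the file name is hypothetical). Plain word counting ignores negation, so "not good" still scores as positive; comparing its labels with the star ratings, or with a trained classifier, shows how much that matters on the real reviews.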