Highest Voted 'web-mining' Questions

16

votes

3 answers

Good dataset for sentiment analysis?

I am working on sentiment analysis using the dataset at this link: http://www.cs.jhu.edu/~mdredze/datasets/sentiment/index2.html and I have split my dataset in a 50:50 ratio. 50% are used as test samples and 50% are used as train…

dataset sentiment-analysis web-mining

asked Jul 07 '14 at 08:04

user3512562


8

votes

5 answers

Fast internet crawler

I'd like to perform data mining on a large scale. For this, I need a fast crawler. All I need is something to download a web page, extract links and follow them recursively, but without visiting the same URL twice. Basically, I want to avoid…

python multithreading web-crawler web-mining

asked Oct 04 '11 at 19:51

pbp

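The crawler described in the question above (download a page, extract its links, follow them, never visit the same URL twice) can be sketched with the Python standard library alone. This is a minimal, single-threaded illustration rather than a fast production crawler: the names `LinkExtractor`, `fetch`, `crawl`, and the `max_pages` limit are assumptions made for this example, and the recursion is replaced by an explicit queue.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page and return its text (hypothetical default fetcher)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=100):
    """Breadth-first crawl that visits each URL at most once."""
    seen = {start_url}          # every URL ever queued, used for deduplication
    queue = deque([start_url])  # URLs waiting to be downloaded
    visited = []                # URLs downloaded successfully, in order
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except Exception:
            continue  # skip pages that fail to download
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments before deduplicating.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

The `seen` set is what prevents repeat visits: a URL is added to it when it is first queued, so it can never be queued again. Passing a different `fetch_page` function makes it possible to try the logic on in-memory pages without touching the network.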

3

votes

3 answers

Java API for web scraping or web mining

I'm looking for a good Java API for web scraping. I tried the Web-Harvest API http://web-harvest.sourceforge.net/usage.php but I think it's a bit clunky. Any other suggestions?

java api screen-scraping web-mining

asked Mar 09 '11 at 18:29

finfinni


3

votes

1 answer

Python Mechanize - How to submit an unlisted value in dropdown menu

I am using Python's mechanize to add items into an Amazon shopping cart. On an item's product page, you select the Quantity in the form's dropdown menu and submit Add to Cart. The dropdown menu allows you to select Quantities from 1 through 30. The…

python html web-scraping mechanize web-mining

asked Sep 17 '14 at 20:12

Max


3

votes

1 answer

Programmatically look up a ticker symbol in R

I have a field of data containing company names, such as company <- c("Microsoft", "Apple", "Cloudera", "Ford") > company Company 1 Microsoft 2 Apple 3 Cloudera 4 Ford and so on. The package tm.plugin.webmining allows you to query data from…

r tm web-mining

asked Sep 02 '14 at 21:05

Hack-R


2

votes

6 answers

Web mining or scraping or crawling? What tool/library should I use?

I want to crawl and save some webpages as HTML. Say, crawl hundreds of popular websites and simply save their front pages and the "About" pages. I've looked into many questions, but didn't find an answer to this from either web crawling or web…

java python web-crawler web-scraping web-mining

asked Oct 11 '11 at 07:48

Flake


2

votes

1 answer

Is there any web mining library in Node.js for sentiment analysis?

I am doing sentiment analysis in JavaScript using Node.js. I am looking for web mining packages in Node to clean a web page. Is there any built-in package for web mining in Node, like R's tm.plugin.webmining package? Thank you

javascript node.js package sentiment-analysis web-mining

asked Jun 19 '17 at 09:21

Neha chauhan


2

votes

2 answers

Scraping data from a dynamic ecommerce webpage

I'm trying to scrape the titles of all the products listed on a webpage of an e-commerce site (in this case, Flipkart). Now, the products that I would be scraping would depend on the keyword entered by the user. A typical URL generated if I entered a…

python beautifulsoup python-requests web-mining

asked Sep 27 '14 at 23:08

Manas Chaturvedi


2

votes

1 answer

Any better preprocessing library or implementation in Python?

I need to preprocess some text documents so that I can apply classification techniques like fcm etc. and other topic modeling techniques like latent Dirichlet allocation etc. To elaborate a bit on the preprocessing, I need to remove the stop words,…

python preprocessor nlp data-mining web-mining

asked Apr 23 '12 at 13:12

Kai

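The stop-word removal step mentioned in the question above can be illustrated with a few lines of standard-library Python. This is only a sketch under stated assumptions: the `STOP_WORDS` set is a deliberately tiny, hypothetical list made up for this example, and the `preprocess` helper is not taken from the question or from any particular library.

```python
import re

# Hypothetical, deliberately tiny stop-word list used only for illustration;
# a real project would load a fuller list from a file or an NLP library.
STOP_WORDS = frozenset({"a", "an", "and", "the", "is", "of", "to", "in"})


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [token for token in tokens if token not in stop_words]
```

The returned token list can then be fed to whatever classification or topic-modeling step comes next.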

1

vote

1 answer

Difficulty in extracting main content from a news web page

I need to extract the main content (excluding links, advertisements, etc.) from a news web page. I have read about it on the web and learned that to do that I need to parse the HTML page and then select content from HTML tags. I have written code which…

java html-parsing text-extraction web-mining

asked Feb 17 '12 at 16:48

dark_shadow


1

vote

1 answer

How can I use scrapy on booking.com without being blocked?

I am trying to scrape hotel reviews from booking.com with the Python framework Scrapy. My problem is that the desired data (e.g. negative feedback) can't be found by Scrapy. I think it's because of the JavaScript code embedded in the…

python scrapy web-crawler web-mining

asked Mar 06 '21 at 19:22

Julia


1

vote

0 answers

Crawl data from URLs by passing a URL to Scrapy from another *.py file

I'm using Scrapy to crawl data from a website, and this is my code in the file spider.py in the spider folder of Scrapy: class ThumbSpider(scrapy.Spider): userInput = readInputData('input/user_input.json') name = 'thumb' # start_urls =…

python scrapy data-science web-mining

asked Jun 14 '20 at 14:07

Claire Duong


1

vote

1 answer

How to get text and href value in anchor tag with scrapy, xpath, python

I have an HTML file like this:

In the spiders folder, I have a file jokes.py like this: import scrapy from…

python web-scraping scrapy web-mining

asked Jun 12 '20 at 08:02

Claire Duong


1

vote

1 answer

Rcrawler - How to crawl account/password protected sites?

I am trying to crawl and scrape a website's tables. I have an account with the website, and I found out that Rcrawler could help me get parts of the table based on specific keywords, etc. The problem is that on the GitHub page there is no…

r web-scraping web-crawler web-mining rcrawler

asked Jul 09 '18 at 10:56

Tasos Dalis


1

vote

0 answers

Twitter streaming API, where to find originator's name?

I am using Python to stream tweets via the Twitter API. For example, the word "car" generates the following results: { "created_at": "Fri Sep 05 00:15:32 +0000 2014", "id": 507683414255108096, "id_str": "507683414255108096", "text": "I…

python-2.7 twitter social-media tweetstream web-mining

asked Sep 05 '14 at 00:27

KubiK888

