
Many of the questions asked here are relevant to research I'm doing. These questions and answers are widely dispersed and not always easy to find by manual browsing, and sometimes an insightful answer or comment turns up under an unrelated topic as well.

I want to automate finding these relevant Q's & A's, based on sets of keywords, then use the information as pointers towards further in-depth research.

What tools, preferably open-source, are available for this type of site-mining? I am not a web guru, and developing such tools myself would take a long time and eat into the time I could spend on my R&D.

slashmais

4 Answers


Another option would be using Yahoo! Pipes. (demo)

You can build such a system visually online by combining feed URLs, filters, etc. The learning time is minimal compared to programming.
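To illustrate what such a pipe does (merge several feeds, keep items that mention a keyword, drop duplicates), here is a minimal Python sketch using only the standard library. It is not Pipes itself: the function names `rss_items` and `pipe` and the example.com feeds are hypothetical, and it assumes plain RSS 2.0 feeds that have already been downloaded as text.

```python
import xml.etree.ElementTree as ET


def rss_items(xml_text):
    """Yield (title, link, description) for every <item> in an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        yield (
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("description", default=""),
        )


def pipe(feeds, keywords):
    """Merge several feeds, keep items mentioning any keyword, drop duplicate links."""
    wanted = [k.lower() for k in keywords]
    seen = set()
    results = []
    for xml_text in feeds:
        for title, link, desc in rss_items(xml_text):
            text = (title + " " + desc).lower()
            # Skip links already emitted and items that match no keyword.
            if link in seen or not any(k in text for k in wanted):
                continue
            seen.add(link)
            results.append((title, link))
    return results


# Hypothetical sample feeds standing in for downloaded RSS 2.0 text.
FEED_A = """<rss version="2.0"><channel>
<item><title>Parsing HTML with regex</title><link>http://example.com/q/1</link><description>Why not to do it</description></item>
<item><title>Sorting a list</title><link>http://example.com/q/2</link><description>Python basics</description></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel>
<item><title>Parsing HTML with regex</title><link>http://example.com/q/1</link><description>duplicate</description></item>
<item><title>Screen scraping tools</title><link>http://example.com/q/3</link><description>HTML parsing libraries</description></item>
</channel></rss>"""

print(pipe([FEED_A, FEED_B], ["html"]))
# prints [('Parsing HTML with regex', 'http://example.com/q/1'), ('Screen scraping tools', 'http://example.com/q/3')]
```

Each stage is a plain function, so further "pipe" steps (sorting, limiting, rewriting titles) can be chained the same way.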

tamersalama
  • _Sounds_ cool, but unfortunately the demo doesn't seem to work: http://www.jumpcut.com/view/?id=594F555C568011DC9D24000423CEF5B0 looks dead, just black with no sound. – slashmais Oct 03 '08 at 06:42
  • YouTube is your friend. Try this one: http://www.youtube.com/watch?v=d3h6ROs__II – tamersalama Oct 03 '08 at 15:26

It is not clear from your question whether you are a programmer, so I'm not sure whether you are after tools in the sense of apps or services that do what you want, or a library that makes site-mining easier.

If the latter is the case and you use Ruby, I can thoroughly recommend WWW::Mechanize. It provides a nice API for writing scripts that search web pages (by DOM or by text), follow links, and fill out forms. I've used it several times to organise information that's spread over several pages within a site.

I believe the Ruby version was based on an earlier Perl library, but I can't vouch for the Perl version as I haven't used it.
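Mechanize itself is a Ruby (and Perl) library, so the following is not its API. It is a rough Python analogue built only on the standard library, sketching the same workflow: fetch a page, search its text for keywords, and follow links within the same site. `PageScanner`, `scan`, `crawl`, `fetch_url` and the `max_pages` limit are hypothetical names; `fetch` is passed in as a parameter so the crawler can be tried offline with a stub.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class PageScanner(HTMLParser):
    """Collects a page's text nodes and the absolute href of every <a> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links such as "/q/1" against the page URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_data(self, data):
        # Crude: script and style contents end up here too.
        self.text_parts.append(data)


def scan(html, base_url, keywords):
    """Return (keywords found in the page text, outgoing links)."""
    parser = PageScanner(base_url)
    parser.feed(html)
    parser.close()
    text = " ".join(parser.text_parts).lower()
    hits = [k for k in keywords if k.lower() in text]
    return hits, parser.links


def fetch_url(url):
    """Download a page as text (hypothetical helper for live use)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch, keywords, max_pages=20):
    """Breadth-first crawl of one host; returns {url: matched keywords}."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    found = {}
    pages = 0
    while queue and pages < max_pages:
        url = queue.popleft()
        pages += 1
        hits, links = scan(fetch(url), url, keywords)
        if hits:
            found[url] = hits
        for link in links:
            # Stay on the starting site and never queue a URL twice.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return found


# Offline stub standing in for a real site (hypothetical pages).
PAGES = {
    "http://example.com/": '<a href="/q/1">Q1</a> <a href="/q/2">Q2</a> '
                           '<a href="http://other.org/">elsewhere</a>',
    "http://example.com/q/1": "<p>Try WWW::Mechanize for scraping</p>",
    "http://example.com/q/2": '<p>Unrelated</p><a href="/">home</a>',
}

print(crawl("http://example.com/", lambda u: PAGES.get(u, ""), ["mechanize"]))
# prints {'http://example.com/q/1': ['mechanize']}
```

To run it against live pages, pass `fetch_url` as `fetch`; a real crawler should also respect the site's robots.txt and keep its request rate low.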

Mark Reid
  • The Perl module looks like just the ticket (I don't know Ruby). I'm going to google whether someone has already done what I need; otherwise I'll write my own. Thanks, this was helpful. – slashmais Oct 04 '08 at 09:17

Human interaction tools might be useful in such a case (no development cost, probably more consistent results, and they can adapt to evolving requirements).

A couple come to mind:

tamersalama
  • (I always thought that doing web-mining for others would be a good business call.) I am a lone, private individual without the capital resources to pay others to do this; for me it's the hard way or no way. :-( – slashmais Oct 03 '08 at 06:29

All of the tags have RSS feeds attached to them, so I'd start by subscribing to the tags that match your keywords and searching the feed data. That seems like the simplest way to find related concepts and other related keywords.
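As a sketch of "subscribe and search", assuming the tag feeds are Atom documents already downloaded as text, the Python below (standard library only) keeps entries whose title or summary mentions any keyword and tallies the tags (`<category term="...">`) carried by the matches, which is one way to surface related keywords. `search_atom` and the sample feed are hypothetical; real summaries may contain escaped HTML, so plain substring matching is only approximate.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace prefix mapping for Atom elements.
ATOM = {"a": "http://www.w3.org/2005/Atom"}


def search_atom(xml_text, keywords):
    """Return (matching (title, link) pairs, Counter of tags on those entries)."""
    root = ET.fromstring(xml_text)
    wanted = [k.lower() for k in keywords]
    matches = []
    related = Counter()
    for entry in root.findall("a:entry", ATOM):
        title = entry.findtext("a:title", default="", namespaces=ATOM)
        summary = entry.findtext("a:summary", default="", namespaces=ATOM)
        text = (title + " " + summary).lower()
        if any(k in text for k in wanted):
            link = entry.find("a:link", ATOM)
            matches.append((title, link.get("href") if link is not None else ""))
            # Tags co-occurring with a hit hint at related keywords to follow up.
            related.update(c.get("term") for c in entry.findall("a:category", ATOM)
                           if c.get("term"))
    return matches, related


# Hypothetical sample feed standing in for a downloaded tag feed.
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
<entry><title>Scraping forum answers</title><link href="http://example.com/q/10"/>
<summary>Looking for an open-source crawler</summary>
<category term="web-scraping"/><category term="open-source"/></entry>
<entry><title>Formatting dates</title><link href="http://example.com/q/11"/>
<summary>strftime question</summary><category term="python"/></entry>
</feed>"""

matches, related = search_atom(FEED, ["crawler"])
print(matches)  # prints [('Scraping forum answers', 'http://example.com/q/10')]
print(related.most_common())
```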

btw
  • Much of the relevant info I found was unrelated to the tags on the questions; they were keywords within the texts of the answers. – slashmais Oct 03 '08 at 06:32