
I have a task in which I need to run searches on a particular website. After each search a result page is shown, and I have to save those pages for further offline analysis. I have many words to search for on that site.

I want to develop a program that automatically sends the search requests to the site and saves each result page in a folder for later offline analysis, for example with regular expressions. I know only Java and J2EE, and I am familiar with JavaScript.

I have seen some software on the internet, but what I have found so far does not match my requirements and, moreover, is not free. Still, please do suggest any free software, or software with a trial version, that could do this.

halfer
Black Swan

2 Answers


You should save the web response in a variable and then write it to a .txt file in some directory.

Then you can work on your .txt files offline with regular expressions.
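
As a rough sketch of that save-then-analyze step, assuming the page source is already in a `String`: the class name `PageStore`, its method names, and the `href` pattern in the usage note are only illustrations, not part of any library.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: stores page sources on disk and searches them offline.
class PageStore {

    // Write the page source to a file, creating parent directories if needed.
    static void save(Path file, String source) {
        try {
            if (file.getParent() != null) {
                Files.createDirectories(file.getParent());
            }
            Files.write(file, source.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read a previously saved page back into memory.
    static String load(Path file) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Collect capture group 1 of every match of the regex in the text.
    // The regex must therefore contain at least one capture group.
    static List<String> firstGroups(String text, String regex) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group(1));
        }
        return hits;
    }
}
```

For example, `PageStore.firstGroups(PageStore.load(file), "href=\"([^\"]+)\"")` would list the link targets found in a saved page. Regular expressions are fine for simple patterns like this, but they get fragile on complex HTML.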

The Apache HttpComponents client library is a good choice for sending the requests.

Here is an example of a GET request:

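    // Requires Apache HttpClient 4.3 or later (4.x line) on the classpath.
    import java.io.IOException;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.util.EntityUtils;

    public String httpGetSimple(String url) {
        String source = null;
        HttpGet httpGet = new HttpGet(url);
        // try-with-resources closes the client and the response when done
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse httpResponse = httpClient.execute(httpGet)) {
            // Read the whole response body into a String
            source = EntityUtils.toString(httpResponse.getEntity());
        } catch (IOException e) {
            e.printStackTrace();
        }
        // null means the request failed
        return source;
    }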
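
Since the question mentions many search words, here is a minimal sketch of looping over them. It assumes the site takes the search term as a query-string value; the `SearchBatch` class and the `https://example.com/search?q=` prefix in the usage note are made up for illustration, so check how the real site builds its search URLs. The fetcher is passed in as a function, so a method like `httpGetSimple` above can be plugged in.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: runs one search request per word.
class SearchBatch {

    // Build the search URL for one word. The word is URL-encoded so that
    // spaces and special characters survive as a query-string value.
    static String buildUrl(String searchPrefix, String word) {
        try {
            return searchPrefix + URLEncoder.encode(word, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always supported, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    // Fetch the result page for every word, in order.
    // Returns a map from search word to page source.
    static Map<String, String> fetchAll(String searchPrefix,
                                        List<String> words,
                                        Function<String, String> fetcher) {
        Map<String, String> pages = new LinkedHashMap<>();
        for (String word : words) {
            pages.put(word, fetcher.apply(buildUrl(searchPrefix, word)));
        }
        return pages;
    }
}
```

From inside the class that defines `httpGetSimple`, a call could look like `SearchBatch.fetchAll("https://example.com/search?q=", words, this::httpGetSimple)`. Each returned page can then be written to its own file. If the site is not yours, it is probably wise to pause between requests.
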
Paplusc

If you are trying to do this with JavaScript in the browser, right now there is no way for a script to access the native file system to write files in the way you are talking about. There are some workarounds using Java, mentioned here: Can javascript access a filesystem?

If you just want to use JavaScript to do it, and it can run outside of the browser, like in the command line or on a server, you can use Node to do that pretty easily.

    var http = require('http');
    var fs = require('fs');

    http.get('http://www.google.com/index.html', (res) => {
      console.log(`Got response: ${res.statusCode}`);
      // read in the response data chunk by chunk
      var body = "";
      res.setEncoding('utf8');
      res.on("data", function(chunk) {
        body += chunk;
      });
      // once the whole body has arrived, write it to a text file
      res.on("end", function() {
        fs.writeFileSync('page.txt', body);
      });
    }).on('error', (e) => {
      console.log(`Got error: ${e.message}`);
    });
Community
J_Everhart383