Okay, so here's the thing: you are probably all thinking the same thing, namely that you can use

driver.getPageSource();

And this is partially true. The only issue is that the source code gets compiled in a rather strange way where all through the code

\"

starts showing up. I tried removing this manually, but that still doesn't fix the problem completely.

One example of what I mean:

normal source code:

\"query_title\":null}",encoded_title:"WyJoZW5rIl0",ref:"unknown",logger_source:"www_main",typeahead_sid:"",tl_log:false,impression_id:"bbdb1882",filter_ids:

Selenium output:

\\\"query_title\\\":null}\",\"encoded_title\":\"WyJoZW5rIl0\",\"ref\":\"br_tf\",\"logger_source\":\"www_main\",\"typeahead_sid\":\"0.6583900225217523\",\"tl_log\":false,\"impression_id\":\"e00060b4\",\"filter_ids\"

It seems to be the same kind of thing as escaping, where you have to put a backslash in front of certain symbols inside quotes to stop Java from interpreting them as those symbols. But I don't fully understand this behaviour and have no idea how to fix it... hope you can help :)

edit: replacing doesn't work because of the way this got compiled. An example of why it won't work is actually in the example I included earlier:

original:

}",encoded_title:

compiled version:

}\",\"encoded_title\":

Replacing \" with " would change it into:

}","encoded_title":

which differs from the original...

And if I were to replace \" with nothing, I would get:

},encoded_title:

which, sadly, still differs from the original. The way this is compiled I just don't think replacing is a viable option...
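
To make the two replacement attempts concrete, here is a minimal Java sketch that applies both to the fragment above. The class and method names are made up for illustration, and it uses the literal String.replace rather than a regex. Neither result matches the original `}",encoded_title:`.

```java
// Hypothetical demo class; names are illustrative only.
final class ReplaceAttempts {

    // Attempt 1: turn every backslash-quote pair into a plain quote.
    static String unescapeQuotes(String s) {
        return s.replace("\\\"", "\"");
    }

    // Attempt 2: drop every backslash-quote pair entirely.
    static String stripEscapedQuotes(String s) {
        return s.replace("\\\"", "");
    }

    public static void main(String[] args) {
        // Java literal for the escaped fragment:  }\",\"encoded_title\":
        String escaped = "}\\\",\\\"encoded_title\\\":";

        System.out.println(unescapeQuotes(escaped));     // quotes appear around the key as well
        System.out.println(stripEscapedQuotes(escaped)); // the quote after } is lost as well
    }
}
```

Both attempts treat every \" the same way, while the original keeps some quotes and drops others, so a single blanket replacement cannot recover it.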

2 Answers

You can use JavaScript to get the HTML via outerHTML or innerHTML (see "How do I get the HTML source from the page?"):

((JavascriptExecutor) driver).executeScript("return document.documentElement.outerHTML;")
((JavascriptExecutor) driver).executeScript("return document.all[0].outerHTML")
((JavascriptExecutor) driver).executeScript("return new XMLSerializer().serializeToString(document);")
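
Since executeScript returns an Object, the result still has to be turned into a String before it can be searched. The sketch below is one way to wrap that step. PageHtml and its scriptRunner parameter are hypothetical names, not part of Selenium, and the browser is replaced by a stand-in lambda so the example runs without a driver.

```java
import java.util.function.Function;

// Hypothetical helper; not part of the Selenium API.
final class PageHtml {

    static final String OUTER_HTML_SCRIPT = "return document.documentElement.outerHTML;";

    // scriptRunner is assumed to run a JavaScript snippet and hand back its result.
    static String outerHtml(Function<String, Object> scriptRunner) {
        Object result = scriptRunner.apply(OUTER_HTML_SCRIPT);
        // A script that returns nothing yields null; treat that as an empty page.
        return result == null ? "" : result.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a real browser session.
        Function<String, Object> fakeBrowser = script -> "<html><body>hi</body></html>";
        System.out.println(outerHtml(fakeBrowser));
    }
}
```

With a real driver, the runner would presumably be something like `script -> ((JavascriptExecutor) driver).executeScript(script)`.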
Sers

You can use the replaceAll method of Java's String class to replace the unwanted characters with the ones you want.

OLD solution -

 String source = driver.getPageSource().replaceAll("\\\\\"", "\"").replaceAll("\\\\", "");

New approximate solution (approximate because the page source can contain arbitrary HTML):

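public class CheckString {

    // Java literal for the Selenium output shown in the question.
    static String str = "\\\\\\\"query_title\\\\\\\":null}\\\",\\\"encoded_title\\\":\\\"WyJoZW5rIl0\\\",\\\"ref\\\":\\\"br_tf\\\",\\\"logger_source\\\":\\\"www_main\\\",\\\"typeahead_sid\\\":\\\"0.6583900225217523\\\",\\\"tl_log\\\":false,\\\"impression_id\\\":\\\"e00060b4\\\",\\\"filter_ids\\\"";

    public static void main(String[] args) {

        // replaceAll takes a regex, so one literal backslash is written as "\\\\".
        System.out.println(str.replaceAll("\\\\\",", "\",")        // backslash-quote-comma -> quote-comma
                              .replaceAll(":\\\\\"", ":\"")        // colon-backslash-quote -> colon-quote
                              .replaceAll("\\\\\"", "")            // remaining backslash-quote pairs are removed
                              .replaceAll("\\\\\\\\", "\\\\\""));  // double backslash -> backslash-quote

    }

}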

Output -

\"query_title\":null}",encoded_title:"WyJoZW5rIl0",ref:"br_tf",logger_source:"www_main",typeahead_sid:"0.6583900225217523",tl_log:false,impression_id:"e00060b4",filter_ids

Note - In the earlier approach I forgot to escape the \ character, which replaceAll treats as an escape character in its regex.
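
As a possible simplification, the same four substitutions can be sketched with String.replace, which matches text literally and so avoids the extra layer of regex escaping. LiteralUnescape is a hypothetical name, the input is a shortened excerpt of the string from the question, and this remains tailored to that one string rather than to page source in general.

```java
// Hypothetical variant of the approach above using literal replacement.
final class LiteralUnescape {

    static String clean(String s) {
        return s.replace("\\\",", "\",")   // backslash-quote-comma -> quote-comma
                .replace(":\\\"", ":\"")   // colon-backslash-quote -> colon-quote
                .replace("\\\"", "")       // remaining backslash-quote pairs are removed
                .replace("\\\\", "\\\"");  // double backslash -> backslash-quote
    }

    public static void main(String[] args) {
        // Shortened excerpt of the Selenium output, written as a Java literal.
        String excerpt = "\\\\\\\"query_title\\\\\\\":null}\\\",\\\"encoded_title\\\":"
                + "\\\"WyJoZW5rIl0\\\",\\\"tl_log\\\":false";
        System.out.println(clean(excerpt));
    }
}
```

The order of the calls matters: the specific patterns have to run before the blanket removal of backslash-quote pairs, otherwise the quotes that should survive are stripped as well.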

Amit Jain
  • Thanks for your contribution, I will try this option now! –  Aug 19 '18 at 20:39
  • I edited the question, to explain why your answer won't work –  Aug 19 '18 at 20:59
  • Solution is not recommended for full page source but this works for the string you have given. Thanks. – Amit Jain Aug 19 '18 at 22:07
  • Sadly, that will not suffice. I need loads of different parts of the source code, so I require a fix to rectify the entire source code. –  Aug 19 '18 at 22:11
  • I don't think there will be a fix for the page source, because in client-side HTML, reserved characters used in the page are replaced in the way they appear in the getPageSource output, e.g. < and > are replaced with &lt; and &gt;. Please share whatever solution you find for this. Thanks – Amit Jain Aug 19 '18 at 22:19