
I'm crawling a secure website that blocks me whenever I restart my crawler application (I have to change my IP address as a workaround). I solved this by using the default user profile with ChromeDriver, like this (I'm using C# right now, but I can switch to Java if needed):

ChromeOptions options = new ChromeOptions();
options.AddArguments($"user-data-dir=C:/Users/{Environment.UserName}/AppData/Local/Google/Chrome/User Data/Default");

It saves all sessions and cookies and restores them when my application restarts. Everything works as expected.

Now I need to switch my WebDriver to PhantomJS.

My question: how can I make this scenario work with PhantomJS? Log in to an account (such as Gmail or Facebook), close my application and the driver, and find myself still logged in the next time I run the application and driver. In other words, how can I reuse the same session in PhantomJS on every run?

Try 1 (in C#):

After some searching, I found that this can be done with the local storage and cookies file arguments in PhantomJS. The problem is that the local storage path is always empty and nothing is saved there (I navigate to multiple sites, but it stays empty), so I can't reuse the session from the previous run. My code to set the local storage path and cookies file is as simple as this:

PhantomJSDriverService service = PhantomJSDriverService.CreateDefaultService();
service.LocalStoragePath = Application.StartupPath + "\\default";
service.CookiesFile = Application.StartupPath + "\\default\\Cookies";
IWebDriver driver = new PhantomJSDriver(service);

What is wrong with my approach?

Try 2 (in C#):

Based on @SiKing's answer and the comment discussion, I changed to the code below (using AddArgument), but the directory is still empty:

string localStoragePath = Path.Combine(Path.GetTempPath(),"PhantomLocalStorage-");

if (!Directory.Exists(localStoragePath))
{
    Directory.CreateDirectory(localStoragePath);
}

PhantomJSDriverService service = PhantomJSDriverService.CreateDefaultService();
service.AddArgument("--local-storage-quota=5000");
service.AddArgument("--local-storage-path=" + localStoragePath);
IWebDriver driver = new PhantomJSDriver(service);

Try 3 (in java):

Directory is still empty:

DesiredCapabilities capabilities = DesiredCapabilities.phantomjs();
List<String> cliArgs = new ArrayList<String>();
Path local_storage_path = Paths.get(System.getProperty("java.io.tmpdir"), "PhantomLocalStorage-");
if (Files.notExists(local_storage_path)) {
    try {
        Files.createDirectory(local_storage_path);
    }
    catch (IOException e) {
        JOptionPane.showConfirmDialog(null, "Can Not Create Path");
    }
}
cliArgs.add("--local-storage-quota=5000");
cliArgs.add("--local-storage-path=" + local_storage_path.toString());
capabilities.setCapability(PhantomJSDriverService.PHANTOMJS_CLI_ARGS, cliArgs);
WebDriver driver = new PhantomJSDriver(capabilities);
Efe
  • It is unclear what exactly you mean by "this behavior". Using PhantomJS you can save and restore cookies, local storage, and WebSQL content: http://phantomjs.org/api/command-line.html – a-bobkov Aug 05 '17 at 19:31
  • It means saving all user data to a directory during browsing and restoring it from that directory on startup. An example: log in to a Gmail account with the PhantomJS web driver controlled by a simple .NET application, then close the application and web driver (without logging out of Gmail). The next time you start the application (and the web driver) and navigate to https://www.gmail.com, you should already be signed in. This is the default behavior of Chrome when it is launched with a profile, as in the code above. I tested --cookies-file as the link describes, but it did not work for me. – Efe Aug 05 '17 at 20:06

2 Answers

PhantomJS by default starts with no local-storage; see this discussion.

To enable local-storage via Selenium I have used the following Java code. Sorry, it has been too long since I have used C#, but I am confident the C# bindings have similar methods available.

DesiredCapabilities capabilities = DesiredCapabilities.phantomjs();
// Phantom options can only be set from CLI
List<String> cliArgs = new ArrayList<String>();
cliArgs.add("--local-storage-quota=5000");
Path local_storage_path = Files.createTempDirectory("PhantomLocalStorage-");
cliArgs.add("--local-storage-path=" + local_storage_path.toString());
capabilities.setCapability(PhantomJSDriverService.PHANTOMJS_CLI_ARGS, cliArgs);
WebDriver driver = new PhantomJSDriver(capabilities);

Note that local_storage_path will not be deleted after you are done with it. If you need that, you can set up a hook as per this post. But I suspect in C# this part is going to be wildly different from Java.
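As the comments below point out, Files.createTempDirectory produces a differently named directory on every run, so nothing can carry over between runs. A minimal sketch of building the same CLI arguments against one fixed directory follows. The class name PhantomProfileArgs, the directory name, and the cookies.txt file name are hypothetical, and --cookies-file is the option listed on the PhantomJS command-line page linked in the question comments. This only prepares the argument list; whether PhantomJS then persists the session for a given site is not something the sketch can guarantee.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds PhantomJS CLI arguments for one fixed profile directory.
final class PhantomProfileArgs {

    static List<String> build(Path profileDir) {
        try {
            // Creates the directory if needed; does nothing if it already exists.
            Files.createDirectories(profileDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> args = new ArrayList<>();
        args.add("--local-storage-quota=5000");
        args.add("--local-storage-path=" + profileDir);
        // Assumed file name; cookies are kept separately from local storage.
        args.add("--cookies-file=" + profileDir.resolve("cookies.txt"));
        return args;
    }

    public static void main(String[] unused) {
        // Same path on every run, unlike Files.createTempDirectory.
        Path profileDir = Paths.get(System.getProperty("java.io.tmpdir"), "PhantomLocalStorage");
        build(profileDir).forEach(System.out::println);
    }
}
```

The returned list could then be passed as the PHANTOMJS_CLI_ARGS capability in place of cliArgs in the code above.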

SiKing
  • I have a problem: since I have never used desired capabilities, how do I add cliArgs to the capabilities and pass them to the driver? – Efe Aug 10 '17 at 21:32
  • Here in C#, the capabilities object has a SetCapability method: SetCapability(string capability, object capabilityValue). Are these the name and value that you added to the cliArgs list? If so, how do I pass the capabilities to PhantomJSDriver? – Efe Aug 10 '17 at 21:56
  • @Efe I added two more lines to my code sample. Sorry about that. – SiKing Aug 10 '17 at 22:03
  • We are almost getting to the point. Edit helped a lot. Thanks. The only problem is that there is no PhantomJSDriverService.PHANTOMJS_CLI_ARGS and no PhantomJSDriver constructor accepting DesiredCapabilities object in C#. I'm trying to find it. – Efe Aug 10 '17 at 22:14
  • For the PHANTOMJS_CLI_ARGS constant, you can just use the string `"phantomjs.cli.args"`. – SiKing Aug 10 '17 at 22:21
  • For the Driver constructor, maybe you could use this https://seleniumhq.github.io/selenium/docs/api/dotnet/?topic=html/Overload_OpenQA_Selenium_PhantomJS_PhantomJSDriverService_AddArguments.htm and then pass that to the constructor? – SiKing Aug 10 '17 at 22:24
  • Based on your recent link, I edited the question. Please check Try 2 in question, which seems the same as Try 1. I think if we can match the driver with DesiredCapabilities, the problem will be solved. But I am curious. I will test your code in java to see if it works. Please continue your suggestions. They are useful. – Efe Aug 10 '17 at 23:08
  • I just ran your code in Java as Try 3 (I edited the question again). I still get an empty directory. Since your code creates a temp folder with a different name each time, I changed it to use the same folder every time. – Efe Aug 11 '17 at 00:31
  • @Efe So you have a way to create a directory, and a way to assign it to Phantom's local storage. Is there still a problem? – SiKing Aug 11 '17 at 15:51
  • Yes, the problem is that no user data is stored in the directory. I can't launch the driver with the same session. The directory is empty. It seems the driver does nothing with it. – Efe Aug 11 '17 at 16:36
  • @Efe So that would seem like local-storage is not utilized by this site? – SiKing Aug 11 '17 at 20:20
  • I don't think so. Any website with authorization uses client-side data, and as far as I know, local storage is for holding client-side data. I am confused, because you said your code worked for you. I tested your code with many sites, but no file appears in local storage. The site I'm crawling sets 3 important cookies. Let me clarify: once I log in to the site and shut down without logging out, the next time it says I have logged in from another computer. This shows that PhantomJS starts with a new session each time, whereas with ChromeDriver the default profile solves this, as stated in my question. – Efe Aug 11 '17 at 20:32
  • This is getting outside of my comfort zone, but I think cookies are not held in local-storage. They are held by the browser in some other area. – SiKing Aug 11 '17 at 21:27
  • We can set the cookies file using --cookies-file, but the saved data does not make any sense. As you say, I will try to find where cookies are saved for PhantomJS. – Efe Aug 11 '17 at 21:34
  • Since you worked a lot on this question, I am giving you the bounty reputation, but the problem is still not solved. I hope we can see bounty questions as good questions, not just reputation. Thanks for your efforts. I hope to hear more from you. – Efe Aug 12 '17 at 09:57

For the cookies part you could just save them from one session:

ReadOnlyCollection<Cookie> cookiesFromFirstSession = driver.Manage().Cookies.AllCookies;

And load them into another webdriver session:

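// Navigate to the site's domain first; a cookie can only be added for the current domain.
foreach (Cookie cookie in cookiesFromFirstSession) driver.Manage().Cookies.AddCookie(cookie);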

I would guess that it is the cookies, not necessarily any other local storage data, that you need in order to keep the same session.
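Because the application is closed between the two sessions, the cookies have to be written to disk in the first run and read back in the next one. Below is a minimal sketch of that step in Java (the question allows switching to Java), using only the standard library. The CookieFile and SavedCookie names and the tab-separated file format are hypothetical, and the sketch assumes cookie fields are non-null and contain no tabs or newlines.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: stores cookies as one tab-separated line per cookie.
final class CookieFile {

    // Hypothetical value type holding the cookie fields worth persisting.
    static final class SavedCookie {
        final String name, value, domain, path;

        SavedCookie(String name, String value, String domain, String path) {
            this.name = name;
            this.value = value;
            this.domain = domain;
            this.path = path;
        }
    }

    // Call before shutting the driver down.
    static void save(Path file, List<SavedCookie> cookies) {
        List<String> lines = new ArrayList<>();
        for (SavedCookie c : cookies) {
            lines.add(String.join("\t", c.name, c.value, c.domain, c.path));
        }
        try {
            Files.write(file, lines, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Call on the next start; returns an empty list on the very first run.
    static List<SavedCookie> load(Path file) {
        List<SavedCookie> cookies = new ArrayList<>();
        if (Files.notExists(file)) {
            return cookies;
        }
        try {
            for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                String[] f = line.split("\t", -1); // -1 keeps empty trailing fields
                if (f.length == 4) {
                    cookies.add(new SavedCookie(f[0], f[1], f[2], f[3]));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return cookies;
    }
}
```

With the Selenium Java bindings, the list to save would come from driver.manage().getCookies(), and each loaded entry would be passed to driver.manage().addCookie(...) after navigating to the site's domain. This may still not be enough if the site ties the session to something other than cookies.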

Agent Shoulder
  • I have done this, but I get a new session every time. I also took the cookies from ChromeDriver (which preserves the session) and set them in PhantomJS, but the result is the same. – Efe Aug 11 '17 at 08:40