I'm doing web scraping using Headless Chrome (Selenium Chrome Web driver) installed on Ubuntu on EC2. For small number of requests, it's working fine.. but when are large number of simultaneous requests (hundreds) are fired, they kept crashing down and i always had to restart the server.
Has anyone used this to support large load? I am using t2.medium ec2 server.
In the logs, I see Error Communicating with Browser:
2018-12-20 19:18:04.565 ERROR 1292 --- [io-8080-exec-79] o.s.boot.context.web.ErrorPageFilter : Forwarding to error page from request [/v1.0/search] due to exception [Error communicating with the remote browser. It may have died.
Build info: version: '3.11.0', revision: 'e59cfb3', time: '2018-03-11T20:26:55.152Z'
System info: host: 'ip-172-31-17-81', ip: '172.31.17.81', os.name: 'Linux', os.arch: 'amd64', os.version: '4.4.0-1072-aws', java.version: '1.8.0_191'
Driver info: driver.version: RemoteWebDriver
Capabilities {acceptInsecureCerts: false, acceptSslCerts: false, applicationCacheEnabled: false, browserConnectionEnabled: false, browserName: chrome, chrome: {chromedriverVersion: 2.37.544315 (730aa6a5fdba15..., userDataDir: /tmp/.org.ch
romium.Chromium...}, cssSelectorsEnabled: true, databaseEnabled: false, handlesAlerts: true, hasTouchScreen: false, javascriptEnabled: true, locationContextEnabled: true, mobileEmulationEnabled: false, nativeEvents: true, networkConnectio
nEnabled: false, pageLoadStrategy: normal, platform: LINUX, platformName: LINUX, rotatable: false, setWindowRect: true, takesHeapSnapshot: true, takesScreenshot: true, unexpectedAlertBehaviour: , unhandledPromptBehavior: , version: 69.0.3
497.100, webStorageEnabled: true}
Session ID: e47c3c443164cbd2a3586ee6321d26f8]
org.openqa.selenium.remote.UnreachableBrowserException: Error communicating with the remote browser. It may have died.
Build info: version: '3.11.0', revision: 'e59cfb3', time: '2018-03-11T20:26:55.152Z'
System info: host: 'ip-172-31-17-81', ip: '172.31.17.81', os.name: 'Linux', os.arch: 'amd64', os.version: '4.4.0-1072-aws', java.version: '1.8.0_191'
Driver info: driver.version: RemoteWebDriver
Capabilities {acceptInsecureCerts: false, acceptSslCerts: false, applicationCacheEnabled: false, browserConnectionEnabled: false, browserName: chrome, chrome: {chromedriverVersion: 2.37.544315 (730aa6a5fdba15..., userDataDir: /tmp/.org.ch
romium.Chromium...}, cssSelectorsEnabled: true, databaseEnabled: false, handlesAlerts: true, hasTouchScreen: false, javascriptEnabled: true, locationContextEnabled: true, mobileEmulationEnabled: false, nativeEvents: true, networkConnectio
nEnabled: false, pageLoadStrategy: normal, platform: LINUX, platformName: LINUX, rotatable: false, setWindowRect: true, takesHeapSnapshot: true, takesScreenshot: true, unexpectedAlertBehaviour: , unhandledPromptBehavior: , version: 69.0.3
497.100, webStorageEnabled: true}
Session ID: e47c3c443164cbd2a3586ee6321d26f8
at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:566)
at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:602)
at org.openqa.selenium.remote.RemoteWebDriver.quit(RemoteWebDriver.java:445)
at mt.service.KlookServiceImpl.searchTrips(KlookServiceImpl.java:138)
at mt.controller.TripSearchController.getTripResults(TripSearchController.java:120)
at sun.reflect.GeneratedMethodAccessor163.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:221)
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:137)
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:110)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandleMethod(RequestMappingHandlerAdapter.java:776)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:705)
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:959)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:893)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:967)
...
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:505)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:169)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:956)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:436)
at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1078)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:625)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.util.concurrent.TimeoutException
at org.openqa.selenium.net.UrlChecker.waitUntilUnavailable(UrlChecker.java:145)
at org.openqa.selenium.remote.service.DriverService.stop(DriverService.java:214)
at org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:95)
at org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:545)
... 58 common frames omitted
Caused by: java.util.concurrent.TimeoutException: null
at java.util.concurrent.FutureTask.get(FutureTask.java:205)
at com.google.common.util.concurrent.SimpleTimeLimiter.callWithTimeout(SimpleTimeLimiter.java:156)
at org.openqa.selenium.net.UrlChecker.waitUntilUnavailable(UrlChecker.java:115)
... 61 common frames omitted
There is also Java Heap Space issue:
2018-12-20 19:18:20.294 ERROR 1292 --- [io-8080-exec-84] o.s.boot.context.web.ErrorPageFilter : Forwarding to error page from request [/v1.0/search] due to exception [Java heap space]
java.lang.OutOfMemoryError: Java heap space
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "http-bio-8080-exec-96"
When this error occurred, I needed to restart the server. Then everything will be back normal for small number of runs, but the problem repeats when there's larger loads.