I'm building a crawling server and can't get past a 403 error that occurs only in the deployed environment; the same code works normally in my local environment.
I know you may be busy, but I'd appreciate it if you could take a look and give me feedback.
I've been stuck on this for days.
Environment
- GKE
- Java, Spring Boot 3.0.7
- Selenium
Error
[http-nio-8084-exec-1] [2023-08-05 23:40:28,741] [ERROR] [SolvedCrawling.java:136] - <html><head><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>ERROR: The request could not be satisfied</title>
</head><body>
<h1>403 ERROR</h1>
<h2>The request could not be satisfied.</h2>
<hr noshade="" size="1px">
Request blocked.
We can't connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.
<br clear="all">
If you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.
<br clear="all">
<hr noshade="" size="1px">
<pre>Generated by cloudfront (CloudFront)
Request ID: -ok_4ED_rixCpeCLsp6ytvEtEjyMwZSMUb2VxVa10USNMDizAGzXbg==
</pre>
<address>
</address>
</body></html>
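For reference, a minimal sketch of how this block page could be told apart from a genuinely missing profile, since the crawling code currently reports any timeout as "User NotFound.". The class name `BlockPageCheck` and the method `isCloudFrontBlockPage` are hypothetical; the two marker strings are copied from the error page above, and matching on them is an assumption about how stable that page's text is.

```java
public class BlockPageCheck {

    // Marker strings copied from the CloudFront error page in the log above.
    private static final String CLOUDFRONT_MARKER = "Generated by cloudfront";
    private static final String BLOCKED_MARKER = "Request blocked.";

    /** Returns true when the page source looks like the CloudFront 403 block page. */
    public static boolean isCloudFrontBlockPage(String pageSource) {
        if (pageSource == null) {
            return false;
        }
        return pageSource.contains(CLOUDFRONT_MARKER)
                && pageSource.contains(BLOCKED_MARKER);
    }

    public static void main(String[] args) {
        String blocked = "<h1>403 ERROR</h1>\nRequest blocked.\n"
                + "<pre>Generated by cloudfront (CloudFront)</pre>";
        String normal = "<html><body><div id=\"__next\"></div></body></html>";
        System.out.println(isCloudFrontBlockPage(blocked));
        System.out.println(isCloudFrontBlockPage(normal));
    }
}
```

A check like this could be called on `driver.getPageSource()` in the catch block, so that a blocked request is logged as such instead of as a missing user.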
Code
- RestTemplate
@Bean
public RestTemplate restTemplate() {
    RestTemplate restTemplate;
    try {
        restTemplate = new RestTemplate(clientHttpRequestFactory());
    } catch (Exception e) {
        restTemplate = new RestTemplate();
    }
    restTemplate.setInterceptors(Collections.singletonList(
        (request, body, execution) -> {
            HttpHeaders headers = request.getHeaders();
            headers.setContentType(APPLICATION_JSON);
            headers.setAccept(Collections.singletonList(APPLICATION_JSON));
            headers.add(HttpHeaders.ACCEPT, "application/json");
            headers.add(HttpHeaders.USER_AGENT, "Mozilla/5.0");
            return execution.execute(request, body);
        }
    ));
    return restTemplate;
}

private HttpComponentsClientHttpRequestFactory clientHttpRequestFactory() {
    return new HttpComponentsClientHttpRequestFactory();
}
- API Request
public String getSubject(int problemId) throws Exception {
    String jsonString = null;
    try {
        jsonString = restTemplate.getForObject("https://solved.ac/api/v3/problem/show?problemId=" + problemId, String.class);
    } catch (Exception e) {
        e.printStackTrace();
        throw new HttpResponseException("fail.");
    }
    JSONParser jsonParser = new JSONParser();
    Object jsonObject = null;
    try {
        jsonObject = jsonParser.parse(jsonString);
    } catch (ParseException e) {
        e.printStackTrace();
    }
    JSONObject jsonBody = (JSONObject) jsonObject;
    return jsonBody.get("titleKo").toString();
}
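To check whether the REST call is affected as well, or only the Selenium path, the same request could be reproduced outside Spring with the JDK's built-in `java.net.http` client and run once locally and once from a pod in GKE. This is only a sketch: the class name `SolvedApiProbe` is hypothetical, and the problem ID `1000` is an arbitrary example value.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class SolvedApiProbe {

    /** Builds the same GET request as getSubject(), with an explicit User-Agent. */
    public static HttpRequest buildRequest(int problemId, String userAgent) {
        return HttpRequest.newBuilder()
                .uri(URI.create("https://solved.ac/api/v3/problem/show?problemId=" + problemId))
                .timeout(Duration.ofSeconds(10))
                .header("Accept", "application/json")
                .header("User-Agent", userAgent)
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        // 1000 is an arbitrary example ID; "Mozilla/5.0" matches the interceptor above.
        HttpRequest request = buildRequest(1000, "Mozilla/5.0");
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

        // Print the status and response headers so both environments can be compared.
        System.out.println("status = " + response.statusCode());
        response.headers().map()
                .forEach((name, values) -> System.out.println(name + ": " + values));
    }
}
```

If this returns 200 locally but 403 from the cluster with identical headers, the difference would be in the network path rather than in the RestTemplate configuration.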
- Crawling
public BaekJoonDto profileCrawling(String baekjoonId) throws IOException, InterruptedException {
    WebDriver driver = setDriver();
    sleep(1000);
    driver.get(SOLVED_BASE_URL + SOLVED_PROFILE + baekjoonId);
    By solvedListBy = By.xpath("//*[@id=\"__next\"]/div[3]/div/div[6]/div[3]/div/table/tbody");
    sleep(1000);
    try {
        wait(driver, solvedListBy);
    } catch (TimeoutException | NoSuchElementException e) {
        log.error("{}", driver.getCurrentUrl());
        log.error("{}", driver.getPageSource());
        throw new CrawlingException("User NotFound.");
    }
    WebElement elements = driver.findElement(solvedListBy);
    WebElement webElement = elements.findElement(By.className("css-1ojb0xa"));
    int bronze = getUserSolvedCount(webElement, By.xpath("//*[@id=\"__next\"]/div[3]/div/div[6]/div[3]/div/table/tbody/tr[1]/td[2]/b"));
    driver.quit();
    return new BaekJoonDto(bronze);
}
- driver setting
private WebDriver setDriver() throws IOException, InterruptedException {
    String os = System.getProperty("os.name").toLowerCase();
    if (os.contains("win")) {
        System.setProperty("webdriver.chrome.driver", "drivers/chromedriver_win.exe");
    } else if (os.contains("mac")) {
        Process process = Runtime.getRuntime().exec("xattr -d com.apple.quarantine drivers/chromedriver_mac");
        process.waitFor();
        System.setProperty("webdriver.chrome.driver", "drivers/chromedriver_mac");
    } else if (os.contains("linux")) {
        System.setProperty("webdriver.chrome.driver", "/usr/bin/chromedriver-linux64/chromedriver");
    }
    ChromeOptions chromeOptions = new ChromeOptions();
    chromeOptions.addArguments("--disk-cache-size=0");
    chromeOptions.addArguments("--media-cache-size=0");
    chromeOptions.addArguments("--headless=new");
    chromeOptions.setHeadless(true);
    chromeOptions.addArguments("--no-sandbox");
    chromeOptions.addArguments("--disable-dev-shm-usage");
    chromeOptions.addArguments("--disable-gpu");
    chromeOptions.addArguments("--remote-allow-origins=*");
    chromeOptions.addArguments("--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537");
    // Binary setting for local runs
    // chromeOptions.setBinary("/Applications/Google Chrome.app/Contents/MacOS/Google Chrome");
    // Binary setting for the deployed environment
    chromeOptions.setBinary("/usr/bin/google-chrome");
    return new ChromeDriver(chromeOptions);
}
- user-agent setting