I'm trying to scrape some data from a Google search, following this guide. Running the following in a plain Node.js script works:
const axios = require("axios");

const getGymOperatingHours = async (search) =>
  axios
    .get(`https://www.google.com/search?q=${search}&gl=sg&hl=en`, {
      headers: {
        "User-Agent":
          "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.64 Safari/537.36 Edg/101.0.1210.47",
      },
    })
    .then(({ data }) => {
      // process data
    });
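One thing I did verify separately: I'm interpolating the raw search term into the URL, and a term with spaces only works if it's percent-encoded first, which matches the Ark%20Bloc I see in the blocked request. Quick sanity check:

```javascript
// A term with a space must be percent-encoded before going into the query string;
// the browser did this automatically in the blocked request ("Ark Bloc" -> "Ark%20Bloc").
const term = "Ark Bloc";
const encoded = encodeURIComponent(term);
console.log(encoded); // Ark%20Bloc
console.log(`https://www.google.com/search?q=${encoded}&gl=sg&hl=en`);
```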
However, doing the exact same thing in my React app yields errors (viewed in Google Chrome):
xhr.js:164 Refused to set unsafe header "User-Agent"
Both with and without setting that header, I also get this error:
Access to XMLHttpRequest at 'https://www.google.com/search?q=Ark%20Bloc&gl=sg&hl=en' from origin 'http://localho.st:3000' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource.
However, certain URLs work fine, like this one:
export const getGymOperatingHours = async (name: string) => {
  return axios
    .get(
      `https://static.wixstatic.com/media/04ac92_ab3411c5de584541aa237c4cf2a82093~mv2.jpg/v1/fill/w_2880,h_1196,al_c,q_90,usm_0.66_1.00_0.01,enc_auto/04ac92_ab3411c5de584541aa237c4cf2a82093~mv2.jpg`
    )
    .then(({ data }) => {
      // process data
    });
};
I tried using both http://localhost:3000 and http://localho.st:3000 as per this thread, but frankly I don't understand any of this well enough to know what I'm doing.
Is this CORS thing something on Google's side (and on most other sites I tried) that blocks requests from origins like localhost:3000? Is setting User-Agent in the header supposed to be a workaround for this, in which case I should figure out how to set that header?