Questions tagged [apify]

Apify is a service for running Docker images in the cloud. It is primarily used for web scraping and crawling with headless Chrome and Puppeteer, but it can handle a wide variety of tasks. Apify also maintains the Apify SDK, an open-source library for web scraping and crawling in JavaScript.

There are several ways to use Apify:

  • Visit the Apify Store to find existing software that suits your needs.
  • Join the Apify App and build your own serverless programs.
  • Use the open-source Apify SDK to create web scraping and automation scripts to run locally or on Apify.
200 questions
4
votes
1 answer

In Apify, how do I log to the console from within a nested function?

From the Apify example docs, I can see that if you use console.log() within handlePageFunction, it is logged directly to the terminal console. The same is true for Apify.utils.log. For example: handlePageFunction: async ({ request, page }) => { …
Kit Johnson
  • 541
  • 2
  • 7
  • 15
3
votes
2 answers

How to use Apify to log in to a site and click a button?

I need to use Apify and Zapier to automate (i) logging in to a password-protected web page and (ii) clicking a button. How can I do this? I think I should be using Puppeteer in an actor, but I'm not certain how. The target URLs will change from time to time.…
Robert Andrews
  • 1,209
  • 4
  • 23
  • 47
3
votes
1 answer

Crawl urls from sitemap.xml using Apify Puppeteer and requestQueue

Apify can crawl links from sitemap.xml: const Apify = require('apify'); Apify.main(async () => { const requestList = new Apify.RequestList({ sources: [{ requestsFromUrl: 'https://edition.cnn.com/sitemaps/cnn/news.xml' }], }); …
Ben W
  • 2,469
  • 1
  • 24
  • 24
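The excerpt above feeds the sitemap to RequestList through requestsFromUrl. If the goal is to put the sitemap URLs into a requestQueue instead, one option is to download the sitemap yourself and extract its <loc> entries. A minimal, dependency-free sketch, assuming a standard sitemap where each page URL sits in a <loc> element; extractSitemapUrls and decodeXmlEntities are hypothetical helper names, and the regex is a shortcut rather than a full XML parser:

```javascript
// Hypothetical helper: pull every <loc>…</loc> URL out of sitemap XML text.
// Assumes a standard sitemap; a regex is a shortcut, not a full XML parser.
function extractSitemapUrls(xml) {
  const urls = [];
  const locPattern = /<loc>\s*([^<]+?)\s*<\/loc>/g;
  let match;
  while ((match = locPattern.exec(xml)) !== null) {
    urls.push(decodeXmlEntities(match[1]));
  }
  return urls;
}

// Sitemaps must escape "&" and similar characters; undo the common XML entities.
// "&amp;" is decoded last so that e.g. "&amp;lt;" is not double-decoded.
function decodeXmlEntities(text) {
  return text
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, '&');
}

const sample = `<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/a</loc></url>
  <url><loc> https://example.com/b?x=1&amp;y=2 </loc></url>
</urlset>`;

// Prints the two page URLs, with "&amp;" decoded to "&".
console.log(extractSitemapUrls(sample));
```

Each returned URL could then be added with requestQueue.addRequest({ url }) before starting the PuppeteerCrawler.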
2
votes
1 answer

Error 403 when trying to scrape a page's title using the Apify web-scraper actor

I am trying to use Apify to get a website's title, but when I run the code I get error 403. Does anyone know a fix? My code: currentLink = "https://medium.com/vice/scientists-monitored-631-people-as-they-died-this-is-what-they-found-2de48ad9ed96"; const…
GomezStriker
  • 185
  • 10
2
votes
1 answer

Sessions and concurrency and how they are related

I'm building a PuppeteerCrawler and I have to log in to a certain website, but the website doesn't allow multiple browsers to use the same account at the same time. From my understanding, the session is persisted to a single IP, but how can…
Emma Alecrim
  • 313
  • 1
  • 11
2
votes
2 answers

How can I make a search term in Apify a variable using Google Apps Script?

I am trying to change the search query in the Apify Google Search Scraper using Google Apps Script by making the search term a variable. https://apify.com/apify/google-search-scraper I am trying to see if I can reference it by its code.…
NoUsername9
  • 77
  • 12
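One way to make the search term a variable is to build the actor input as an object in the script and serialize it, instead of pasting a fixed JSON string. A sketch, assuming the Google Search Scraper reads its search terms from an input field named queries (one query per line) and that the actor is started through the Apify API v2 run-actor endpoint; buildSearchRunRequest is a hypothetical name and MY_APIFY_TOKEN a placeholder. The returned options object uses the parameter names of Apps Script's UrlFetchApp.fetch(url, params):

```javascript
// Sketch: build the API request that starts the Google Search Scraper with the
// search term supplied as a variable. Assumption: the actor's input field for
// search terms is called `queries` (one query per line); check its input schema.
function buildSearchRunRequest(searchTerm, apifyToken) {
  const input = {
    queries: searchTerm, // the variable part of the input
  };
  return {
    url:
      'https://api.apify.com/v2/acts/apify~google-search-scraper/runs?token=' +
      encodeURIComponent(apifyToken),
    // Same shape as the params object of UrlFetchApp.fetch(url, params).
    options: {
      method: 'post',
      contentType: 'application/json',
      payload: JSON.stringify(input),
    },
  };
}

// In Apps Script (not runnable outside it) you would then call:
//   const response = UrlFetchApp.fetch(req.url, req.options);
const req = buildSearchRunRequest('web scraping', 'MY_APIFY_TOKEN');
console.log(req.url);
console.log(req.options.payload);
```

The search term can come from a cell value or a function argument; only the input object changes between runs, so the rest of the request stays fixed.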
2
votes
0 answers

scrapeAndClick function in Apify

I have the following problem in Apify. I would like to write a function that saves the HTML body of the current page, then clicks through to the next page, saves its HTML body, and so on. I tried this: var result = []; var scrapeAndClick = function() { …
jan novotný
  • 118
  • 1
  • 8
2
votes
1 answer

Is there a way to use Apify.main() without it exiting the node.js process on completion?

I'm using the Apify SDK in my app and have written a number of scrapers using the Apify.main() function. The final action of main() is to exit the Node process, but this does not suit my purposes. Is there any way to override this behavior?
Rusty
  • 609
  • 1
  • 4
  • 19
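As the question notes, Apify.main() ends the process when it finishes. A sketch of one alternative, assuming the scraper body does not itself depend on Apify.main(): call your own async function and handle its result or error, so the host app keeps running. runScraper and runWithoutExit are hypothetical names:

```javascript
// Sketch: run scraper logic without Apify.main(), so finishing does not end the
// Node process. `runScraper` is a hypothetical stand-in for the function you
// previously passed to Apify.main().
async function runScraper(startUrl) {
  // ... create the request queue, crawler, etc. and await crawler.run() here ...
  return { startUrl, finishedAt: new Date().toISOString() };
}

// Awaits the task and reports success or failure instead of exiting the process.
async function runWithoutExit(task, ...args) {
  try {
    return { ok: true, result: await task(...args) };
  } catch (error) {
    console.error('Scraper failed:', error);
    return { ok: false, error };
  }
}

runWithoutExit(runScraper, 'https://example.com').then((outcome) => {
  console.log('Scraper done, process still running. Success:', outcome.ok);
});
```

The trade-off is that any setup or error handling Apify.main() would have done for you becomes your own responsibility inside runScraper.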
2
votes
1 answer

How to implement Apify webhooks?

I need help implementing an Apify webhook. A task takes some time to complete, and I want to add an Apify webhook that will run another task, but I'm not sure how to do that. $.ajax({ url :…
Ben Jonson
  • 565
  • 7
  • 22
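Apify's API v2 lets you attach ad-hoc webhooks when starting a run, through a webhooks query parameter holding a Base64-encoded JSON array of webhook definitions. A sketch of building such a URL for the run-task endpoint; the ACTOR.RUN.SUCCEEDED event type makes Apify send a POST request to requestUrl once the first task's run succeeds. Here requestUrl is a hypothetical endpoint of your own that would start the second task, and buildRunTaskUrlWithWebhook, MY_TASK_ID and MY_APIFY_TOKEN are placeholders:

```javascript
// Sketch: build an Apify API v2 "run task" URL with an ad-hoc webhook attached.
// The `webhooks` query parameter is a Base64-encoded JSON array of definitions.
// Hypothetical/placeholders: the function name, task ID, token and target URL.
function buildRunTaskUrlWithWebhook(taskId, token, webhookTargetUrl) {
  const webhooks = [
    {
      // Fire only when the first task's run finishes successfully.
      eventTypes: ['ACTOR.RUN.SUCCEEDED'],
      // Apify sends a POST request here; your endpoint would start the next task.
      requestUrl: webhookTargetUrl,
    },
  ];
  // Buffer is Node-only; in a browser, btoa(JSON.stringify(webhooks)) works for ASCII.
  const encoded = Buffer.from(JSON.stringify(webhooks)).toString('base64');
  // URLSearchParams percent-encodes the Base64 characters "+", "/" and "=".
  const query = new URLSearchParams({ token, webhooks: encoded });
  return `https://api.apify.com/v2/actor-tasks/${encodeURIComponent(taskId)}/runs?${query}`;
}

console.log(
  buildRunTaskUrlWithWebhook('MY_TASK_ID', 'MY_APIFY_TOKEN', 'https://example.com/start-second-task'),
);
```

The resulting URL could then serve as the url of a POST request in the question's $.ajax call.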
2
votes
3 answers

Best way to push one more scrape after all are done

I have the following scenario: my scrapes are behind a login, so there is one login page that I always need to hit first; then I have a list of 30 URLs that can be scraped asynchronously for all I care; then at the very end, when all those 30 URLs have…
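The ordering described here (one login, then the 30 URLs in any order, then one last request) can be expressed with plain promises. A generic sketch, not an Apify-specific API: login, scrape and finalStep are hypothetical placeholders for your own page logic, and maxConcurrency caps how many URLs are in flight at once:

```javascript
// Sketch: login first, then scrape all URLs with a concurrency cap, then run
// one final step. `login`, `scrape` and `finalStep` are hypothetical callbacks.
async function runInOrder({ login, scrape, finalStep, urls, maxConcurrency = 5 }) {
  await login();

  const results = new Array(urls.length);
  let next = 0;
  // Each worker keeps taking the next unscraped URL until the list is exhausted.
  // Reading and incrementing `next` happens synchronously, so no URL is taken twice.
  async function worker() {
    while (next < urls.length) {
      const index = next++;
      results[index] = await scrape(urls[index]);
    }
  }
  const workerCount = Math.min(maxConcurrency, urls.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));

  // Runs exactly once, after every URL scrape above has resolved.
  return finalStep(results);
}

const demoUrls = Array.from({ length: 30 }, (_, i) => `https://example.com/page/${i + 1}`);
runInOrder({
  login: async () => console.log('logged in'),
  scrape: async (url) => url.length, // placeholder "scrape"
  finalStep: async (results) => console.log(`final step after ${results.length} pages`),
  urls: demoUrls,
});
```

Because finalStep is only called after Promise.all resolves, it cannot start until every URL has been scraped. With an Apify crawler, the analogous place would typically be after await crawler.run() resolves.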
2
votes
2 answers

Puppeteer $eval selecting nested elements

Let's say I'm given a situation like this page
2
votes
1 answer

Apify: Preserve headers in RequestQueue

I'm trying to crawl our local Confluence installation with the PuppeteerCrawler. My strategy is to log in first, then extract the session cookies and use them in the header of the start URL. The code is as follows: First, I log in manually to…
Thurse
  • 253
  • 1
  • 3
  • 16
2
votes
2 answers

Sending an HTTP POST request to a site with an array in the body

I'm trying to make a POST request and send some values in the body of an API call. The API documentation says I need to make a POST request, using startUrls as an array with key and value.
Jack Ellis
  • 29
  • 1
  • 1
  • 6
1
vote
1 answer

Cannot import ApifyWrapper from langchain.utilities to do web scraping

I cannot import ApifyWrapper from langchain.utilities. I need to import it to scrape a website for a school project. I got this error: ImportError Traceback (most recent call last) Cell In[1], line…
Darlyn LC
  • 11
  • 3
1
vote
0 answers

How to extract certain information from return type using apify API and Node.js

I am trying to get gas prices for a web app I am making. I found a web scraper from Apify that gets the data I need. The problem is I'm not sure how to extract just the gas prices from the object returned by the API. There is a lot of…
Peter
  • 21
  • 3
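Apify datasets are usually returned by the API as an array of item objects, so extracting just the prices is typically a filter and map over that array. A sketch, assuming each item carries the price in a field called price and the station name in station (both hypothetical field names; logging one item, e.g. console.log(JSON.stringify(items[0], null, 2)), shows the real ones):

```javascript
// Sketch: keep only station name and price from each returned item.
// `station` and `price` are hypothetical field names; adjust to the real data.
function extractGasPrices(items) {
  return items
    .filter((item) => item && item.price != null) // skip items without a price
    .map((item) => ({
      station: item.station,
      // Number() handles "3.49" or 3.49; a value like "$3.49" would give NaN.
      price: Number(item.price),
    }));
}

const items = [
  { station: 'Station A', address: '1 Main St', price: '3.49' },
  { station: 'Station B', address: '2 Oak Ave', price: 3.59 },
  { station: 'Station C', address: '3 Elm Rd' },
];
console.log(extractGasPrices(items));
```

If the prices arrive as strings with a currency symbol, strip the symbol before converting them.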