Questions tagged [prefect]

Prefect is a Python-based workflow management system (ETLs are an example use-case). Users organize Tasks into Flows, define dependencies, schedules, etc., and Prefect takes care of the rest.

Prefect

Prefect is an open-source workflow and orchestration framework, written in Python 3, that bills itself as an up-and-coming alternative to Airflow. Its design philosophy emphasizes the benefits of negative engineering: that is, features designed to manage the failure and recoverability of workflows as a natural extension of normal development. Its creators also tout the benefits of its hybrid execution model, whereby orchestration occurs with zero knowledge of either the code being run or the data being manipulated. It also boasts features such as first-class workflow scheduling, dynamic task generation, and horizontal workflow scalability via out-of-the-box integration with Dask Distributed.

Prefect consists of three components:

  • Prefect Core: the central features of development in the Prefect ecosystem, by which Tasks are composed into directed acyclic graphs (DAGs) called Flows.
  • Prefect Server: the GraphQL application and UI that, taken together, allow users to manage flow submission and execution in an easy-to-use and interactive manner.
  • Prefect Cloud: the optional commercial offering of the Prefect maintainers, which organizations can use to leverage managed infrastructure in addition to the benefits of Prefect Server.
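To make the idea of Tasks composed into a DAG more concrete, here is a minimal sketch in plain Python. It deliberately does not use Prefect's API: the `flow` dictionary, the `run_flow` helper, and the three task functions are hypothetical names invented for illustration, and the standard-library `graphlib` module (Python 3.9+) stands in for the dependency resolution an orchestrator would perform.

```python
from graphlib import TopologicalSorter


# Three hypothetical "tasks" of a toy ETL pipeline.
def extract():
    return [1, 2, 3]


def transform(data):
    return [x * 10 for x in data]


def load(data):
    return sum(data)


# Hypothetical "flow": task name -> (function, names of upstream tasks).
# This is an illustrative structure, not how Prefect stores a Flow.
flow = {
    "extract": (extract, []),
    "transform": (transform, ["extract"]),
    "load": (load, ["transform"]),
}


def run_flow(flow):
    """Run every task after its upstream tasks, passing their results along."""
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    graph = {name: set(upstream) for name, (_, upstream) in flow.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        func, upstream = flow[name]
        # Feed each upstream result to the task as a positional argument.
        results[name] = func(*(results[u] for u in upstream))
    return results


results = run_flow(flow)
```

A real orchestrator layers scheduling, retries, and state tracking on top of this ordering step; the sketch only shows why the graph must be acyclic for an execution order to exist.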

179 questions
14
votes
3 answers

Best practice to run Prefect flow serverless in Google Cloud

I have started using Prefect for various projects and now I need to decide which deployment strategy on GCP would work best. Preferably I would like to work serverless. Comparing Cloud Run, Cloud Functions and App Engine, I am inclined to go for the…
dkapitan
  • 859
  • 2
  • 10
  • 21
13
votes
1 answer

Prefect: relationship between agent and executor?

According to Prefect's Hybrid Execution model, Agents "watch for any scheduled flow runs and execute them accordingly on your infrastructure," while Executors "are responsible for actually running tasks [...] users can submit functions and wait for…
Michael Wheeler
  • 849
  • 1
  • 10
  • 29
13
votes
1 answer

`docker run` as Prefect task

My actual workloads that should be run as tasks within a Prefect flow are all packaged as docker images. So a flow is basically just "run this container, then run that container". However, I'm unable to find any examples of how I can easily start a…
theDmi
  • 17,546
  • 6
  • 71
  • 138
11
votes
2 answers

Triggering a Prefect workflow externally

I currently have a Prefect workflow running locally on an EC2 instance. I can trigger my workflow on localhost:8080 through the UI. Is there a way to trigger a Prefect workflow externally (say AWS Lambda) via REST API or some other way? I know that…
LifeAndHope
  • 674
  • 2
  • 10
  • 27
10
votes
1 answer

Prefect ModuleNotFoundError when running from UI

I'm following the Prefect tutorial available at: https://docs.prefect.io/core/tutorial/01-etl-before-prefect.html. The code can be downloaded from the git: https://github.com/PrefectHQ/prefect/tree/master/examples/tutorial The tutorials have a…
Carl Rynegardh
  • 538
  • 1
  • 5
  • 22
8
votes
1 answer

Prefect how to avoid rerunning a task

In Prefect, suppose I have some pipeline which runs f(date) for every date in a list, and saves it to a file. This is a pretty common ETL operation. In airflow, if I run this once, it will backfill for all historical dates. If I run it again, it…
Nezo
  • 567
  • 4
  • 18
8
votes
1 answer

How does Prefect scale with thousands of workflows concurrently?

I have a prefect server running locally (0.13 core version). I called flow.run() in a loop 1000 thousand times in a server machine with 64 GB of RAM with 32 cores of CPU. When it got up to ~300 runs, it started throwing connection refused errors…
LifeAndHope
  • 674
  • 2
  • 10
  • 27
8
votes
1 answer

unable to register a prefect flow using varying parameters

I'm trying to implement a prefect flow using varying parameters: from prefect import Flow, Parameter from prefect.schedules import Schedule from prefect.schedules.clocks import CronClock a = Parameter('a', default=None, required=False) b =…
Panagiotis Simakis
  • 1,245
  • 1
  • 18
  • 45
7
votes
1 answer

How to execute a prefect Flow on a docker image?

My goal: I have a built docker image and want to run all my Flows on that image. Currently: I have the following task which is running on a Local Dask Executor. The server on which the agent is running is a different python environment from the one…
Newskooler
  • 3,973
  • 7
  • 46
  • 84
7
votes
1 answer

Is there a way to backfill historical data (once) for a new Flow in Prefect?

I just started reading about Prefect (and have a little experience using Airflow). My goal is to set a task which runs daily in Prefect and collects data to a folder (I guess that's what Prefect can help me do for sure). Also my goal is to populate…
Newskooler
  • 3,973
  • 7
  • 46
  • 84
7
votes
5 answers

Airflow Dagrun for each datum instead of scheduled

The current problem that I am facing is that I have documents in a MongoDB collection which each need to be processed and updated by tasks which need to run in an acyclic dependency graph. If a task upstream fails to process a document, then none of…
Sebastian Mendez
  • 2,859
  • 14
  • 25
6
votes
2 answers

Looping tasks in Prefect

I want to loop over tasks, again and again, until reaching a certain condition before continuing the rest of the workflow. What I have so far is this: # Loop task class MyLoop(Task): def run(self): loop_res =…
Gaëtan
  • 779
  • 1
  • 8
  • 26
6
votes
1 answer

Automatically register new prefect flows?

Is there a mechanism to automatically register flows/new flows if a local agent is running, without having to manually run e.g. flow.register(...) on each one? In airflow, I believe they have a process that regularly scans for any files with dag in…
evariste galois
  • 135
  • 1
  • 5
6
votes
1 answer

How to resume a Prefect flow on failure without having to re-run the entire flow?

TL;DR; I wasn't able to use prefect's FlowRunner to solve the above question. I likely either used it wrong (see below) or missed something. Would really appreciate any pointers! The Problem I read through the fantastic prefect core documentation…
rdmolony
  • 601
  • 1
  • 7
  • 15
5
votes
1 answer

Prefect how to wait for external dependency

I have a prefect flow that I want to run if and when a specific file appears. With something like Luigi you would create an ExternalTask that outputs that file and then impose a dependence on it. What is the standard pattern for this in Prefect?
Nezo
  • 567
  • 4
  • 18