Questions tagged [netflix-metaflow]

MetaFlow

Build and manage real-life data science projects with ease.

29 questions
16
votes
4 answers

ModuleNotFoundError: No module named 'pandas.core.indexes.numeric' using Metaflow

I used Metaflow to load a DataFrame. It was successfully unpickled from the artifact store, but when I try to view its index using df.index, I get an error that says ModuleNotFoundError: No module named 'pandas.core.indexes.numeric'. Why? I've…
crypdick
5
votes
1 answer

How to create nested branches in Metaflow?

I am using Metaflow to create a text processing pipeline as follows: [ASCII diagram of nested branches through nodes labelled B, D, F, and G] …
Vineet
3
votes
1 answer

Metaflow from Netflix vs Apache Airflow

I have a question regarding the differences between Apache Airflow and Metaflow (https://docs.metaflow.org/). As far as I understand, Apache Airflow is just a job scheduler that runs tasks. Metaflow from Netflix is a dataflow library, which creates…
Daniel Yefimov
3
votes
0 answers

Is there a recommended way to mock AWS Batch for integration tests in Metaflow?

I would like to implement integration tests for Metaflow flows, i.e. running a flow from start to finish within a Docker container. Ideally, this wouldn't require substantially rewriting the flows, which contain @batch decorators on…
kd88
2
votes
0 answers

Fine-grained environment control in Python

It's possible to control/specify the Python environment via a specific environment.yml and then use conda to create/activate it. However, for some projects, I might want to have finer-grained control of the environments in which Python code is…
SultanOrazbayev
2
votes
1 answer

Telling Metaflow to install package with pip using conda decorator

Running on AWS, I would usually define a step: @batch(cpu=1, memory=5000) @conda(libraries={'pandas': '1'}) @step def hello(self): do stuff... However, I am working with deep learning libraries (MXNet/TensorFlow/PyTorch), and they are not…
Gerges
2
votes
2 answers

Stop Metaflow from parallelising foreach steps

I recently started using Metaflow for my hyperparameter searches. I'm using a foreach for all my parameters as follows: from metaflow import FlowSpec, step @step def start_hpo(self): self.next(self.train_model,…
BBQuercus
2
votes
2 answers

How to use Metaflow to get metadata from S3?

The official Metaflow tutorials show that analysis can be done in a Jupyter notebook using the metadata recorded after running a script. I also know that Metaflow automatically writes metadata to S3. How, then, can I get that metadata from S3 in a Jupyter notebook? The…
zqin
1
vote
1 answer

How to use the Python multiprocessing package in Metaflow?

I am trying to use the multiprocessing package in Metaflow, running a fastText model to predict some results. Here is my code: import pickle import os import boto3 import multiprocessing from functools import partial from multiprocessing…
Feng Chen
1
vote
0 answers

Deploying Metaflow on AWS results in a "403 Forbidden" error

I used the CloudFormation template provided by Metaflow to deploy it on AWS, and I ran metaflow configure aws to create a configuration file with the deployment outputs, as outlined in the documentation. The deployment was successful and the…
1
vote
1 answer

How to store and retrieve figures as artifacts in MetaFlow?

How do I save a plot as a data artifact in MetaFlow? Plotting libraries usually have you write out to a file on disk. How do I view the figure afterwards?
crypdick
1
vote
1 answer

How to set MetaFlow's --max-workers flag from within Python definition?

MetaFlow permits you to set the maximum number of concurrent tasks using the --max-workers CLI flag (ref: https://docs.metaflow.org/metaflow/scaling#safeguard-flags). However, I would like to avoid setting this every time. Is it possible to set the…
1
vote
1 answer

How to show tqdm progress in MetaFlow?

When running MetaFlow Flows, tqdm progress bars do not get displayed until the final iteration, which defeats the purpose of measuring progress. Is there a way to force MetaFlow to print out tqdm updates?
crypdick
1
vote
0 answers

How to import functions from a different file when deploying flows to AWS Step Functions?

I have the following folder structure: metaflow project/ flow_a.py flow_b.py helpers.py. Flow a and flow b are separate, independent flows, but some functions occur in both a and b. To avoid duplicating code, I made a helper function…
helpper
1
vote
0 answers

How to write pytests for MetaFlow DAGs?

What is the correct way to write unit tests for individual MetaFlow steps? And, how do you test full DAGs using fixtures in place of real datasets? How can I ensure that these tests' artifacts don't pollute the artifact store?
crypdick