I used Metaflow to load a DataFrame. It was successfully unpickled from the artifact store, but when I try to view its index using df.index, I get an error that says ModuleNotFoundError: No module named 'pandas.core.indexes.numeric'. Why?
I've…
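One likely explanation, which the question itself does not confirm: the DataFrame was pickled under a pandas 1.x release and is being read under pandas 2.x, where (as far as I know) the `pandas.core.indexes.numeric` module no longer exists. A minimal sketch for checking whether the current environment can still import that module; the helper name `module_available` is hypothetical and not part of pandas or Metaflow:

```python
import importlib.util


def module_available(dotted_name):
    """Return True if `dotted_name` can be located in the current environment."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (here: pandas itself) is missing.
        return False


# The module named in the traceback; False suggests a pandas version mismatch
# between the environment that wrote the artifact and the one reading it.
print(module_available("pandas.core.indexes.numeric"))
```

If this prints False, reading the artifact in an environment with the same pandas version that wrote it is the usual way out.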
I have a question regarding the differences between Apache Airflow and Metaflow (https://docs.metaflow.org/). As far as I understand, Apache Airflow is just a job scheduler that runs tasks. Metaflow from Netflix is a dataflow library, which creates…
I would like to implement integration tests featuring Metaflow flows; i.e. running a flow from start to finish within a Docker container; and ideally this wouldn't require substantial rewriting of the flows which contain @batch decorators on…
It's possible to control/specify the Python environment via a specific environment.yml and then use conda to create/activate it. However, for some projects, I might want to have finer-grained control of the environments in which Python code is…
Running on AWS I would usually define a step:
@batch(cpu=1, memory=5000)
@conda(libraries={'pandas': '1'})
@step
def hello(self):
    do stuff...
However, I am working with deep learning libraries (MXNet/TensorFlow/PyTorch), and they are not…
I recently started using Metaflow for my hyperparameter searches. I'm using a foreach for all my parameters as follows:
from metaflow import FlowSpec, step
@step
def start_hpo(self):
    self.next(self.train_model,…
The official Metaflow tutorials show that analysis can be done in a Jupyter notebook using the metadata recorded after running a script. I also know that Metaflow automatically writes metadata to S3. How, then, can I get the metadata from S3 in a Jupyter notebook? The…
I am trying to use the multiprocessing package in Metaflow to run a fastText model that predicts some results. Here is my code:
import pickle
import os
import boto3
import multiprocessing
from functools import partial
from multiprocessing…
I used the CloudFormation template provided by Metaflow to deploy it on AWS, and I ran metaflow configure aws to create a configuration file with the deployment outputs, as outlined in the documentation. The deployment was successful and the…
How do I save a plot as a data artifact in Metaflow? Plotting libraries usually have you write out to a file on disk. How do I view the figure afterwards?
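One possible approach, sketched under assumptions: render the figure into an in-memory buffer instead of a file on disk, and store the resulting bytes as the artifact. The helper name `figure_to_png_bytes` and the artifact name `plot_png` are hypothetical; the helper only assumes the figure object has a matplotlib-style `savefig(fileobj, format=...)` method.

```python
import io


def figure_to_png_bytes(fig):
    """Render a figure to PNG bytes in memory, without touching the disk."""
    buf = io.BytesIO()
    fig.savefig(buf, format="png")  # matplotlib accepts a file-like object here
    return buf.getvalue()


# Inside a step (hypothetical artifact name):
#     self.plot_png = figure_to_png_bytes(fig)
#
# Later, in a notebook, the bytes can be displayed with IPython:
#     from IPython.display import Image
#     Image(data=plot_png_bytes)
```

Storing plain bytes keeps the artifact independent of the plotting library's own object layout, so viewing it later does not require the same matplotlib version.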
Metaflow permits you to set the maximum number of concurrent tasks using the --max-workers CLI flag (ref: https://docs.metaflow.org/metaflow/scaling#safeguard-flags). However, I would like to avoid setting this every time.
Is it possible to set the…
When running Metaflow flows, tqdm progress bars are not displayed until the final iteration, which defeats the purpose of measuring progress. Is there a way to force Metaflow to print tqdm updates?
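A plausible cause, stated as an assumption rather than confirmed behavior: tqdm redraws its bar with carriage returns and no newline, so a log stream that is forwarded line by line shows nothing until the bar finishes. The sketch below is not a tqdm setting; it swaps in a plain standard-library progress logger (the name `log_progress` is hypothetical) that writes one complete, flushed line per update.

```python
import sys
import time


def log_progress(iterable, total, every=10, stream=sys.stdout):
    """Yield items from `iterable`, writing one full line every `every` items."""
    start = time.monotonic()
    for count, item in enumerate(iterable, start=1):
        yield item
        if count % every == 0 or count == total:
            elapsed = time.monotonic() - start
            # A newline-terminated, flushed line survives line-based log capture.
            stream.write(f"progress: {count}/{total} ({elapsed:.1f}s elapsed)\n")
            stream.flush()


# Usage inside a step:
#     for row in log_progress(rows, total=len(rows), every=100):
#         process(row)
```

The trade-off is one log line per update instead of a single redrawn bar, so `every` should be chosen large enough to keep the log readable.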
I have the following folder structure:
metaflow project/
    flow_a.py
    flow_b.py
    helpers.py
Flow A and flow B are separate, independent flows, but some functions occur in both A and B.
To avoid duplicating code, I made a helper function…
What is the correct way to write unit tests for individual Metaflow steps? And how do you test full DAGs using fixtures in place of real datasets? How can I ensure that these tests' artifacts don't pollute the artifact store?
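For the last part of the question only, one option is to run the flow under test in a subprocess whose environment points Metaflow at a throwaway local datastore. This is a sketch under assumptions: the helper name `isolated_metaflow_env` is hypothetical, and the environment variable names below are the ones I believe Metaflow reads for its local datastore and metadata settings; please verify them against your installed version before relying on this.

```python
import os


def isolated_metaflow_env(datastore_root, base_env=None):
    """Return a copy of the environment that targets a temporary local datastore."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumed Metaflow settings; check the names against your Metaflow version.
    env["METAFLOW_DEFAULT_DATASTORE"] = "local"
    env["METAFLOW_DEFAULT_METADATA"] = "local"
    env["METAFLOW_DATASTORE_SYSROOT_LOCAL"] = datastore_root
    return env


# Usage in a test (flow_under_test.py is a placeholder file name):
#     import subprocess, sys, tempfile
#     with tempfile.TemporaryDirectory() as tmp:
#         subprocess.run(
#             [sys.executable, "flow_under_test.py", "run"],
#             env=isolated_metaflow_env(tmp),
#             check=True,
#         )
```

Because the temporary directory is deleted when the test ends, any artifacts the run wrote there go with it.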