
The official Metaflow tutorials show that analysis can be done in a Jupyter notebook using metadata after running a script. I also know that Metaflow automatically writes metadata to S3. How, then, can I read that metadata from S3 in a Jupyter notebook? The only approach I can think of is boto3, but I suspect better tools exist.

zqin

2 Answers


By default, Metaflow stores metadata on your local file system. To leverage S3, you have to configure Metaflow to actually use AWS resources.

Here's a high-level overview of Metaflow, in case you are not yet familiar with it.

Learn Metaflow in 10 mins - A hands-on tutorial

Here are specific guidelines for connecting it to AWS.

Metaflow on AWS
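
As a rough sketch (the exact prompts depend on your AWS deployment, and the values you enter — bucket name, service URL — are your own), the configuration step looks like:

```shell
# Interactively write ~/.metaflowconfig/config.json with your AWS settings
# (S3 datastore bucket and, optionally, a Metadata service URL).
metaflow configure aws

# Show which datastore / metadata provider the client will now use.
metaflow status
```

Once this config file is in place, both new runs and the client you use from a notebook will read and write through S3 instead of the local file system.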

Bv Kay

You should be able to use the Python client provided by Metaflow to access the data.

For example:

from metaflow import Step
print(Step('DebugFlow/2/a').task.data.x)

Where DebugFlow is the flow name, 2 is the run number, a is the step name, and x is the variable name of the artifact/metadata you are trying to load.

This is documented here: https://docs.metaflow.org/metaflow/client#accessing-data
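
If your runs recorded their metadata remotely, you can also point the client at that metadata provider explicitly before querying. A sketch, assuming a deployed Metadata service (the URL is a placeholder; `metadata` and `get_metadata` are part of Metaflow's client API):

```python
from metaflow import metadata, get_metadata, Step

# Switch the client from the default local provider to a deployed
# Metadata service (placeholder URL) backed by the S3 datastore.
metadata("https://<your-metadata-service-url>")
print(get_metadata())

# The client now resolves flows, runs, and artifacts through the service
# rather than from local files.
print(Step("DebugFlow/2/a").task.data.x)
```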

ferras
  • 21
  • 2
  • Thank you, but I am afraid this is not the answer to my question. I want to access metadata on S3, not on my local machine. – zqin Jan 16 '20 at 03:13
  • That would pull the metadata from S3, assuming you have the datastore configured to be S3. This can be set using the CLI option `--datastore s3`. See this section of the documentation: https://docs.metaflow.org/metaflow-on-aws/metaflow-on-aws#amazon-web-services – ferras Jan 16 '20 at 17:34
  • I have already used "metaflow configure aws" to generate a config.json file. After reviewing the documentation, I suspect it did not work because I did not deploy a Metaflow Metadata service. Is that necessary for using get_metadata() locally to diagnose remote flows? If so, could you please point me to a comprehensive tutorial? – zqin Jan 18 '20 at 05:03
  • I think you may be conflating two topics: the storage of metadata and the storage of artifacts. But if you want to use Metaflow with S3 and run flows remotely, you can go through this tutorial: https://docs.metaflow.org/getting-started/tutorials/season-2-scaling-out-and-up It will give you a better idea of the different configurations and offerings. Can you explain what you are trying to do at a high level? Then maybe I can suggest a configuration. – ferras Jan 20 '20 at 18:53
  • I did not configure the Metadata service on AWS. That might be why I can only load locally stored artifacts. I will look into the details. Thank you for your help! – zqin Feb 03 '20 at 20:23