How to convert a TabularDataset or a pandas DataFrame to a FileDataset using the Azure ML Python SDK v2?

Question

I've written a web-scraping script that extracts a web table into a pandas DataFrame, which I then convert into a TabularDataset with Dataset.Tabular.register_pandas_dataframe() so that it is stored in the default datastore.

I want to pass this scraped table as a side input to ParallelRunStep() in a batch-inference pipeline, but to do that the side input has to be a FileDataset.

So far, the approach that has worked for getting a FileDataset is to create one directly from the file's path on the datastore:

side_input = Dataset.File.from_files(path="/path/to/file/on/datastore")

This works for CSV files that I upload to the datastore myself. The problem with registering a pandas DataFrame is that every run of the web-scraping script registers the DataFrame again as a TabularDataset, and each registration stores the data under a different relative path in the datastore.

Hardcoding the relative path works, but because the web table's data changes periodically, I want the latest scraped data to be used as the side input.
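One possible way around the changing path, offered here as a hedged sketch rather than the asker's method: skip register_pandas_dataframe(), write the scraped DataFrame to a CSV file with a fixed name, and upload it to the same datastore folder on every run with overwrite=True, so a hardcoded Dataset.File.from_files() path always points at the latest data. Note that Dataset.File, Datastore.upload_files and ParallelRunStep belong to the SDK v1 packages (azureml-core, azureml-pipeline-steps), not SDK v2. The folder names, file name and helper names below are hypothetical; the azureml import is done lazily so the local helper can be used on its own.

```python
import os

import pandas as pd

# Hypothetical fixed locations; any stable names work.
LOCAL_DIR = "scraped"
TARGET_PATH = "webscrape/latest"


def save_snapshot(df: pd.DataFrame, local_dir: str = LOCAL_DIR,
                  filename: str = "web_table.csv") -> str:
    """Write the scraped table to a CSV with a fixed name and return its path."""
    os.makedirs(local_dir, exist_ok=True)
    path = os.path.join(local_dir, filename)
    df.to_csv(path, index=False)  # overwrites the previous run's file
    return path


def publish_snapshot(ws, csv_path: str, target_path: str = TARGET_PATH):
    """Upload the CSV to a fixed datastore folder and wrap it as a FileDataset.

    Assumes azureml-core (SDK v1) and a blob default datastore.
    """
    from azureml.core import Dataset  # lazy import: only needed here

    datastore = ws.get_default_datastore()
    abs_path = os.path.abspath(csv_path)
    # overwrite=True replaces last run's file, so the datastore path never changes
    datastore.upload_files(
        files=[abs_path],
        relative_root=os.path.dirname(abs_path),
        target_path=target_path,
        overwrite=True,
    )
    return Dataset.File.from_files(path=(datastore, target_path))
```

In the scraping script this would be used as `publish_snapshot(ws, save_snapshot(df))`, after `ws = Workspace.from_config()`. The trade-off is that older snapshots are overwritten rather than versioned.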

My questions:

How do I convert a pandas DataFrame into a FileDataset, e.g. with something like to_FileDataset, if such a method even exists?
How do I convert a TabularDataset into a FileDataset?
Is there a way to find the relative path of the registered TabularDataset in the datastore using the Azure ML Python SDK v2?

Side note: printing the TabularDataset shows JSON metadata roughly like this:

>>> tabular_ds
{
  "source": [("default_datastore", "relative/path/to/the/tabular/dataset")],
   .
   .
   .
}

I was wondering whether I can extract the source key from this, the same way I can read ws.name after initializing ws = Workspace.from_config(). Just thinking out loud.
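A hedged sketch of that idea: parse the printed metadata and pull out the (datastore, relative path) pairs. It assumes that `str(tabular_ds)` in azureml-core (SDK v1) is a JSON document whose "source" list holds strings such as "('workspaceblobstore', 'some/path')"; that format is not a documented API and may differ between SDK versions. The helper name `dataset_sources` is hypothetical.

```python
import ast
import json


def dataset_sources(dataset_repr: str):
    """Return (datastore_name, relative_path) pairs from a dataset's printed metadata.

    Assumption: the repr is JSON and each "source" entry is a Python tuple literal
    string; entries that are not tuple literals (e.g. plain URLs) get datastore None.
    """
    meta = json.loads(dataset_repr)
    pairs = []
    for entry in meta.get("source", []):
        if isinstance(entry, str):
            try:
                value = ast.literal_eval(entry)  # safe parse of "('store', 'path')"
            except (ValueError, SyntaxError):
                value = (None, entry)  # not a tuple literal, keep the raw string
        else:
            value = entry
        pairs.append(tuple(value))
    return pairs
```

If the assumption holds, `store, rel_path = dataset_sources(str(tabular_ds))[0]` would give the path to feed into `Dataset.File.from_files(path=(datastore, rel_path))`.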

score 0 · Accepted Answer · answered Jul 26 '23 at 12:24


One possible solution is the to_csv_files() method of TabularDataset, which returns a FileDataset containing the tabular data written out as CSV files.


The returned object supports all the usual FileDataset methods.


RishabhM

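A minimal sketch of how the to_csv_files() route could feed ParallelRunStep, assuming the SDK v1 API in which `TabularDataset.to_csv_files()` returns a FileDataset and `as_named_input(name).as_mount(path_on_compute=...)` returns a DatasetConsumptionConfig. The helper name and the input name "web_table" are hypothetical. Because the helper only calls those methods, it can be exercised with stand-in objects.

```python
def tabular_to_side_input(tabular_ds, input_name="web_table", mount_path=None):
    """Turn a (v1) TabularDataset into a mounted FileDataset for ParallelRunStep.

    Assumes azureml-core semantics: to_csv_files() -> FileDataset,
    as_named_input(...) -> DatasetConsumptionConfig, .as_mount(...) -> mounted config.
    """
    # Materialize the tabular data as CSV files, yielding a FileDataset.
    file_ds = tabular_ds.to_csv_files()
    # Name the input and mount it; mount_path=None lets Azure ML pick the location.
    return file_ds.as_named_input(input_name).as_mount(path_on_compute=mount_path)
```

The returned config would then go into both `side_inputs=[cfg]` and `arguments=["--web_table", cfg]` of ParallelRunStep, so the entry script can read the mount path from its arguments. Fetching the dataset by its registered name, e.g. `Dataset.get_by_name(ws, name="web_table")` (which returns the latest version by default), avoids hardcoding the changing relative path.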

One follow-up question: how do I convert the FileDataset back to a pandas DataFrame inside the batch-inference environment I described in my post? – MajorMajorMajorMajor Aug 07 '23 at 17:16
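A hedged sketch of the reverse direction inside a ParallelRunStep entry script: if the side input is mounted and its path is passed as a script argument, the CSV files in that folder can be read back with pandas in `init()`. The argument name `--web_table` and the `*.csv` pattern are assumptions; if the files written by to_csv_files() do not end in `.csv`, pass a different pattern such as `"*"`.

```python
import argparse
import glob
import os

import pandas as pd


def load_side_input_frame(mount_dir, pattern="*.csv"):
    """Read every file matching `pattern` under a mounted folder into one DataFrame."""
    files = sorted(glob.glob(os.path.join(mount_dir, "**", pattern), recursive=True))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern!r} under {mount_dir}")
    # Concatenate the parts in path order; they are assumed to share one header.
    return pd.concat((pd.read_csv(f) for f in files), ignore_index=True)


web_table = None  # filled in by init(), read by run()


def init():
    # ParallelRunStep calls init() once per worker before any run() calls.
    global web_table
    parser = argparse.ArgumentParser()
    # Hypothetical argument name; it must match the one given in `arguments=[...]`.
    parser.add_argument("--web_table", required=True)
    args, _ = parser.parse_known_args()
    web_table = load_side_input_frame(args.web_table)
```

`run(mini_batch)` can then use the module-level `web_table` DataFrame. Loading it once in `init()` avoids re-reading the side input for every mini-batch.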
