I have two datasets in Foundry, df1 and df2. df1 contains data and has a schema.

df2 is empty and has no schema applied.

Using the data proxy, I was able to extract the schema from df1:

{
  "foundrySchema": {
    "fieldSchemaList": [
      {...}
    ],
    "primaryKey": null,
    "dataFrameReaderClass": "n/a",
    "customMetadata": {}
  },
  "rows": []
}

How can I apply this schema to the empty df2 via a REST call?

The Foundry example below shows how to commit an empty transaction, but it does not show how to apply the schema:

curl -X POST \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{}' \
  "${CATALOG_URL}/api/catalog/datasets/${DATASET_RID}/transactions/${TRANSACTION_RID}/commit"
Asher
  • If you are talking about copying schemas between dataframes within Code Repositories, then this is more of a PySpark or Spark question. But then you say you don't want to build it, which makes the question ambiguous. Could you provide more detail on what you are trying to do? – fmsf May 04 '21 at 09:19
  • Hi @fmsf, thank you for your response. I am materializing CDC datasets. I have snapshot datasets for which no transaction datasets exist. Right now I create an empty transaction dataset and then manually apply the schema from the snapshot dataset to it so that the dataset can materialize. About 90 snapshot tables don't have a corresponding transaction table, so I am looking into automating this manual process, probably with a Python REST call. If you have any suggestions, I'd appreciate them. – Asher May 04 '21 at 09:36
  • @fmsf, I was able to get the schema for the snapshot dataset using the `foundry-data-proxy` REST API. How can I update the schema of the empty transaction dataset using a REST call? – Asher May 04 '21 at 09:47
  • I have amended the question to add more detail. – Asher May 04 '21 at 10:41
  • Thank you for clarifying. Meanwhile, it seems you got what you wanted from @nicornk (ty). – fmsf May 05 '21 at 09:18

1 Answer

Here is a Python function to upload a schema for a dataset with a committed transaction:

from urllib.parse import quote_plus
import requests


def upload_dataset_schema(dataset_rid: str, transaction_rid: str,
                          schema: dict, token: str, branch: str = 'master') -> None:
    """
    Uploads the Foundry dataset schema for a dataset, transaction, branch combination.

    Args:
        dataset_rid: The rid of the dataset
        transaction_rid: The rid of the (committed) transaction
        schema: The Foundry schema
        token: The bearer token used for authorization
        branch: The branch

    Raises:
        requests.HTTPError: If the server responds with an error status
    """
    base_url = "https://foundry-instance/foundry-metadata/api"
    response = requests.post(f"{base_url}/schemas/datasets/"
                             f"{dataset_rid}/branches/{quote_plus(branch)}",
                             params={'endTransactionRid': transaction_rid},
                             json=schema,
                             headers={
                                 'content-type': "application/json",
                                 'authorization': f"Bearer {token}",
                             }
                             )
    response.raise_for_status()
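
To automate this for many datasets (the comments mention about 90), a minimal sketch along these lines might help. It is not part of the original answer. It assumes that the request body expected by the schema endpoint is the `foundrySchema` object from the data-proxy response shown in the question, which is not confirmed here. `extract_foundry_schema` and `apply_schemas` are hypothetical helper names, and `upload` is any callable that takes the same positional arguments as `upload_dataset_schema` above.

```python
from typing import Callable, Iterable, List, Tuple


def extract_foundry_schema(proxy_response: dict) -> dict:
    """Return the 'foundrySchema' object from a data-proxy style response.

    Assumption: the response has the shape shown in the question, i.e.
    {"foundrySchema": {...}, "rows": [...]}.
    """
    try:
        return proxy_response["foundrySchema"]
    except KeyError as exc:
        raise ValueError("response has no 'foundrySchema' key") from exc


def apply_schemas(
    jobs: Iterable[Tuple[dict, str, str]],
    upload: Callable[[str, str, dict, str, str], None],
    token: str,
    branch: str = "master",
) -> List[Tuple[str, str]]:
    """Upload one schema per target dataset and collect failures.

    Each job is (proxy_response_of_source, target_dataset_rid,
    target_transaction_rid). Returns a list of (dataset_rid, error message)
    for the uploads that raised, so one bad dataset does not stop the batch.
    """
    failures = []
    for proxy_response, dataset_rid, transaction_rid in jobs:
        try:
            schema = extract_foundry_schema(proxy_response)
            upload(dataset_rid, transaction_rid, schema, token, branch)
        except Exception as exc:  # collect the error and continue with the next job
            failures.append((dataset_rid, str(exc)))
    return failures
```

It could then be called as `apply_schemas(jobs, upload_dataset_schema, token)`, where `jobs` holds, for each snapshot dataset, its data-proxy response together with the rid of the empty target dataset and the rid of its committed transaction.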
nicornk