I have looked at some posts and documentation on how to specify custom folder paths when creating an Azure blob (using Azure Data Factory).

Official documentation:

https://learn.microsoft.com/en-us/azure/data-factory/v1/data-factory-azure-blob-connector#using-partitionedBy-property

Forums posts:

https://dba.stackexchange.com/questions/180487/datafactory-tutorial-blob-does-not-exist

I can successfully write to date-indexed folders; what I cannot do is write to folders named for an incremented or decremented date.
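
For reference, the date-indexed layout that does work can be sketched roughly as follows. This is a minimal fragment that follows the partitionedBy pattern from the official documentation linked above; the MyRoot folder and MyTsv.tsv file name are the same ones used in the dataset further down, and the only difference is that "date" is the bare SliceEnd variable with no offset applied.

```json
{
    "typeProperties": {
        "fileName": "MyTsv.tsv",
        "folderPath": "MyRoot/{Year}/{Month}/{Day}/",
        "partitionedBy": [
            { "name": "Year",  "value": { "type": "DateTime", "date": "SliceEnd", "format": "yyyy" } },
            { "name": "Month", "value": { "type": "DateTime", "date": "SliceEnd", "format": "MM" } },
            { "name": "Day",   "value": { "type": "DateTime", "date": "SliceEnd", "format": "dd" } }
        ]
    }
}
```

For a daily slice ending on 2018-03-26, this should resolve to MyRoot/2018/03/26/, whereas the folder I am after is MyRoot/2018/03/24/.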

I tried using $$Text.Format (as below), but it gives a compile error: Text.Format is not a valid blob path.

    "folderPath": "$$Text.Format('MyRoot/{0:yyyy/MM/dd}/', Date.AddDays(SliceEnd,-2))",

I tried using the partitionedBy section (as below), but it also gives a compile error: Only SliceStart and SliceEnd are valid options for "date".

    {
        "name": "MyBlob",
        "properties": {
            "published": false,
            "type": "AzureBlob",
            "linkedServiceName": "MyLinkedService",
            "typeProperties": {
                "fileName": "MyTsv.tsv",
                "folderPath": "MyRoot/{Year}/{Month}/{Day}/",
                "format": {
                    "type": "TextFormat",
                    "rowDelimiter": "\n",
                    "columnDelimiter": "\t",
                    "nullValue": ""
                },
                "partitionedBy": [
                    {
                        "name": "Year",
                        "value": {
                            "type": "DateTime",
                            "date": "Date.AddDays(SliceEnd,-2)",
                            "format": "yyyy"
                        }
                    },
                    {
                        "name": "Month",
                        "value": {
                            "type": "DateTime",
                            "date": "Date.AddDays(SliceEnd,-2)",
                            "format": "MM"
                        }
                    },
                    {
                        "name": "Day",
                        "value": {
                            "type": "DateTime",
                            "date": "Date.AddDays(SliceEnd,-2)",
                            "format": "dd"
                        }
                    }
                ]
            },
            "availability": {
                "frequency": "Day",
                "interval": 1
            },
            "external": false,
            "policy": {}
        }
    }

Any pointers are appreciated!

EDIT for response from Adam:

Following Adam's suggestion, I also put the folder structure directly in fileName, as described in this forum post:

Windows Azure: How to create sub directory in a blob container

I used it as in the sample below.

    "typeProperties": {
        "fileName": "$$Text.Format('{0:yyyy/MM/dd}/MyBlob.tsv', Date.AddDays(SliceEnd,-2))",
        "folderPath": "MyRoot/",
        "format": {
            "type": "TextFormat",
            "rowDelimiter": "\n",
            "columnDelimiter": "\t",
            "nullValue": ""
        },

It gives no compile error and no error during deployment, but it throws an error during execution.

The runtime error is: Error in Activity: ScopeJobManager:PrepareScopeScript, Unsupported unstructured stream format '.adddays(sliceend,-2))', can't convert to unstructured stream.

I think the problem is that fileName can be used to create folders, but only with static folder names, not dynamic ones.

mike

1 Answer

You should create the blob using the convention "foldername/myfile.txt", so that you can also add further blobs under that folder name. I'd recommend checking this thread: Windows Azure: How to create sub directory in a blob container. It may help you resolve this case.

  • Hi Adam, thanks for your response! I looked at the link you sent, and it simply suggests using the folder structure directly in fileName. Great idea! But I ran into an error; I pasted the error and my snippet in the post above, as the comment section is too small. – mike Mar 26 '18 at 19:14
  • An additional suggestion would be to assign a variable that retains the date and time you are looking for, then transform the value to a string, then use the string as your blob name. It might be easier to just use string values and keep the slicing etc. outside of the naming string. – Adam Smith - Microsoft Azure Mar 26 '18 at 19:20
  • Can you tell me how to create and use variables - do you mean the 'defines'? "defines": { "Year" : "$$Text.Format('{0:yyyy}',WindowStart)", "Month" : "$$Text.Format('{0:MM}',WindowStart)", "Day" : "$$Text.Format('{0:dd}',WindowStart)", "Hour" : "$$Text.Format('{0:hh}',WindowStart)" } – mike Mar 26 '18 at 19:40
  • I tried using a parameter as per https://learn.microsoft.com/en-us/azure/data-factory/control-flow-expression-language-functions and it didn't help, as I think parameters are different: "parameters": { "folderPath1": { "type": "$$Text.Format('MyRoot/{0:yyyy/MM/dd}/', Date.AddDays(SliceEnd,-2))" } }, "typeProperties": { "fileName": "MyBlob.tsv", "folderPath": "@dataset().folderPath1", – mike Mar 26 '18 at 19:57
  • Never mind, JSON doesn't support my suggestion. The error you are getting has to do with the supported data formats in Azure Data Factory. I'd recommend checking: https://learn.microsoft.com/en-us/azure/data-factory/supported-file-formats-and-compression-codecs#json-format (I'm not super familiar with Data Factory, but seeing the blob storage tag made me give you a workaround for the blob name.) – Adam Smith - Microsoft Azure Mar 26 '18 at 19:59