
In Cosmos DB, I need to insert a large amount of data into a new container.

create_table_sql = f"""
        CREATE TABLE IF NOT EXISTS cosmosCatalog.`{cosmosDatabaseName}`.{cosmosContainerName} 
        USING cosmos.oltp
        OPTIONS(spark.cosmos.database = '{cosmosDatabaseName}')
        TBLPROPERTIES(partitionKeyPath = '/id', manualThroughput = '10000', indexingPolicy = 'AllProperties', defaultTtlInSeconds = '-1');
        """
spark.sql(create_table_sql)
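The CREATE TABLE statement only works if a Spark catalog named cosmosCatalog has already been registered on the session. Below is a minimal sketch of that registration. It assumes the `spark.sql.catalog.<name>` settings described in the Azure Cosmos DB Spark connector quickstart, and the helper names `cosmos_catalog_conf` and `register_cosmos_catalog` are hypothetical:

```python
from typing import Dict


def cosmos_catalog_conf(endpoint: str, key: str, catalog: str = "cosmosCatalog") -> Dict[str, str]:
    """Build the Spark conf entries that register a Cosmos DB catalog.

    The catalog name must match the one used in the SQL statements
    (cosmosCatalog in the question).
    """
    prefix = f"spark.sql.catalog.{catalog}"
    return {
        # Catalog implementation shipped with the Cosmos DB Spark connector
        prefix: "com.azure.cosmos.spark.CosmosCatalog",
        f"{prefix}.spark.cosmos.accountEndpoint": endpoint,
        f"{prefix}.spark.cosmos.accountKey": key,
    }


def register_cosmos_catalog(spark, endpoint: str, key: str, catalog: str = "cosmosCatalog") -> None:
    # `spark` is an existing SparkSession; these settings can be applied at runtime.
    for name, value in cosmos_catalog_conf(endpoint, key, catalog).items():
        spark.conf.set(name, value)
```

Call `register_cosmos_catalog(spark, endpoint, key)` before running `create_table_sql`, using the same endpoint and key as in `cfg`.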

# Read data with spark
data = (
    spark.read.format("csv")
    .options(header="True", inferSchema="True", delimiter=";")
    .load(spark_file_path)
    )

cfg = {
    "spark.cosmos.accountEndpoint": "https://XXXXXXXXXX.documents.azure.com:443/",
    "spark.cosmos.accountKey": "XXXXXXXXXXXXXXXXXXXXXX",
    "spark.cosmos.database": cosmosDatabaseName,
    "spark.cosmos.container": cosmosContainerName,
}

data.write.format("cosmos.oltp").options(**cfg).mode("APPEND").save()

After this insert, I would like to change the manual throughput of this container.

alter_table = f"""
        ALTER TABLE cosmosCatalog.`{cosmosDatabaseName}`.{cosmosContainerName} 
        SET TBLPROPERTIES( manualThroughput = '400');
        """
spark.sql(alter_table)

Py4JJavaError: An error occurred while calling o342.sql. : java.lang.UnsupportedOperationException

I can't find any documentation on how to change TBLPROPERTIES for a Cosmos DB table in Spark SQL. I know I can change it in the Azure portal or with the Azure CLI, but I would like to keep it in Spark SQL.

BeGreen
  • For people reading this in the future: it should be possible once https://github.com/Azure/azure-sdk-for-java/pull/33369 is built and published in new Cosmos DB JARs. – BeGreen Feb 10 '23 at 11:22

1 Answer

This is not supported by the Spark connector for the NoSQL API; you can track the issue here. Until then, you will need to do it through the Azure CLI, the portal, or an SDK (Java).

FYI: a Cosmos DB NoSQL API container is not the same as a table in SQL, so ALTER commands will not work.
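Since the connector cannot change throughput here, one SDK option (the one the asker settles on in the comments) is the azure-cosmos Python SDK. Below is a minimal sketch. It assumes azure-cosmos 4.x, where `ContainerProxy` exposes `get_throughput()` and `replace_throughput()`. The environment variable names and the helper `set_manual_throughput` are hypothetical. The check of at least 400 RU/s in steps of 100 reflects the documented manual throughput limits, and the service may still require a higher minimum depending on the container's storage and throughput history:

```python
import os


def set_manual_throughput(container, ru_per_second: int) -> None:
    """Replace the manual (standard) throughput of a Cosmos DB container.

    `container` is expected to be an azure.cosmos ContainerProxy.
    """
    # Assumed manual-throughput limits: at least 400 RU/s, in steps of 100 RU/s.
    # The service can still reject a value if the container needs a higher minimum.
    if ru_per_second < 400 or ru_per_second % 100 != 0:
        raise ValueError("manual throughput must be >= 400 RU/s and a multiple of 100")
    container.replace_throughput(ru_per_second)


def main() -> None:
    # Imported here so set_manual_throughput can be used without the SDK installed.
    from azure.cosmos import CosmosClient

    # Hypothetical environment variable names; use whatever holds your secrets.
    client = CosmosClient(os.environ["COSMOS_ENDPOINT"], credential=os.environ["COSMOS_KEY"])
    database = client.get_database_client(os.environ["COSMOS_DATABASE"])
    container = database.get_container_client(os.environ["COSMOS_CONTAINER"])

    print("before:", container.get_throughput().offer_throughput)
    set_manual_throughput(container, 400)
    print("after:", container.get_throughput().offer_throughput)
```

Calling `main()` after the Spark write finishes would lower the container from 10000 to 400 RU/s. Because azure-cosmos is a pip package, this avoids installing the Azure CLI in a Docker image.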

Sajeetharan
  • I guess I'll use the Python SDK to do this; I really don't want to install the Azure CLI on my Docker images. Here is an example of how to do it (lines 215 to 227): https://github.com/Azure/azure-sdk-for-python/blob/main/sdk/cosmos/azure-cosmos/samples/container_management.py#L215 – BeGreen Feb 07 '23 at 14:24
  • That's right, you would need to use that. – Sajeetharan Feb 07 '23 at 14:39
  • I've updated the issue you linked, and a PR was opened and merged into master: https://github.com/Azure/azure-sdk-for-java/pull/33369 Now we just need to wait for a new build of the JAR. – BeGreen Feb 10 '23 at 11:20