
I am trying to create and configure the Azure Databricks SCIM Provisioning Connector, so I can provision users in my Databricks workspace from AAD.

Following these instructions, I can get it to work manually. That is, creating and setting up the application in Azure Portal works, and my selected users synchronise to Databricks. (The process wasn't completely straightforward: it took a lot of fiddling with the provisioning setup, the details of which I don't remember, before it did anything.)

When I try to translate this into Terraform, I'm not getting very far:

  • I can create the application with Terraform, using the same Service Principal that created the Databricks Workspace resource:

    data "azuread_application_template" "scim" {
      display_name = "Azure Databricks SCIM Provisioning Connector"
    }
    
    resource "azuread_application" "scim" {
      display_name = "${var.name}-scim"
      template_id  = data.azuread_application_template.scim.template_id
    
      feature_tags {
        enterprise = true
        gallery    = true
      }
    }
    

    Similarly, I can create the Databricks access token for my Service Principal very easily:

    resource "databricks_token" "scim" {
      comment = "SCIM Integration"
    }
    
  • Now I'm stuck:

    1. How do I define the users and groups for the enterprise application in Terraform? I don't see any azuread resource that looks appropriate.
    2. Likewise, how do I configure the provisioning for the enterprise application in Terraform (i.e., with the SCIM endpoint URL and Databricks token, etc.)?

(Aside: I note that, in my Terraform-created application, if I proceed to manually set up the users and provisioning in Azure Portal, it doesn't seem to do anything. I may be being impatient: the "Provision on Demand" button does actually work, but the polled synchronisation is either not doing anything or being really slow.)

(Edit: An update on the aside: the polled provisioning, set up manually on a Terraform-managed SCIM app, has now run twice since I wrote this question. In that time, it has not synchronised the users I manually selected; instead, it has deleted the "Provision on Demand" user that I created earlier in Databricks...)

Xophmeister
  • What is your end goal: to provision users? If so, how fast should it be? – Alex Ott Jan 31 '22 at 17:17
  • My understanding is that the Databricks SCIM application will provision users/groups and run periodically to synchronise Databricks with what you've chosen from AAD. (Please correct me if I'm wrong.) Alternatively, it's straightforward to, e.g., read an AAD group in Terraform and then provision those members as Databricks users. However, without extra machinery that I'd have to build, that would be a single run and thus lose any automatic synchronisation from future AAD group membership changes. – Xophmeister Jan 31 '22 at 17:46
  • You can simply trigger execution of the same Terraform again (for example, from Azure DevOps or somewhere else), and it will provision new users and remove deleted ones. That's what we're doing. You just need to persist the state somewhere, for example, on ADLS – Alex Ott Jan 31 '22 at 17:57
  • Does this answer your question? [How to configure SCIM provisioning for Azure AD and Databricks via terraform?](https://stackoverflow.com/questions/73125274/how-to-configure-scim-provisioning-for-azure-ad-and-databricks-via-terraform) – Chris Snow Sep 16 '22 at 12:41

1 Answer


I'm trying to solve this puzzle myself.

On 1: As I understand it, you can assign users and groups to the app via app role assignments in MS Graph. See the first Terraform example in "App role assignment for accessing Microsoft Graph".
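A minimal sketch of what such an assignment could look like for the SCIM enterprise app from the question, assuming azuread provider 2.x (in 3.x, `application_id` is renamed `client_id`). Because the app is created from a gallery template, its service principal already exists, so it is adopted with `use_existing`. The group name and the `scim_app_role_id` variable are hypothetical; the all-zero ID is the "default access" role used for apps without custom roles, and the gallery app may define its own role you need instead.

```hcl
# Adopt the service principal that the gallery template already created
resource "azuread_service_principal" "scim" {
  application_id = azuread_application.scim.application_id
  use_existing   = true
}

# Hypothetical: the AAD group whose members should be provisioned
data "azuread_group" "databricks_users" {
  display_name = "AAD Group A"
}

# Hypothetical: app role to assign; all zeros = "default access" role
variable "scim_app_role_id" {
  type    = string
  default = "00000000-0000-0000-0000-000000000000"
}

# Equivalent of adding the group on the enterprise app's "Users and groups" blade
resource "azuread_app_role_assignment" "scim_group" {
  app_role_id         = var.scim_app_role_id
  principal_object_id = data.azuread_group.databricks_users.object_id
  resource_object_id  = azuread_service_principal.scim.object_id
}
```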

Then apply the configuration described in "Automate SCIM provisioning using Microsoft Graph", such as granting these permissions:

  • Application.ReadWrite.All
  • Application.ReadWrite.OwnedBy
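Granting those Microsoft Graph application permissions can also be done in Terraform, following the pattern of that first azuread example. This is a sketch assuming azuread provider 2.x; `automation_sp_object_id` is a hypothetical variable holding the object ID of the service principal that runs the automation, and whoever applies this needs rights to grant admin consent.

```hcl
# Well-known first-party application IDs, including Microsoft Graph
data "azuread_application_published_app_ids" "well_known" {}

# The Microsoft Graph service principal in this tenant
data "azuread_service_principal" "msgraph" {
  application_id = data.azuread_application_published_app_ids.well_known.result.MicrosoftGraph
}

# Hypothetical: object ID of the service principal running the automation
variable "automation_sp_object_id" {
  type = string
}

# Grant each Graph application permission listed above as an app role assignment
resource "azuread_app_role_assignment" "graph" {
  for_each            = toset(["Application.ReadWrite.All", "Application.ReadWrite.OwnedBy"])
  app_role_id         = data.azuread_service_principal.msgraph.app_role_ids[each.value]
  principal_object_id = var.automation_sp_object_id
  resource_object_id  = data.azuread_service_principal.msgraph.object_id
}
```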

On 2: It doesn't seem possible to pass the workspace SCIM endpoint and token programmatically into the created "Azure Databricks SCIM Provisioning Connector" application, as these appear to be gallery-app-specific configuration parameters. So, I'm afraid, manual intervention is needed for that option.
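If the provisioning settings have to be entered by hand, Terraform can at least produce the two values. A sketch assuming the workspace is managed in the same configuration as a hypothetical `azurerm_databricks_workspace.this`, and reusing the `databricks_token.scim` resource from the question; the `/api/2.0/preview/scim` path is the Tenant URL format Azure's Databricks SCIM connector setup uses.

```hcl
# Tenant URL to paste into the enterprise app's provisioning settings
# (assumes a hypothetical azurerm_databricks_workspace.this resource)
output "scim_tenant_url" {
  value = "https://${azurerm_databricks_workspace.this.workspace_url}/api/2.0/preview/scim"
}

# Secret token to paste alongside it; sensitive, so it is hidden in plan/apply output
output "scim_secret_token" {
  value     = databricks_token.scim.token_value
  sensitive = true
}
```

The token can then be read with `terraform output -raw scim_secret_token` when filling in the Admin Credentials section in the portal.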

According to Databricks, fully automating AAD SCIM provisioning is not possible. The Terraform-based approach, however, is fully automatable. For example:

// define which groups have access to a particular workspace
variable "groups" {
  default = {
    "AAD Group A" = {
      workspace_access      = true
      databricks_sql_access = false
    },
    "AAD Group B" = {
      workspace_access      = false
      databricks_sql_access = true
    }
  }
}

// read group members of given groups from AzureAD every time Terraform is started
data "azuread_group" "this" {
  for_each     = toset(keys(var.groups))
  display_name = each.value
}

// create or remove groups within databricks - all governed by "groups" variable
resource "databricks_group" "this" {
  for_each              = data.azuread_group.this
  display_name          = each.key
  external_id           = each.value.id
  workspace_access      = var.groups[each.key].workspace_access
  databricks_sql_access = var.groups[each.key].databricks_sql_access
}

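// read users from AzureAD on every Terraform run
// note: assumes the groups contain only users; a nested group or service
// principal among the members would make this user lookup fail
data "azuread_user" "this" {
  for_each  = toset(flatten([for g in data.azuread_group.this : g.members]))
  object_id = each.value
}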

// all governed by AzureAD, create or remove users from databricks workspace
resource "databricks_user" "this" {
  for_each     = data.azuread_user.this
  external_id  = each.value.id
  user_name    = each.value.user_principal_name
  display_name = each.value.display_name
  active       = each.value.account_enabled
}

// put users to respective groups
resource "databricks_group_member" "this" {
  for_each = toset(flatten([
    for group_name in keys(var.groups) : [
      for member_id in data.azuread_group.this[group_name].members :
      jsonencode({
        user  = member_id,
        group = group_name
      })
    ]
  ]))
  group_id  = databricks_group.this[jsondecode(each.value).group].id
  member_id = databricks_user.this[jsondecode(each.value).user].id
}
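For this to keep Databricks in sync with AAD, the configuration has to be re-applied periodically (for example, from a scheduled Azure DevOps pipeline), with the state persisted remotely so each run sees the previous one. A minimal backend sketch assuming an existing Azure Storage account; the resource group, account, container, and key names are hypothetical.

```hcl
terraform {
  # Keep state in Azure Blob Storage so every scheduled run shares it
  backend "azurerm" {
    resource_group_name  = "rg-terraform-state"      # hypothetical
    storage_account_name = "tfstatedatabricks"       # hypothetical
    container_name       = "tfstate"                 # hypothetical
    key                  = "databricks-scim.tfstate" # hypothetical
  }
}
```

Each scheduled run would execute `terraform apply -auto-approve`: users added to the AAD groups enter the `for_each` sets and get created, while users removed from them drop out and get deleted from the workspace.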
Crypto
  • Did you get this to work, @crypto? I'm interested in this myself – Wout May 16 '22 at 14:18
  • This answer contains a link to a script that syncs users, groups, and service principals: https://stackoverflow.com/questions/73125274/how-to-configure-scim-provisioning-for-azure-ad-and-databricks-via-terraform/73266244#73266244 – Alex Ott Sep 16 '22 at 12:30