Questions tagged [azure-databricks]

For questions about the usage of the Databricks Lakehouse Platform on Microsoft Azure.

Overview

Azure Databricks is the Azure-based implementation of Databricks, a high-level platform for working with Apache Spark that includes Jupyter-style notebooks.

Azure Databricks is a first-class Azure service that natively integrates with other Azure services such as Active Directory, Blob Storage, Cosmos DB, Data Lake Store, Event Hubs, HDInsight, Key Vault, and Synapse Analytics.


4095 questions
44 votes · 5 answers

How to get the schema definition from a dataframe in PySpark?

In PySpark you can define a schema and read data sources with this pre-defined schema, e.g.: Schema = StructType([StructField("temperature", DoubleType(), True), StructField("temperature_unit", StringType(), True), …
Hauke Mallow
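
A minimal sketch of how the schema can be inspected and reused, assuming a notebook where spark is the usual SparkSession:

    import json
    from pyspark.sql.types import StructType, StructField, DoubleType, StringType

    schema = StructType([
        StructField("temperature", DoubleType(), True),
        StructField("temperature_unit", StringType(), True),
    ])
    df = spark.createDataFrame([(21.5, "C")], schema=schema)

    df.printSchema()                # human-readable tree
    print(df.schema)                # the StructType object itself
    schema_json = df.schema.json()  # JSON string, easy to store

    # Rebuild the schema from its JSON form and reuse it for another read.
    restored = StructType.fromJson(json.loads(schema_json))
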
28 votes · 3 answers

How to list all the mount points in Azure Databricks?

I tried %fs ls dbfs:/mnt, but I want to know: does this give me all the mount points?
Shahid Ahmed
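
The short answer is dbutils.fs.mounts(); a minimal sketch, assuming a Databricks notebook where dbutils is predefined:

    # Each entry exposes the mount point and the backing storage URI.
    for mount in dbutils.fs.mounts():
        print(mount.mountPoint, "->", mount.source)
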
27 votes · 6 answers

How to delete all files from a folder with Databricks dbutils

Can someone let me know how to use the Databricks dbutils to delete all files from a folder? I have tried the following, but unfortunately Databricks doesn't support…
Carltonp
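
A sketch of the usual approach with dbutils.fs.rm; the folder path is hypothetical:

    # Remove the folder and everything under it in one call.
    dbutils.fs.rm("dbfs:/mnt/mydata/tmp", recurse=True)

    # To empty the folder but keep it, delete its children instead.
    for f in dbutils.fs.ls("dbfs:/mnt/mydata/tmp"):
        dbutils.fs.rm(f.path, recurse=True)
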
25 votes · 3 answers

Parquet vs Delta format in Azure Data Lake Gen 2 store

I am importing fact and dimension tables from SQL Server to Azure Data Lake Gen 2. Should I save the data as "Parquet" or "Delta" if I am going to wrangle the tables to create a dataset useful for running ML models on Azure Databricks? What is the…
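
For context, Delta is Parquet plus a transaction log, which is what enables ACID writes, MERGE, and time travel. A hedged sketch writing the same DataFrame (df, standing in for the question's tables) both ways; the paths are illustrative:

    base = "abfss://container@account.dfs.core.windows.net/curated"  # hypothetical
    df.write.mode("overwrite").parquet(base + "/sales_parquet")
    df.write.format("delta").mode("overwrite").save(base + "/sales_delta")

    # Time travel only works on the Delta copy.
    old = spark.read.format("delta").option("versionAsOf", 0).load(base + "/sales_delta")
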
25 votes · 5 answers

Databricks: How do I get path of current notebook?

Databricks is smart and all, but how do you identify the path of your current notebook? The guide on the website does not help. It suggests: %scala dbutils.notebook.getContext.notebookPath res1: Option[String] =…
Esben Eickhardt
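
In Python the same context is reachable through dbutils' entry point; a sketch that relies on this internal (and therefore change-prone) API:

    path = (dbutils.notebook.entry_point.getDbutils()
            .notebook().getContext().notebookPath().get())
    print(path)  # e.g. /Users/someone@example.com/my-notebook
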
24 votes · 7 answers

Azure Databricks - Can not create the managed table The associated location already exists

I have the following problem in Azure Databricks. Sometimes when I try to save a DataFrame as a managed table: SomeData_df.write.mode('overwrite').saveAsTable("SomeData") I get the following error: "Can not create the managed table('SomeData').…
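
A commonly suggested workaround is to drop the table and clear its leftover directory before overwriting; the warehouse path below is the default managed-table location and is an assumption:

    table = "SomeData"
    spark.sql(f"DROP TABLE IF EXISTS {table}")
    # Assumed default location for managed tables; adjust if your
    # metastore is configured differently.
    dbutils.fs.rm(f"dbfs:/user/hive/warehouse/{table.lower()}", recurse=True)
    SomeData_df.write.mode("overwrite").saveAsTable(table)
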
18 votes · 1 answer

Local instance of Databricks for development

I am currently working on a small team that is developing a Databricks based solution. For now we are small enough to work off of cloud instances of Databricks. As the group grows this will not really be practical. Is there a "local" install of…
John
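
Databricks itself does not ship as a local install, but a local Spark session with the open-source delta-spark package approximates the runtime for development; a sketch assuming pip install pyspark delta-spark:

    from delta import configure_spark_with_delta_pip
    from pyspark.sql import SparkSession

    builder = (SparkSession.builder.appName("local-dev")
               .config("spark.sql.extensions",
                       "io.delta.sql.DeltaSparkSessionExtension")
               .config("spark.sql.catalog.spark_catalog",
                       "org.apache.spark.sql.delta.catalog.DeltaCatalog"))
    spark = configure_spark_with_delta_pip(builder).getOrCreate()
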
16 votes · 2 answers

Stop Execution of Databricks notebook after specific cell

I tried sys.exit(0) (Python code) and dbutils.notebook.exit() in a Databricks notebook, but neither option worked. Please suggest another way to stop the execution of code after a specific cell in a Databricks notebook.
sizo_abe
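
Two approaches that usually come up, sketched here with a hypothetical condition; dbutils.notebook.exit ends the run cleanly, while raising an exception halts "Run All" at that cell:

    should_stop = True  # hypothetical flag computed earlier in the notebook
    if should_stop:
        dbutils.notebook.exit("stopped after validation cell")

    # Blunter alternative: any uncaught exception stops execution here.
    raise Exception("stopping notebook execution")
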
15 votes · 3 answers

Checking the version of Databricks Runtime in Azure

Is it possible to check the version of Databricks Runtime in Azure?
Krzysztof Słowiński
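
Two ways to read the runtime version from inside a notebook:

    import os
    # Environment variable set on Databricks cluster nodes.
    print(os.environ.get("DATABRICKS_RUNTIME_VERSION"))

    # Cluster tag carried in the Spark configuration.
    print(spark.conf.get("spark.databricks.clusterUsageTags.sparkVersion"))
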
14 votes · 3 answers

df to table throws TypeError: __init__() got multiple values for argument 'schema'

I have a pandas DataFrame, purchase_df. I want to convert it to a SQL table so I can run SQL queries on it from pandas. I tried purchase_df.to_sql('purchase_df', con=engine, if_exists='replace', index=False) and it throws TypeError:…
Arpan Ghimire
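
This particular TypeError is commonly reported when an older pandas runs against SQLAlchemy 2.x; a sketch under that assumption, with pinning SQLAlchemy below 2.0 (or upgrading pandas to 2.x) as the usual fix:

    # %pip install "sqlalchemy<2.0"   # typical remedy on a cluster
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("sqlite:///:memory:")  # illustrative connection
    purchase_df = pd.DataFrame({"item": ["a"], "price": [1.0]})
    purchase_df.to_sql("purchase_df", con=engine, if_exists="replace", index=False)
    print(pd.read_sql("SELECT * FROM purchase_df", con=engine))
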
14 votes · 1 answer

Printing secret value in Databricks

Even though secrets are for masking confidential information, I need to see the value of the secret to use it outside Databricks. When I simply print the secret it shows [REDACTED]. print(dbutils.secrets.get(scope="myScope",…
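
The redaction matches the exact secret string in cell output, so the widely shared workaround is printing the characters individually; the key name below is hypothetical, and note the value does end up visible in the notebook:

    secret = dbutils.secrets.get(scope="myScope", key="myKey")  # key name assumed
    print(" ".join(secret))  # e.g. "s 3 c r 3 t" instead of [REDACTED]
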
14 votes · 4 answers

List the files of a directory and its subdirectories recursively in Databricks (DBFS)

Using Python/dbutils, how do I display the files of the current directory and its subdirectories recursively in the Databricks File System (DBFS)?
Kiran A
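
dbutils.fs.ls is not recursive, so the usual answer is a small walker; a sketch with a hypothetical root path:

    def deep_ls(path):
        """Yield every file path under path, descending into folders."""
        for entry in dbutils.fs.ls(path):
            if entry.isDir():
                yield from deep_ls(entry.path)
            else:
                yield entry.path

    for p in deep_ls("dbfs:/mnt/mydata"):  # hypothetical root
        print(p)
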
14 votes · 2 answers

Writing logs with the Python logging module from Databricks to Azure Data Lake not working

I'm trying to write my own log files to Azure Data Lake Gen 2 from a Python notebook in Databricks, using the Python logging module. Unfortunately, I can't get it working. No errors are raised, the folders are created…
Dominik Braun
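
The logging module writes through local file handles, which do not work against abfss:// paths; a common workaround, with illustrative paths, is logging to the driver's local disk and copying the file to the lake afterwards:

    import logging

    local_path = "/tmp/run.log"
    logging.basicConfig(filename=local_path, level=logging.INFO)
    logging.info("job started")
    logging.shutdown()  # flush and close handlers before copying

    dbutils.fs.cp(f"file:{local_path}", "dbfs:/mnt/datalake/logs/run.log")
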
13 votes · 0 answers

PySpark and Protobuf Deserialization UDF Problem

I'm getting the error Can't pickle <class 'google.protobuf.pyext._message.CMessage'>: it's not found as google.protobuf.pyext._message.CMessage when I try to create a UDF in PySpark. Apparently, PySpark uses CloudPickle to serialize the command…
Marc Vitalis
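
The usual remedy is to keep protobuf objects out of the UDF's closure: ship only raw bytes to the workers and parse inside the function. A hedged sketch in which MyMessage, its module, and the payload column are all hypothetical:

    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    def parse_name(raw):
        # Import inside the function so CloudPickle never has to
        # serialize the generated protobuf classes themselves.
        from my_protos_pb2 import MyMessage  # hypothetical generated module
        msg = MyMessage()
        msg.ParseFromString(bytes(raw))
        return msg.name

    parse_name_udf = udf(parse_name, StringType())
    df = df.withColumn("name", parse_name_udf("payload"))  # binary column assumed
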
13 votes · 2 answers

How to properly access dbutils in Scala when using Databricks Connect

I'm using Databricks Connect to run code in my Azure Databricks cluster locally from IntelliJ IDEA (Scala). Everything works fine. I can connect, debug, inspect locally in the IDE. I created a Databricks Job to run my custom app JAR, but it fails…
empz
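
The question is Scala, but for comparison the Python counterpart under classic Databricks Connect has the same wrinkle: dbutils is not a builtin outside notebooks and must be constructed from the session. A sketch:

    from pyspark.sql import SparkSession
    from pyspark.dbutils import DBUtils

    spark = SparkSession.builder.getOrCreate()
    dbutils = DBUtils(spark)
    print(dbutils.fs.ls("dbfs:/"))
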