Questions tagged [databricks]

Databricks is a unified platform with tools for building, deploying, sharing, and maintaining enterprise-grade data and AI solutions at scale. The Databricks Lakehouse Platform integrates with cloud storage and security in your cloud account, and manages and deploys cloud infrastructure on your behalf. Databricks is available on AWS, Azure, and GCP. Use this tag for questions related to the Databricks Lakehouse Platform.

Use this tag for questions specific to the Databricks Lakehouse Platform, including, but not limited to, the Databricks File System (DBFS), REST APIs, Databricks Spark SQL extensions, and orchestration tools.

Don't use this tag for generic Apache Spark questions or for public Spark packages maintained by Databricks.


7135 questions
34 votes · 3 answers

Exploding nested Struct in Spark dataframe

I'm working through a Databricks example. The schema for the dataframe looks like:

    > parquetDF.printSchema
    root
     |-- department: struct (nullable = true)
     |    |-- id: string (nullable = true)
     |    |-- name: string (nullable = true)
     |-- employees:…
28 votes · 3 answers

How to list all the mount points in Azure Databricks?

I tried %fs ls dbfs:/mnt, but I want to know: does this give me all the mount points?
asked by Shahid Ahmed (281 rep)
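For reference, `%fs ls /mnt` only lists entries under `/mnt`, while `dbutils.fs.mounts()` returns every mount point, including defaults such as `/databricks-datasets`. A sketch (the formatting helper is hypothetical; `dbutils` is only predefined inside a Databricks runtime):

```python
def format_mounts(mounts):
    """Render dbutils.fs.mounts() entries as 'mountPoint -> source' lines."""
    return [f"{m.mountPoint} -> {m.source}" for m in mounts]

# In a Databricks notebook, where dbutils is predefined:
# for line in format_mounts(dbutils.fs.mounts()):
#     print(line)
```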
27 votes · 6 answers

How to delete all files from folder with Databricks dbutils

Can someone let me know how to use the Databricks dbutils to delete all files from a folder? I have tried the following, but unfortunately Databricks doesn't support…
asked by Carltonp (1,166 rep)
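A common workaround, sketched below: `dbutils.fs.rm(folder, recurse=True)` would delete the folder itself, so list the folder and remove each entry instead. The helper name and path are hypothetical.

```python
def clear_folder(fs, folder):
    """Delete everything under `folder` but keep the folder itself."""
    for entry in fs.ls(folder):
        fs.rm(entry.path, True)  # recurse=True also clears subdirectories

# In a Databricks notebook:
# clear_folder(dbutils.fs, "/mnt/data/staging")
```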
25 votes · 5 answers

Databricks: How do I get path of current notebook?

Databricks is smart and all, but how do you identify the path of your current notebook? The guide on the website does not help. It suggests:

    %scala
    dbutils.notebook.getContext.notebookPath
    res1: Option[String] =…
asked by Esben Eickhardt (3,183 rep)
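The Scala one-liner has a commonly used Python counterpart that goes through the driver's context object. A sketch (this only works on a Databricks cluster, where `dbutils` is injected):

```python
def current_notebook_path(dbutils):
    """Return the notebook path from the driver context (Databricks runtime only)."""
    return (dbutils.notebook.entry_point.getDbutils()
            .notebook().getContext().notebookPath().get())

# In a Python notebook:
# print(current_notebook_path(dbutils))
```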
25 votes · 6 answers

Databricks: Download a dbfs:/FileStore File to my Local Machine?

I am using saveAsTextFile() to store the results of a Spark job in the folder dbfs:/FileStore/my_result. I can access the different "part-xxxxx" files using the web browser, but I would like to automate the process of downloading all files to my…
asked by Nacho Castiñeiras (303 rep)
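Two common routes: files under `dbfs:/FileStore` are served at the workspace's `/files/` URL, or the Databricks CLI can copy a whole folder (`databricks fs cp -r dbfs:/FileStore/my_result ./my_result`, assuming the CLI is installed and configured). A small helper for building the browser URL, as a sketch (the workspace URL is a placeholder):

```python
def filestore_url(workspace_url, dbfs_path):
    """Map dbfs:/FileStore/<x> to the browser-downloadable /files/<x> URL."""
    prefix = "dbfs:/FileStore/"
    if not dbfs_path.startswith(prefix):
        raise ValueError("only dbfs:/FileStore paths are web-served")
    return workspace_url.rstrip("/") + "/files/" + dbfs_path[len(prefix):]

# e.g. filestore_url("https://adb-1234.11.azuredatabricks.net",
#                    "dbfs:/FileStore/my_result/part-00000")
```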
24 votes · 7 answers

Azure Databricks - Can not create the managed table The associated location already exists

I have the following problem in Azure Databricks. Sometimes when I try to save a DataFrame as a managed table:

    SomeData_df.write.mode('overwrite').saveAsTable("SomeData")

I get the following error: "Can not create the managed table('SomeData').…
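The usual remedy, sketched below: drop any stale metastore entry, remove the orphaned files that block creation, then rewrite. The warehouse path shown is the default managed-table location and is an assumption about the metastore configuration; the helper name is hypothetical.

```python
def recreate_managed_table(spark, fs, df, table, leftover_path):
    """Clear a half-deleted managed table before saveAsTable (sketch)."""
    spark.sql(f"DROP TABLE IF EXISTS {table}")
    fs.rm(leftover_path, True)  # remove the orphaned files left by a failed drop
    df.write.mode("overwrite").saveAsTable(table)

# In a notebook:
# recreate_managed_table(spark, dbutils.fs, SomeData_df, "SomeData",
#                        "dbfs:/user/hive/warehouse/somedata")
```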
24 votes · 4 answers

How to detect Databricks environment programmatically

I'm writing a Spark job that needs to run both locally and on Databricks. The code has to differ slightly in each environment (file paths), so I'm trying to find a way to detect whether the job is running in Databricks. The best way I have…
asked by steven35 (3,747 rep)
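One widely used signal: Databricks Runtime exports an environment variable on the driver, so a plain environment check works without touching Spark. A sketch:

```python
import os

def running_on_databricks():
    """Databricks Runtime sets DATABRICKS_RUNTIME_VERSION on the driver."""
    return "DATABRICKS_RUNTIME_VERSION" in os.environ

# Example branch on the result (paths are hypothetical):
# data_root = "dbfs:/mnt/data" if running_on_databricks() else "/tmp/data"
```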
23 votes · 8 answers

Databricks drop a delta table?

How can I drop a Delta table in Databricks? I can't find any information in the docs... maybe the only solution is to delete the files inside the 'delta' folder with the magic command or dbutils: %fs rm -r delta/mytable? EDIT: For clarification, I…
asked by Joanteixi (427 rep)
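A sketch of the usual distinction: for a *managed* Delta table, a plain SQL `DROP TABLE` removes both the metastore entry and the data files; for an *external* table it only removes the metastore entry, and the path must be deleted separately. Table name and path below are placeholders.

```python
def drop_delta_table(spark, name):
    """DROP TABLE removes a managed Delta table and its files in one step."""
    spark.sql(f"DROP TABLE IF EXISTS {name}")

# For an external table, DROP TABLE leaves the files behind; in a notebook:
# dbutils.fs.rm("dbfs:/delta/mytable", recurse=True)
```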
23 votes · 4 answers

How to handle an AnalysisException on Spark SQL?

I am trying to execute a list of queries in Spark, but if a query does not run correctly, Spark throws the following error: AnalysisException: "ALTER TABLE CHANGE COLUMN is not supported for changing ... This is part of my code (I'm using…
asked by Kevin Gomez (273 rep)
23 votes · 2 answers

Apache Spark + Delta Lake concepts

I have many questions about Spark + Delta. 1) Databricks proposes three layers (bronze, silver, gold), but which layer is recommended for machine learning, and why? I suppose they propose having the data clean and ready in the gold…
23 votes · 5 answers

How to load databricks package dbutils in pyspark

I was trying to run the below code in pyspark:

    dbutils.widgets.text('config', '', 'config')

It was throwing an error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'dbutils' is not defined

so,…
asked by Babu (861 rep)
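A sketch of the commonly used accessor: outside a notebook (e.g. a wheel job or databricks-connect), `dbutils` is not predefined but can be constructed from the SparkSession; inside a notebook it already lives in the IPython user namespace. This assumes a Databricks runtime or databricks-connect, where `pyspark.dbutils` is importable.

```python
def get_dbutils(spark):
    """Obtain dbutils when it is not predefined (Databricks environments only)."""
    try:
        from pyspark.dbutils import DBUtils  # present on Databricks clusters
        return DBUtils(spark)
    except ImportError:
        import IPython  # in a notebook, dbutils is already in the user namespace
        return IPython.get_ipython().user_ns["dbutils"]
```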
22 votes · 3 answers

NameError: name 'dbutils' is not defined in pyspark

I am running a pyspark job in Databricks cloud. I need to write some of the CSV files to the Databricks filesystem (DBFS) as part of this job, and I also need to use some of the dbutils native commands like #mount azure blob to dbfs…
asked by Krishna Reddy (1,069 rep)
20 votes · 7 answers

How to drop a column from a Databricks Delta table?

I have recently started discovering Databricks and faced a situation where I need to drop a certain column of a Delta table. When I worked with PostgreSQL it was as easy as

    ALTER TABLE main.metrics_table DROP COLUMN metric_1;

I was looking…
asked by samba (2,821 rep)
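Newer Delta Lake supports `ALTER TABLE … DROP COLUMN`, but only after column mapping is enabled (Delta 1.2+ / Databricks Runtime 10.2+); on older versions the table has to be rewritten without the column. Both options, sketched (table and column names are from the question; the TBLPROPERTIES values are Delta's documented requirements for column mapping):

```python
def drop_column_sql(table, column):
    """SQL to enable column mapping, then drop the column (newer Delta only)."""
    return [
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        "'delta.columnMapping.mode' = 'name', "
        "'delta.minReaderVersion' = '2', "
        "'delta.minWriterVersion' = '5')",
        f"ALTER TABLE {table} DROP COLUMN {column}",
    ]

# Older runtimes: rewrite the table without the column instead.
# spark.table("main.metrics_table").drop("metric_1") \
#      .write.format("delta").mode("overwrite") \
#      .option("overwriteSchema", "true").saveAsTable("main.metrics_table")
```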
19 votes · 1 answer

Databricks SQL - How to get all the rows (more than 1000) in the first run?

Currently, in Databricks, if we run a query, it always returns 1,000 rows in the first run. If we need all the rows, we need to execute the query again. In situations where we know that we need to download the full data (1000+ rows), is there a turn…
asked by MrKrizzer (409 rep)
19 votes · 4 answers

How to move files of same extension in databricks files system?

I am facing a file-not-found exception when I try to move files with a * wildcard in DBFS. Both the source and destination directories are in DBFS. I have a source file named "test_sample.csv" in a DBFS directory, and I am using the command…
asked by Krishna Reddy (1,069 rep)
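This fails because `dbutils.fs.mv` takes literal paths, not globs. A common workaround, sketched below, is to list the source directory and filter the names in Python (the helper name is hypothetical; `dbutils` exists only on a Databricks runtime):

```python
import fnmatch

def move_matching(fs, src_dir, pattern, dst_dir):
    """dbutils.fs.mv has no glob support, so list the directory and filter."""
    for entry in fs.ls(src_dir):
        if fnmatch.fnmatch(entry.name, pattern):
            fs.mv(entry.path, dst_dir + entry.name)

# In a notebook:
# move_matching(dbutils.fs, "dbfs:/source/", "*.csv", "dbfs:/target/")
```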