Questions tagged [databricks]

Databricks is a unified platform with tools for building, deploying, sharing, and maintaining enterprise-grade data and AI solutions at scale. The Databricks Lakehouse Platform integrates with cloud storage and security in your cloud account, and manages and deploys cloud infrastructure on your behalf. Databricks is available on AWS, Azure, and GCP. Use this tag for questions related to the Databricks Lakehouse Platform.

Use this tag for questions specific to the Databricks Lakehouse Platform, including, but not limited to, the Databricks File System (DBFS), REST APIs, Databricks Spark SQL extensions, and orchestration tools.

Don't use this tag for generic Apache Spark questions or for public Spark packages maintained by Databricks.


7135 questions
34 votes · 3 answers

Exploding nested Struct in Spark dataframe

I'm working through a Databricks example. The schema for the dataframe looks like:

    > parquetDF.printSchema
    root
     |-- department: struct (nullable = true)
     |    |-- id: string (nullable = true)
     |    |-- name: string (nullable = true)
     |-- employees:…
28 votes · 3 answers

How to list all the mount points in Azure Databricks?

I tried %fs ls dbfs:/mnt, but I want to know: does this give me all the mount points?
asked by Shahid Ahmed (281 rep)
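For reference, `%fs ls /mnt` only lists entries under `/mnt`, while `dbutils.fs.mounts()` returns every mount point, including defaults such as `/databricks-datasets`. A sketch (the formatting helper is hypothetical; `dbutils` is only predefined inside a Databricks runtime):

```python
def format_mounts(mounts):
    """Render dbutils.fs.mounts() entries as 'mountPoint -> source' lines."""
    return [f"{m.mountPoint} -> {m.source}" for m in mounts]

# In a Databricks notebook, where dbutils is predefined:
# for line in format_mounts(dbutils.fs.mounts()):
#     print(line)
```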
27 votes · 6 answers

How to delete all files from folder with Databricks dbutils

Can someone let me know how to use the Databricks dbutils to delete all files from a folder? I have tried the following, but unfortunately Databricks doesn't support…
asked by Carltonp (1,166 rep)
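A common workaround, sketched below: `dbutils.fs.rm(folder, recurse=True)` would delete the folder itself, so list the folder and remove each entry instead. The helper name and path are hypothetical.

```python
def clear_folder(fs, folder):
    """Delete everything under `folder` but keep the folder itself."""
    for entry in fs.ls(folder):
        fs.rm(entry.path, True)  # recurse=True also clears subdirectories

# In a Databricks notebook:
# clear_folder(dbutils.fs, "/mnt/data/staging")
```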
25 votes · 5 answers

Databricks: How do I get path of current notebook?

Databricks is smart and all, but how do you identify the path of your current notebook? The guide on the website does not help. It suggests:

    %scala
    dbutils.notebook.getContext.notebookPath
    res1: Option[String] =…
asked by Esben Eickhardt (3,183 rep)
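The Scala one-liner has a commonly used Python counterpart that goes through the driver's context object. A sketch (this only works on a Databricks cluster, where `dbutils` is injected):

```python
def current_notebook_path(dbutils):
    """Return the notebook path from the driver context (Databricks runtime only)."""
    return (dbutils.notebook.entry_point.getDbutils()
            .notebook().getContext().notebookPath().get())

# In a Python notebook:
# print(current_notebook_path(dbutils))
```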
25 votes · 6 answers

Databricks: Download a dbfs:/FileStore File to my Local Machine?

I am using saveAsTextFile() to store the results of a Spark job in the folder dbfs:/FileStore/my_result. I can access the different "part-xxxxx" files using the web browser, but I would like to automate the process of downloading all files to my…
asked by Nacho Castiñeiras (303 rep)
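Two common routes: files under `dbfs:/FileStore` are served at the workspace's `/files/` URL, or the Databricks CLI can copy a whole folder (`databricks fs cp -r dbfs:/FileStore/my_result ./my_result`, assuming the CLI is installed and configured). A small helper for building the browser URL, as a sketch (the workspace URL is a placeholder):

```python
def filestore_url(workspace_url, dbfs_path):
    """Map dbfs:/FileStore/<x> to the browser-downloadable /files/<x> URL."""
    prefix = "dbfs:/FileStore/"
    if not dbfs_path.startswith(prefix):
        raise ValueError("only dbfs:/FileStore paths are web-served")
    return workspace_url.rstrip("/") + "/files/" + dbfs_path[len(prefix):]

# e.g. filestore_url("https://adb-1234.11.azuredatabricks.net",
#                    "dbfs:/FileStore/my_result/part-00000")
```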
24 votes · 7 answers

Azure Databricks - Can not create the managed table The associated location already exists

I have the following problem in Azure Databricks. Sometimes when I try to save a DataFrame as a managed table:

    SomeData_df.write.mode('overwrite').saveAsTable("SomeData")

I get the following error: "Can not create the managed table('SomeData').…
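The usual remedy, sketched below: drop any stale metastore entry, remove the orphaned files that block creation, then rewrite. The warehouse path shown is the default managed-table location and is an assumption about the metastore configuration; the helper name is hypothetical.

```python
def recreate_managed_table(spark, fs, df, table, leftover_path):
    """Clear a half-deleted managed table before saveAsTable (sketch)."""
    spark.sql(f"DROP TABLE IF EXISTS {table}")
    fs.rm(leftover_path, True)  # remove the orphaned files left by a failed drop
    df.write.mode("overwrite").saveAsTable(table)

# In a notebook:
# recreate_managed_table(spark, dbutils.fs, SomeData_df, "SomeData",
#                        "dbfs:/user/hive/warehouse/somedata")
```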
24 votes · 4 answers

How to detect Databricks environment programmatically

I'm writing a Spark job that needs to run both locally and on Databricks. The code has to differ slightly in each environment (file paths), so I'm trying to find a way to detect whether the job is running in Databricks. The best way I have…
asked by steven35 (3,747 rep)
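One widely used signal: Databricks Runtime exports an environment variable on the driver, so a plain environment check works without touching Spark. A sketch:

```python
import os

def running_on_databricks():
    """Databricks Runtime sets DATABRICKS_RUNTIME_VERSION on the driver."""
    return "DATABRICKS_RUNTIME_VERSION" in os.environ

# Example branch on the result (paths are hypothetical):
# data_root = "dbfs:/mnt/data" if running_on_databricks() else "/tmp/data"
```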
23 votes · 8 answers

Databricks drop a delta table?

How can I drop a Delta table in Databricks? I can't find any information in the docs... maybe the only solution is to delete the files inside the 'delta' folder with the magic command or dbutils: %fs rm -r delta/mytable? EDIT: For clarification, I…
asked by Joanteixi (427 rep)
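A sketch of the usual distinction: for a *managed* Delta table, a plain SQL `DROP TABLE` removes both the metastore entry and the data files; for an *external* table it only removes the metastore entry, and the path must be deleted separately. Table name and path below are placeholders.

```python
def drop_delta_table(spark, name):
    """DROP TABLE removes a managed Delta table and its files in one step."""
    spark.sql(f"DROP TABLE IF EXISTS {name}")

# For an external table, DROP TABLE leaves the files behind; in a notebook:
# dbutils.fs.rm("dbfs:/delta/mytable", recurse=True)
```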
23 votes · 4 answers

How to handle an AnalysisException on Spark SQL?

I am trying to execute a list of queries in Spark, but if a query does not run correctly, Spark throws the following error: AnalysisException: "ALTER TABLE CHANGE COLUMN is not supported for changing ... This is part of my code (I'm using…
asked by Kevin Gomez (273 rep)
23 votes · 2 answers

Apache Spark + Delta Lake concepts

I have many questions about Spark + Delta. 1) Databricks proposes three layers (bronze, silver, gold), but which layer is recommended for machine learning, and why? I suppose they propose having the data clean and ready in the gold…
23 votes · 5 answers

How to load databricks package dbutils in pyspark

I was trying to run the below code in pyspark:

    dbutils.widgets.text('config', '', 'config')

It was throwing an error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'dbutils' is not defined

so,…
asked by Babu (861 rep)
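A sketch of the commonly used accessor: outside a notebook (e.g. a wheel job or databricks-connect), `dbutils` is not predefined but can be constructed from the SparkSession; inside a notebook it already lives in the IPython user namespace. This assumes a Databricks runtime or databricks-connect, where `pyspark.dbutils` is importable.

```python
def get_dbutils(spark):
    """Obtain dbutils when it is not predefined (Databricks environments only)."""
    try:
        from pyspark.dbutils import DBUtils  # present on Databricks clusters
        return DBUtils(spark)
    except ImportError:
        import IPython  # in a notebook, dbutils is already in the user namespace
        return IPython.get_ipython().user_ns["dbutils"]
```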
22 votes · 3 answers

NameError: name 'dbutils' is not defined in pyspark

I am running a pyspark job in Databricks cloud. I need to write some of the CSV files to the Databricks filesystem (DBFS) as part of this job, and I also need to use some of the dbutils native commands like #mount azure blob to dbfs…
asked by Krishna Reddy (1,069 rep)
20 votes · 7 answers

How to drop a column from a Databricks Delta table?

I have recently started discovering Databricks and faced a situation where I need to drop a certain column of a Delta table. When I worked with PostgreSQL it was as easy as

    ALTER TABLE main.metrics_table DROP COLUMN metric_1;

I was looking…
asked by samba (2,821 rep)
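Newer Delta Lake supports `ALTER TABLE … DROP COLUMN`, but only after column mapping is enabled (Delta 1.2+ / Databricks Runtime 10.2+); on older versions the table has to be rewritten without the column. Both options, sketched (table and column names are from the question; the TBLPROPERTIES values are Delta's documented requirements for column mapping):

```python
def drop_column_sql(table, column):
    """SQL to enable column mapping, then drop the column (newer Delta only)."""
    return [
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        "'delta.columnMapping.mode' = 'name', "
        "'delta.minReaderVersion' = '2', "
        "'delta.minWriterVersion' = '5')",
        f"ALTER TABLE {table} DROP COLUMN {column}",
    ]

# Older runtimes: rewrite the table without the column instead.
# spark.table("main.metrics_table").drop("metric_1") \
#      .write.format("delta").mode("overwrite") \
#      .option("overwriteSchema", "true").saveAsTable("main.metrics_table")
```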
19 votes · 1 answer

Databricks SQL - How to get all the rows (more than 1000) in the first run?

Currently, in Databricks, if we run a query, it always returns 1,000 rows in the first run. If we need all the rows, we need to execute the query again. In situations where we know that we need to download the full data (1000+ rows), is there a turn…
asked by MrKrizzer (409 rep)
19 votes · 4 answers

How to move files of same extension in databricks files system?

I am facing a file-not-found exception when I try to move files with a * wildcard in DBFS. Both the source and destination directories are in DBFS. I have a source file named "test_sample.csv" in a DBFS directory, and I am using the command…
asked by Krishna Reddy (1,069 rep)
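This fails because `dbutils.fs.mv` takes literal paths, not globs. A common workaround, sketched below, is to list the source directory and filter the names in Python (the helper name is hypothetical; `dbutils` exists only on a Databricks runtime):

```python
import fnmatch

def move_matching(fs, src_dir, pattern, dst_dir):
    """dbutils.fs.mv has no glob support, so list the directory and filter."""
    for entry in fs.ls(src_dir):
        if fnmatch.fnmatch(entry.name, pattern):
            fs.mv(entry.path, dst_dir + entry.name)

# In a notebook:
# move_matching(dbutils.fs, "dbfs:/source/", "*.csv", "dbfs:/target/")
```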