Questions tagged [aws-databricks]

For questions about the usage of the Databricks Lakehouse Platform on the AWS cloud.

Databricks Lakehouse Platform on AWS

The Databricks Lakehouse Platform accelerates innovation across data science, data engineering, business analytics, and data warehousing, integrated with your AWS infrastructure.

Reference: https://databricks.com/aws

190 questions
18 votes • 1 answer

Local instance of Databricks for development

I am currently working on a small team that is developing a Databricks-based solution. For now we are small enough to work off of cloud instances of Databricks. As the group grows this will not really be practical. Is there a "local" install of…
John • 3,458
12 votes • 5 answers

How to access shared Google Drive files through Python?

I am trying to access shared Google Drive files through Python. I have created an OAuth 2.0 client ID as well as the OAuth consent screen. I have copy-pasted this code:…
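
A minimal sketch of the usual approach with google-api-python-client, assuming a credentials.json client-secrets file downloaded for the OAuth client ID (the file name and scope are illustrative, not from the question):

from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

# Run the OAuth consent flow; "credentials.json" is an assumed file name.
flow = InstalledAppFlow.from_client_secrets_file(
    "credentials.json",
    scopes=["https://www.googleapis.com/auth/drive.readonly"],
)
creds = flow.run_local_server(port=0)

# List files, including items shared from other drives.
service = build("drive", "v3", credentials=creds)
resp = service.files().list(
    supportsAllDrives=True,
    includeItemsFromAllDrives=True,
    fields="files(id, name)",
).execute()
for f in resp.get("files", []):
    print(f["name"], f["id"])
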
7 votes • 2 answers

Run Databricks job from notebook

I want to know if it is possible to run a Databricks job from a notebook using code, and how to do it. I have a job with multiple tasks and many contributors, and we have a job created to execute it all. Now we want to run the job from a notebook to…
Joe • 561
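
One common pattern (a sketch, not necessarily what the answers propose) is to call the Jobs REST API's run-now endpoint from the notebook; the host, token, and job_id below are placeholders:

import requests

HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
TOKEN = "<personal-access-token>"  # better fetched via dbutils.secrets.get

resp = requests.post(
    f"{HOST}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"job_id": 123, "notebook_params": {"name": "john doe"}},
)
resp.raise_for_status()
print(resp.json()["run_id"])  # id of the run that was just triggered
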
7 votes • 3 answers

Execute multiple notebooks in parallel in PySpark Databricks

The question is simple: master_dim.py calls dim_1.py and dim_2.py to execute in parallel. Is this possible in Databricks PySpark? The image below explains what I am trying to do; it errors out for some reason. Am I missing something here?
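
A sketch of one standard workaround: run the child notebooks concurrently from the driver notebook with a thread pool (the paths and timeout are assumptions; dbutils is implicitly available in Databricks notebooks):

from concurrent.futures import ThreadPoolExecutor

notebooks = ["./dim_1", "./dim_2"]  # hypothetical notebook paths

def run(path):
    # dbutils.notebook.run blocks until the child notebook returns
    return dbutils.notebook.run(path, timeout_seconds=3600)

with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(run, notebooks))

print(results)  # exit values returned by dim_1 and dim_2
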
6 votes • 2 answers

How do we access Databricks job parameters inside the attached notebook?

In Databricks, if I have a job request JSON such as: { "job_id": 1, "notebook_params": { "name": "john doe", "age": "35" } }, how do I access the notebook_params inside the job's attached notebook?
Sannix19 • 75
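
For reference, parameters passed through notebook_params surface as notebook widgets, so a sketch of reading them looks like this (all widget values arrive as strings):

# In the notebook attached to the job:
name = dbutils.widgets.get("name")  # "john doe"
age = dbutils.widgets.get("age")    # "35" (widget values are strings)
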
5 votes • 0 answers

AWS Key issue while working with Databricks Mount

Currently I am facing an issue while dealing with a Databricks mount point created on top of an AWS S3 bucket. I could create the mount point in a Databricks notebook with the code below: ACCESS_KEY = "<>" SECRET_KEY =…
Abhi • 341
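
For context, the usual key-based mount looks roughly like the sketch below; the bucket name and mount point are illustrative, and note that the secret key must be URL-encoded:

from urllib.parse import quote

ACCESS_KEY = "<access-key>"   # placeholder
SECRET_KEY = "<secret-key>"   # placeholder
ENCODED_SECRET = quote(SECRET_KEY, safe="")  # a "/" in the secret breaks the URI

dbutils.fs.mount(
    source=f"s3a://{ACCESS_KEY}:{ENCODED_SECRET}@my-bucket",  # assumed bucket
    mount_point="/mnt/my-bucket",
)
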
4 votes • 0 answers

Delta Live CDC for Aggregate State Tables

As far as I can tell from the documentation, I cannot accomplish a specific migration from Delta to Delta Live that I would love to do, but I want to see if I might be missing a solution. Currently, I have a number of aggregate batch Delta tables…
4 votes • 0 answers

How to clean up extremely large delta log checkpoints and many small files?

AWS, by the way, if that matters. We have an old production table that has been running in the background for a couple of years, always with auto-optimize and auto-compaction turned off. Since then, it has written many small files (like 10,000 an…
Fenno Vermeij • 126
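
The usual remedy is to compact with OPTIMIZE and then clean up unreferenced files with VACUUM; a sketch, where the table name and retention window are assumptions and VACUUM permanently deletes files:

# Compact many small files into fewer, larger ones.
spark.sql("OPTIMIZE my_schema.my_table")

# Remove files no longer referenced by the Delta log
# (default retention threshold is 7 days = 168 hours).
spark.sql("VACUUM my_schema.my_table RETAIN 168 HOURS")
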
4 votes • 2 answers

Get a list of files in S3 using PySpark in Databricks

I'm trying to generate a list of all S3 files in a bucket/folder. There are usually on the order of millions of files in the folder. I use boto right now and it's able to retrieve around 33k files per minute, which for even a million files,…
CodingInCircles • 2,565
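
A sketch of the paginated listing with boto3 (bucket and prefix are placeholders); for very large listings, parallelizing over several prefixes is the usual next step:

import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

keys = []
for page in paginator.paginate(Bucket="my-bucket", Prefix="my/prefix/"):
    keys.extend(obj["Key"] for obj in page.get("Contents", []))

print(len(keys))
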
4 votes • 0 answers

Databricks Stream to Batch process

I am using Databricks and I am enjoying the Auto Loader feature. Basically, it creates the infrastructure to consume data in a micro-batch fashion. It works nicely for the initial raw table (or name it bronze). Where I am a bit lost is how to append my other…
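
For reference, a minimal Auto Loader ingestion into a bronze Delta table looks like the sketch below (paths and source format are assumptions):

# Incrementally ingest new files from S3 into a bronze Delta table.
bronze = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "json")        # assumed source format
          .option("cloudFiles.schemaLocation", "/mnt/schemas/bronze")  # assumed
          .load("s3://my-bucket/raw/"))               # assumed path

(bronze.writeStream
       .option("checkpointLocation", "/mnt/checkpoints/bronze")  # assumed
       .start("/mnt/delta/bronze"))
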
4 votes • 1 answer

Why does Databricks only plot 1000 rows?

Is there any way in Databricks to plot more than 1000 rows with the built-in visualization? I tried using the limit() function, but it still shows only the first 1000.
JAdel • 1,309
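
display() caps the rendered result at 1,000 rows; a common workaround (a sketch with hypothetical column names) is to collect the data to pandas and plot it directly, which is only sensible when it fits on the driver:

import matplotlib.pyplot as plt

pdf = df.toPandas()  # df is the Spark DataFrame being visualized
pdf.plot(x="timestamp", y="value")  # hypothetical columns
plt.show()
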
4 votes • 1 answer

Databricks Magic SQL - Export Data

Is it possible to export the output of a "magic SQL" command cell in Databricks? I like the fact that one doesn't have to escape the SQL command and it can be easily formatted. But I can't seem to use the output in other cells. What I…
Raj Rao • 8,872
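
One workaround sketch: issue the same query with spark.sql() in a Python cell, so the result is a DataFrame that later cells can reuse or export (the table name and output path are placeholders):

df = spark.sql("SELECT * FROM my_table")  # same query, without the %sql magic
df.write.mode("overwrite").csv("/mnt/exports/my_table")  # hypothetical path

Recent Databricks runtimes also expose the previous %sql cell's result to Python as an implicit DataFrame, but the spark.sql route works everywhere.
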
4 votes • 2 answers

Calling Trigger once in Databricks to process Kinesis Stream

I am looking for a way to trigger my Databricks notebook once to process a Kinesis stream, using the following pattern: import org.apache.spark.sql.streaming.Trigger // Load your streaming DataFrame val sdf =…
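
The PySpark equivalent of that trigger-once pattern is sketched below, with placeholder stream options and paths:

sdf = (spark.readStream
       .format("kinesis")                    # Databricks Kinesis source
       .option("streamName", "my-stream")    # placeholder
       .option("region", "us-east-1")        # placeholder
       .load())

(sdf.writeStream
    .trigger(once=True)  # drain the available data, then stop the query
    .option("checkpointLocation", "/mnt/checkpoints/kinesis")  # placeholder
    .start("/mnt/delta/kinesis_bronze"))     # placeholder sink path
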
3 votes • 1 answer

Databricks Repo vs Workspace

I noticed that in Databricks there is a folder section for 'Workspace' and a folder for 'Repos', as seen below. I have been trying to research online what the difference is, but have had no luck. It seems as though they serve the same purpose? I am able…
3 votes • 1 answer

Using an expression in a PARTITIONED BY definition in Delta Table

Attempting to load data into Databricks using COPY INTO, I have data in storage (as CSV files) with the following schema: event_time TIMESTAMP, aws_region STRING, event_id STRING, event_name STRING. I wish for the target table to be partitioned…
Yuval Itzchakov • 146,575
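
Delta does not allow expressions directly in PARTITIONED BY, but a generated column can carry the expression and serve as the partition key; a sketch using the schema from the question (the table name is an assumption):

spark.sql("""
    CREATE TABLE events (
        event_time TIMESTAMP,
        aws_region STRING,
        event_id   STRING,
        event_name STRING,
        event_date DATE GENERATED ALWAYS AS (CAST(event_time AS DATE))
    )
    USING DELTA
    PARTITIONED BY (event_date)
""")
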