Questions tagged [data-lakehouse]
28 questions
13
votes
1 answer
lakeFS, Hudi, Delta Lake merge and merge conflicts
I'm reading the lakeFS documentation, and right now I don't clearly understand what a merge, or even a merge conflict, is in lakeFS terms.
Let's say I use Apache Hudi for ACID support over a single table. I'd like to introduce multi-table ACID support…

alexanoid
3
votes
1 answer
Creating a table in Pyspark within a Delta Live Table job in Databricks
I am running a DLT (Delta Live Tables) job that creates a Bronze table > Silver table for two separate tables. So in the end, I have two separate Gold tables, which I want to merge into one table. I know how to do it in SQL, but every time I run…

Anton Kopti
2
votes
1 answer
External vs Internal table in Delta Lake
Are there any performance benefits of an internal table in Delta Lake compared to an external table, given that in both cases the source files reside in the data lake?

Su1tan
2
votes
1 answer
Do you store data in the Delta Lake Silver layer in a normalized format or do you derive it?
I am currently setting up a data lake trying to follow the principles of Delta Lake (landing in bronze, cleaning and merging into silver, and then, if needed, presenting the final view in gold) and have a question about what should be stored in…

Martin Zugschwert
2
votes
0 answers
How are updates and deletes handled in both Data Warehouses and Data Lakes?
I'm trying to understand how Update and Delete functions are performed in Data Warehouses, Lakes and Lakehouses.
Databricks argues that they can perform upserts easily, which I would understand as adding CRUD capabilities.
I've read elsewhere that…
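For context on the question: lakehouse table formats typically implement updates and deletes with copy-on-write, because the underlying data files are immutable. A DELETE rewrites every affected file without the deleted rows and swaps the new file in via metadata. A minimal stdlib-Python sketch of that idea (file and column names are invented for illustration):

```python
# Copy-on-write delete: data files are never mutated in place.
# Each affected file is rewritten without the deleted rows, and the
# table's metadata replaces the old file with the rewritten copy.

def delete_rows(files, predicate):
    """files: {filename: list of row dicts}. Returns the new file map."""
    new_files = {}
    for name, rows in files.items():
        kept = [r for r in rows if not predicate(r)]
        if kept == rows:
            new_files[name] = rows          # untouched file is reused as-is
        elif kept:
            new_files[name + ".v2"] = kept  # rewritten copy replaces original
        # a file whose rows are all deleted is simply dropped
    return new_files

table = {
    "part-0.parquet": [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Lima"}],
    "part-1.parquet": [{"id": 3, "city": "Kyiv"}],
}
after = delete_rows(table, lambda r: r["id"] == 2)
```

An UPDATE works the same way, except the rewritten file carries the modified rows instead of omitting them; this is why a single-row change can rewrite a whole file.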

Gonzalo Etse
2
votes
1 answer
Delta files, delta tables and delta transactions
I have a serious issue understanding Delta tables, Delta transaction logs, and Delta files.
Questions:
What and where are the Delta tables? I don't understand whether they live in the metastore (Hive), in the object store (S3), or in both.
What and where are…

Gonzalo Etse
1
vote
1 answer
Data Vault: Relationship between LINK and SAT on Historical analysis using SQL
In the Data Vault model, we have the tables below:
Details on how LINKTradeinVehicle and SAT_Order are inserted below:
Problem statement:
We need to know historically which data between LINK and SAT tables on the basis of LINK's orderkey and vehicleKEY…

Anonymous
1
vote
2 answers
How to add a new column when writing to a Delta table?
I am using delta-rs to write to a Delta table in the Delta Lake. Here is my code:
import time
import numpy as np
import pandas as pd
import pyarrow as pa
from deltalake.writer import write_deltalake
num_rows = 10
timestamp = np.array([time.time() +…
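Setting the delta-rs API details aside, "adding a column" in a Delta-style table means the table schema becomes the merged schema, while rows from files written before the change surface the new column as null. A stdlib sketch of that merge rule (column names are hypothetical, not from the question's table):

```python
# Schema evolution by merge: a later write may carry an extra column;
# the table schema becomes the union of all columns seen, and older
# rows are projected onto it with None for the missing column.

def merged_read(batches):
    """batches: lists of row dicts written at different times."""
    schema = []
    for batch in batches:
        for row in batch:
            for col in row:
                if col not in schema:
                    schema.append(col)
    # project every row onto the merged schema, padding with None
    rows = [
        {col: row.get(col) for col in schema}
        for batch in batches for row in batch
    ]
    return schema, rows

old = [{"id": 1, "ts": 100.0}]                     # written before the change
new = [{"id": 2, "ts": 101.0, "label": "extra"}]   # new column appears here
schema, rows = merged_read([old, new])
```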

Hongbo Miao
1
vote
1 answer
Do upserts on Delta simply duplicate data?
I'm fairly new to Delta and the lakehouse on Databricks. I have some questions, based on the following actions:
I import some parquet files
Convert them to delta (creating 1 snappy.parquet file)
Delete one random row (creating 1 new snappy.parquet…
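For context: an upsert (MERGE) should not duplicate rows — matched keys are updated, unmatched keys are inserted — but on disk each operation rewrites files, so seeing additional .snappy.parquet files after every step is expected. A keyed-dict sketch of the MERGE semantics (not the Delta API; column names are invented):

```python
# MERGE semantics: update rows whose key matches, insert the rest.
# The logical row count grows only by the number of genuinely new keys,
# even though the physical file count grows with every operation.

def upsert(target, source, key="id"):
    by_key = {row[key]: dict(row) for row in target}
    for row in source:
        if row[key] in by_key:
            by_key[row[key]].update(row)   # WHEN MATCHED THEN UPDATE
        else:
            by_key[row[key]] = dict(row)   # WHEN NOT MATCHED THEN INSERT
    return list(by_key.values())

target = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}]
source = [{"id": 2, "qty": 9}, {"id": 3, "qty": 1}]
result = upsert(target, source)
```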

Gonzalo Etse
0
votes
1 answer
What is the difference between a data lakehouse and a delta lake?
I am new to Databricks. I am reading the Microsoft documentation on the data lakehouse. The documentation makes reference to Delta Lake without explaining what the difference is, or even whether there is any. Can someone please explain this to me? Any…

Jay2454643
0
votes
0 answers
Microsoft Fabric Lakehouse size and table sizes
We have created a Lakehouse on Microsoft Fabric. It has a bunch of tables and files.
In the Lakehouse explorer, I can see file sizes just by clicking on the relevant folder or file in 'Files'.
But, I want to know the size of each table in the…

user18366639
0
votes
0 answers
Backup Microsoft Fabric and prevent easy artifact deletion
We are considering moving our "classical" SQL data warehouse over to Microsoft Fabric.
I noticed two things which are both possible showstoppers:
I can't find any information on backups. How can we back up a Fabric workspace (or at least a Fabric…

marritza
0
votes
0 answers
Refreshing Lakehouse data during a notebook session in Microsoft Fabric
I am running a pyspark script in a notebook in Microsoft Fabric (preview).
The script gets the last modification time of test.csv, which is located in a lakehouse in the same workspace.
The problem is, as soon as you start the session of the…
0
votes
0 answers
Fabric Lakehouse PowerBI report: Couldn't load the data for this visual
We have created a lakehouse on Microsoft Fabric and put a Power BI report on it, in Direct Lake mode.
The report is based on a dataset that's based on the SQL endpoint of the lakehouse:
When first created, everything works fine. But after the first…

marritza
0
votes
0 answers
What is the benefit of using the Delta Lake or Iceberg table format?
We currently store data on S3 using the Parquet format and use the AWS Glue Data Catalog to store table metadata. We add partitions by date or hour. Most of the queries we have are read-only. I am wondering what benefits we can get from…
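One concrete benefit over plain Parquet plus Glue partitions is the transaction log: each commit atomically records which data files were added and removed, which gives multi-file atomic commits and time travel by replaying the log up to a version. A toy sketch of that log-replay idea (file names are invented):

```python
# A Delta/Iceberg-style commit log: each version atomically adds
# and/or removes data files. Replaying the log up to version v
# reconstructs the exact file set of that snapshot (time travel).

LOG = [
    {"add": ["a.parquet", "b.parquet"], "remove": []},     # version 0
    {"add": ["c.parquet"], "remove": []},                  # version 1
    {"add": ["b2.parquet"], "remove": ["b.parquet"]},      # version 2: rewrite of b
]

def snapshot(log, version):
    files = set()
    for commit in log[: version + 1]:
        files |= set(commit["add"])
        files -= set(commit["remove"])
    return files

v1 = snapshot(LOG, 1)
latest = snapshot(LOG, len(LOG) - 1)
```

Readers that pin a version see a consistent file set even while writers commit, which is the isolation guarantee a bare Parquet-on-S3 layout cannot give.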

yuyang