
I am looking for a way to write back to a Delta table in Python without using PySpark. I know there is a library called deltalake/delta-lake-reader that can be used to read Delta tables and convert them to pandas DataFrames.

The goal is to write back to the opened Delta table.

The input code looks like this:

from deltalake import DeltaTable

# Open the Delta table and load it into a pandas DataFrame
dt = DeltaTable('path/file')
df = dt.to_pandas()

So is there any way to do something like this, writing a pandas DataFrame back to a Delta table:

df = pandadf.to_delta()
DeltaTable.write(df, 'path/file')

Thank you for your assistance!

FRITTENPIET
  • it's not yet possible if you look into features matrix: https://github.com/delta-io/delta-rs#features – Alex Ott Oct 24 '21 at 13:08

2 Answers


It is now supported! See this example:

import duckdb
from deltalake.writer import write_deltalake

# Query a public Parquet file over HTTP with DuckDB,
# then materialize the result as a pandas DataFrame
df = duckdb.sql('''
    LOAD 'httpfs';
    SELECT countries_and_territories, sum(deaths) AS total
    FROM read_parquet('https://pandemicdatalake.blob.core.windows.net/public/curated/covid-19/ecdc_cases/latest/ecdc_cases.parquet')
    GROUP BY 1
    ORDER BY total DESC
    LIMIT 5;
''').df()

# Append the DataFrame to a Delta table
write_deltalake('Pathto/covid', df, mode='append')
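
To verify the write, the same table can be read back with DeltaTable, exactly as in the question. A minimal sketch reusing the 'Pathto/covid' path from above:

from deltalake import DeltaTable

# Read the freshly written table back into a pandas DataFrame
print(DeltaTable('Pathto/covid').to_pandas())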
Mim
  • Mim is correct! To provide additional context, they are using the [`delta-rs`](https://delta-io.github.io/delta-rs/python/) library, which does not have a Spark dependency. You can install delta-rs with pip or conda: `$ pip install deltalake` or `$ conda install -c conda-forge deltalake` – Jim Hibbard Mar 30 '23 at 23:00

@Mim is correct. This just provides more info.

Currently, you can use delta-rs to read and write to Delta Lake directly.

You can install it with pip install deltalake or conda install -c conda-forge deltalake.

import pandas as pd
from deltalake.writer import write_deltalake

# Create a small DataFrame and write it out as a new Delta table
df = pd.DataFrame({"x": [1, 2, 3]})
write_deltalake("path/to/delta-tables/table1", df)

Writing to S3

# Credentials are placeholders; AWS_S3_ALLOW_UNSAFE_RENAME permits writes
# without a locking provider (unsafe if multiple writers run concurrently)
storage_options = {
    "AWS_DEFAULT_REGION": "us-west-2",
    "AWS_ACCESS_KEY_ID": "xxx",
    "AWS_SECRET_ACCESS_KEY": "xxx",
    "AWS_S3_ALLOW_UNSAFE_RENAME": "true",
}

write_deltalake(
    "s3a://my-bucket/delta-tables/table1",
    df,
    mode="append",
    storage_options=storage_options,
)

To remove AWS_S3_ALLOW_UNSAFE_RENAME and write concurrently, you need to set up a DynamoDB lock.
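
As a rough sketch (the exact option names have changed across delta-rs releases, so treat these as assumptions and verify them against the ticket below; the DynamoDB table "delta_rs_lock_table" is a hypothetical table you would create yourself), the locked write looks something like:

# Assumed option names; the table-name key is DELTA_DYNAMO_TABLE_NAME in
# recent delta-rs releases and DYNAMO_LOCK_TABLE_NAME in older ones
storage_options = {
    "AWS_DEFAULT_REGION": "us-west-2",
    "AWS_ACCESS_KEY_ID": "xxx",
    "AWS_SECRET_ACCESS_KEY": "xxx",
    "AWS_S3_LOCKING_PROVIDER": "dynamodb",  # use DynamoDB locking instead of unsafe renames
    "DELTA_DYNAMO_TABLE_NAME": "delta_rs_lock_table",  # hypothetical pre-created lock table
}

write_deltalake(
    "s3a://my-bucket/delta-tables/table1",
    df,
    mode="append",
    storage_options=storage_options,
)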

Follow this GitHub ticket for updates on how to set it up correctly.

Hongbo Miao