
I have a DataFrame in pandas and I need to write it in Delta format without using Spark.

I searched a lot, but I couldn't find any solution that doesn't use Spark.

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [3, 2, 1]})
    Does this answer your question? [How to write to delta table/delta format in Python without using Pyspark?](https://stackoverflow.com/questions/69407302/how-to-write-to-delta-table-delta-format-in-python-without-using-pyspark) – s_pike Mar 15 '23 at 13:58
  • this command returned me a parquet file and a folder called _delta_log with a json inside – Lucca Ribeiro Mar 15 '23 at 14:24
  • 1
    Delta table is not a file format, it a layer which stores the data in parquet format and keys track of all the action of the data in a log folder. – arudsekaberne Mar 15 '23 at 14:26
  • I can confirm that parquet files plus a JSON log is the structure of data in Delta format – arudsekaberne is correct. – s_pike Mar 15 '23 at 14:33

1 Answer


The delta-rs library does not have a Spark dependency. You can save your pandas DataFrame as a Delta table the following way:

import pandas as pd
from deltalake.writer import write_deltalake

df = pd.DataFrame({"x": [1, 2, 3], "y": [3, 2, 1]})
write_deltalake('/path/to/save/delta/table', df)

To install the delta-rs library, you can use pip:

$ pip install deltalake

Or you can use conda:

$ conda install -c conda-forge deltalake
Jim Hibbard