1

Is there a way to create an empty pandas dataframe from a pandera schema?

Given the following schema, I would like to get an empty dataframe as shown below:

import pandas as pd
import pandera as pa
from pandera.typing import Series, DataFrame

class MySchema(pa.DataFrameModel):
    state: Series[str]
    city: Series[str]
    price: Series[int]

def get_empty_df_of_schema(schema: pa.DataFrameModel) -> pd.DataFrame:
    pass

wanted_result = pd.DataFrame(
    columns=['state', 'city', 'price']
).astype({'state': str, 'city': str, 'price': int})
wanted_result.info()

Desired result:

Index: 0 entries
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   state   0 non-null      object
 1   city    0 non-null      object
 2   price   0 non-null      int64 

Edit:

Found a working solution:

def get_empty_df_of_pandera_model(model: type[pa.DataFrameModel]) -> pd.DataFrame:
    # Convert the class-based model into a DataFrameSchema to inspect its columns
    schema = model.to_schema()
    column_names = list(schema.columns.keys())
    # Map each column name to the name of its underlying pandas/NumPy dtype
    data_types = {column_name: column_type.dtype.type.name for column_name, column_type in schema.columns.items()}
    return pd.DataFrame(columns=column_names).astype(data_types)
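
The final step of that function needs only pandas: once a mapping of column name to dtype name is available, the empty frame can also be built column by column. A minimal sketch, assuming the mapping has already been extracted from the schema; `empty_frame_from_dtypes` is a hypothetical helper name and not part of pandas or pandera:

```python
import pandas as pd


def empty_frame_from_dtypes(dtypes: dict) -> pd.DataFrame:
    """Build a zero-row DataFrame whose columns carry the given dtypes."""
    # One empty Series per column, each created directly with its dtype
    return pd.DataFrame(
        {name: pd.Series(dtype=dtype) for name, dtype in dtypes.items()}
    )


# Hypothetical mapping, matching the dtypes in the desired result above
empty = empty_frame_from_dtypes(
    {"state": "object", "city": "object", "price": "int64"}
)
print(empty.dtypes)
```

Creating each Series with its dtype avoids the intermediate all-object frame that `pd.DataFrame(columns=...)` produces before `astype()` is applied.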
MJA

2 Answers

1

The current pandera docs have a small section on pandas data types.

This suggests the following solution:

import pandera as pa
import pandas as pd

def empty_dataframe_from_model(Model: pa.DataFrameModel) -> pd.DataFrame:
    schema = Model.to_schema()
    return pd.DataFrame(columns=schema.dtypes.keys()).astype(
        {col: str(dtype) for col, dtype in schema.dtypes.items()}
    )
camo
This might not work. For example, for a schema column defined as `foo: datetime.date`, `astype()` raises an exception about the unknown type `date`. The solution in the question using `dtype.type.name` does work. – vvv444 Aug 14 '23 at 12:55
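
The failure described in this comment can be reproduced with pandas alone. A minimal sketch, assuming the string `"date"` is what reaches `astype()` for such a column; the column name `foo` is taken from the comment, everything else is illustrative:

```python
import pandas as pd

empty = pd.DataFrame(columns=["foo"])

# "date" is not a dtype name that pandas or NumPy recognise, so the cast fails
try:
    empty.astype({"foo": "date"})
    cast_failed = False
except TypeError:
    cast_failed = True

# Python date objects live in object columns, so "object" is accepted
as_object = empty.astype({"foo": "object"})
print(cast_failed, as_object.dtypes["foo"])
```

This suggests why reading the name of the underlying pandas/NumPy dtype is more robust than stringifying the pandera dtype.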
-1

Yes, it is possible to create an empty pandas dataframe from a pandera schema: convert the model with to_schema() and read the column dtypes from the result.

Here is the updated version of the function get_empty_df_of_schema:

def get_empty_df_of_schema(schema: type[pa.DataFrameModel]) -> pd.DataFrame:
    # Read each column's dtype name from the schema, then cast an empty frame to it
    dtypes = {name: col.dtype.type.name for name, col in schema.to_schema().columns.items()}
    return pd.DataFrame(columns=list(dtypes)).astype(dtypes)

Also, have a look at the pandera documentation on DataFrame schemas.

Nimra Tahir