31

Setup:

# Pydantic Models

class TMDB_Category(BaseModel):
    name: str = Field(alias="strCategory")
    description: str = Field(alias="strCategoryDescription")


class TMDB_GetCategoriesResponse(BaseModel):
    categories: list[TMDB_Category]


@router.get(path="category", response_model=TMDB_GetCategoriesResponse)
async def get_all_categories():
    async with httpx.AsyncClient() as client:
        response = await client.get(Endpoint.GET_CATEGORIES)
        return TMDB_GetCategoriesResponse.parse_obj(response.json())

Problem:
The alias is being used when the response is serialized, and I want to avoid that. I only need the alias to map the incoming data correctly; when returning a response, I want to use the actual field names.

Actual response:

{
  "categories": [
    {
      "strCategory": "Beef",
      "strCategoryDescription": "Beef is ..."
    },
    {
      "strCategory": "Chicken",
      "strCategoryDescription": "Chicken is ..."
    }
  ]
}

Expected response:

{
  "categories": [
    {
      "name": "Beef",
      "description": "Beef is ..."
    },
    {
      "name": "Chicken",
      "description": "Chicken is ..."
    }
  ]
}
Rechu
  • 4
    I'm actually observing exactly the opposite behavior that the alias is not used in `.dict()` and `.json()` by default. According to the [documentation](https://pydantic-docs.helpmanual.io/usage/exporting_models/#modeldict) whether they are used depends on the `by_alias` boolean keyword argument. And by default that is a weird default, considering that the author considers aliases as: "a mapping between the names of fields used 'publicly' and the names used in your application. Where publicly means in javascript, in an API, in the file you're parsing etc." – bluenote10 Feb 03 '22 at 14:33
  • 2
    I think you want to use `response_model_by_alias=False` in your path decorator, as mentioned in this answer: https://stackoverflow.com/a/69679104/8031815 – Garrett Motzner Mar 09 '23 at 21:39

6 Answers

47

Switch aliases and field names and use the allow_population_by_field_name model config option:

class TMDB_Category(BaseModel):
    strCategory: str = Field(alias="name")
    strCategoryDescription: str = Field(alias="description")

    class Config:
        allow_population_by_field_name = True

Let the aliases configure the names of the fields that you want to return, but enable allow_population_by_field_name to be able to parse data that uses different names for the fields.

Hernán Alarcón
  • 2
    Considering the setup you've shown. Will I be able to later access these properties in code by alias? For example, after parsing: `r = TMDB_GetCategoriesResponse.parse_obj(response.json())` `print(r.name)` `print(r.description)` or will I be forced to use these awful `r.strCategory` and `r.strCategoryDescription` – Rechu Sep 24 '21 at 06:27
  • @Rechu, as far as I know, there is no direct way to access the values of the fields by alias and you have to use the field names. By direct way I mean without exporting it to a dict for example. Check [issue #565](https://github.com/samuelcolvin/pydantic/issues/565) where the library author explains aliases as "a mapping between the names of fields used 'publicly' and the names used in your application. Where publicly means in javascript, in an API, in the file you're parsing etc.". This does not go well with your scenario because you have two different public names. – Hernán Alarcón Sep 27 '21 at 03:45
  • Does this solution integrate with pylance? – Ed1123 Apr 27 '23 at 23:59
10

Export the model with .dict(by_alias=False); by_alias is a keyword argument of .dict() and .json().


from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI()

class Item(BaseModel):
    name: str = Field(..., alias="keck")

@app.post("/item")
async def read_items(
    item: Item,
):
    return item.dict(by_alias=False)

Given the request:

{
  "keck": "string"
}

this will return

{
  "name": "string"
}
Sean
  • This is the working answer. – rickythefox Mar 20 '23 at 20:31
  • 2
    This will not work if you have `response_model` set in the router's configuration. With OP's `@router.get(path="category", response_model=TMDB_GetCategoriesResponse)`, the response field names will differ from what the response model expects, so validation will fail while returning the response – Faizi May 03 '23 at 10:13
3

An alternative option (which likely won't be as popular) is to use a de-serialization library other than pydantic. For example, the Dataclass Wizard library supports this particular use case. If you need the same round-trip behavior that Field(alias=...) provides, you can pass all=True to the json_field function. Note that with such a library you lose the ability to perform complete type validation, which is arguably one of pydantic's greatest strengths; it does, however, perform type conversion in a similar fashion to pydantic. There are also a few reasons why I feel that validation is not as important, which I list below.

Reasons why I would argue that data validation is a nice-to-have feature in general:

  • If you're building and passing in the input yourself, you can most likely trust that you know what you are doing, and are passing in the correct data types.
  • If you're getting the input from another API, then assuming that API has decent docs, you can just grab an example response from their documentation, and use that to model your class structure. You generally don't need any validation if an API documents its response structure clearly.
  • Data validation takes time, so it can slow down the process slightly, compared to if you just perform type conversion and catch any errors that might occur, without validating the input type beforehand.
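
The last point can be illustrated with the standard library alone. This is only a sketch of "convert and catch errors" as opposed to validating types beforehand; the Category dataclass, its rank field, and the load_category helper are hypothetical and not part of any library mentioned here:

```python
from dataclasses import dataclass


@dataclass
class Category:
    name: str
    rank: int  # hypothetical numeric field, used to show conversion


def load_category(raw: dict) -> Category:
    """Coerce raw values to the target types, reporting any failure."""
    try:
        return Category(name=str(raw["strCategory"]), rank=int(raw["rank"]))
    except (KeyError, TypeError, ValueError) as exc:
        # No up-front type checks: we attempt the conversion and
        # surface whatever goes wrong as a single error type.
        raise ValueError(f"cannot convert {raw!r}: {exc}") from exc


category = load_category({"strCategory": "Beef", "rank": "3"})
```

Here the string "3" is converted to an integer on load, while a value such as "three" raises ValueError.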

So to demonstrate that, here's a simple example for the above use case using the dataclass-wizard library (which relies on the usage of dataclasses instead of pydantic models):

from dataclasses import dataclass

from dataclass_wizard import JSONWizard, json_field


@dataclass
class TMDB_Category:
    name: str = json_field('strCategory')
    description: str = json_field('strCategoryDescription')


@dataclass
class TMDB_GetCategoriesResponse(JSONWizard):
    categories: list[TMDB_Category] 

The code to run that would look like this:

input_dict = {
  "categories": [
    {
      "strCategory": "Beef",
      "strCategoryDescription": "Beef is ..."
    },
    {
      "strCategory": "Chicken",
      "strCategoryDescription": "Chicken is ..."
    }
  ]
}

c = TMDB_GetCategoriesResponse.from_dict(input_dict)
print(repr(c))
# TMDB_GetCategoriesResponse(categories=[TMDB_Category(name='Beef', description='Beef is ...'), TMDB_Category(name='Chicken', description='Chicken is ...')])

print(c.to_dict())
# {'categories': [{'name': 'Beef', 'description': 'Beef is ...'}, {'name': 'Chicken', 'description': 'Chicken is ...'}]}

Measuring Performance

If anyone is curious, I've set up a quick benchmark test to compare deserialization and serialization times with pydantic vs. just dataclasses:

from dataclasses import dataclass
from timeit import timeit

from pydantic import BaseModel, Field

from dataclass_wizard import JSONWizard, json_field


# Pydantic Models
class Pydantic_TMDB_Category(BaseModel):
    name: str = Field(alias="strCategory")
    description: str = Field(alias="strCategoryDescription")


class Pydantic_TMDB_GetCategoriesResponse(BaseModel):
    categories: list[Pydantic_TMDB_Category]


# Dataclasses
@dataclass
class TMDB_Category:
    name: str = json_field('strCategory', all=True)
    description: str = json_field('strCategoryDescription', all=True)


@dataclass
class TMDB_GetCategoriesResponse(JSONWizard):
    categories: list[TMDB_Category]


# Input dict which contains sufficient data for testing (100 categories)
input_dict = {
  "categories": [
    {
      "strCategory": f"Beef {i * 2}",
      "strCategoryDescription": "Beef is ..." * i
    }
    for i in range(100)
  ]
}

n = 10_000

print('=== LOAD (deserialize)')
print('dataclass-wizard: ',
      timeit('c = TMDB_GetCategoriesResponse.from_dict(input_dict)',
             globals=globals(), number=n))
print('pydantic:         ',
      timeit('c = Pydantic_TMDB_GetCategoriesResponse.parse_obj(input_dict)',
             globals=globals(), number=n))

c = TMDB_GetCategoriesResponse.from_dict(input_dict)
pydantic_c = Pydantic_TMDB_GetCategoriesResponse.parse_obj(input_dict)

print('=== DUMP (serialize)')
print('dataclass-wizard: ',
      timeit('c.to_dict()',
             globals=globals(), number=n))
print('pydantic:         ',
      timeit('pydantic_c.dict()',
             globals=globals(), number=n))

And the benchmark results (tested on macOS Big Sur, Python 3.9.0):

=== LOAD (deserialize)
dataclass-wizard:  1.742989194
pydantic:          5.31538175
=== DUMP (serialize)
dataclass-wizard:  2.300118940
pydantic:          5.582638598

In their docs, pydantic claims to be the fastest library in general, but it's rather straightforward to prove otherwise. As you can see, for the above dataset pydantic is about 3x slower at deserialization and about 2.4x slower at serialization. It's worth noting that pydantic is already quite fast, though.


Disclaimer: I am the creator (and maintainer) of said library.

rv.kvetch
  • 2
    It appears you are the author of dataclass-wizard - it might be advisable to add a disclaimer to that effect, in particular since you're making critical claims about a competing solution. – Seb Dec 28 '22 at 11:07
  • 1
    @Seb good point - can't believe I overlooked that. just added a disclaimer to the post, stating as such. – rv.kvetch Jan 03 '23 at 18:35
3

You need to replace alias with validation_alias (available in Pydantic v2).

class TMDB_Category(BaseModel):
    name: str = Field(validation_alias="strCategory")
    description: str = Field(validation_alias="strCategoryDescription")

A serialization alias can be set separately with serialization_alias; see the Pydantic documentation on field aliases.
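
To make the effect concrete, here is a minimal sketch assuming Pydantic v2, where model_validate and model_dump replace the v1 parse_obj and dict methods used in the question. The payload is a shortened copy of the one in the question:

```python
from pydantic import BaseModel, Field


class TMDB_Category(BaseModel):
    # validation_alias only affects parsing, not serialization.
    name: str = Field(validation_alias="strCategory")
    description: str = Field(validation_alias="strCategoryDescription")


class TMDB_GetCategoriesResponse(BaseModel):
    categories: list[TMDB_Category]


payload = {
    "categories": [
        {"strCategory": "Beef", "strCategoryDescription": "Beef is ..."},
    ]
}

parsed = TMDB_GetCategoriesResponse.model_validate(payload)

# No serialization alias is defined, so both exports use field names.
by_name = parsed.model_dump()
by_alias = parsed.model_dump(by_alias=True)
```

Because no serialization alias is set, exporting with by_alias=True still produces name and description.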

0

Maybe you could use this approach:

from pydantic import BaseModel, Field


class TMDB_Category(BaseModel):
    name: str = Field(alias="strCategory")
    description: str = Field(alias="strCategoryDescription")


data = {
    "strCategory": "Beef",
    "strCategoryDescription": "Beef is ..."
}


obj = TMDB_Category.parse_obj(data)

# {'name': 'Beef', 'description': 'Beef is ...'}
print(obj.dict())
0

I was trying to do something similar (migrate a field pattern to a list of patterns while gracefully handling old versions of the data). The best solution I could find was to do the field mapping in the __init__ method. In the terms of OP, this would be like:

from pydantic import BaseModel


class TMDB_Category(BaseModel):
    name: str
    description: str

    def __init__(self, **data):
        # Map the upstream key names onto the model's own field names.
        if "strCategory" in data:
            data["name"] = data.pop("strCategory")
        if "strCategoryDescription" in data:
            data["description"] = data.pop("strCategoryDescription")
        super().__init__(**data)

Then we have:

>>> TMDB_Category(strCategory="name", strCategoryDescription="description").json()
'{"name": "name", "description": "description"}'
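
If there are many fields to rename, the same idea can be driven by a lookup table. This is a sketch of a variation on the __init__ approach above, not a pydantic feature; LEGACY_KEYS is a hypothetical module-level mapping:

```python
from pydantic import BaseModel

# Hypothetical mapping from upstream key names to model field names.
LEGACY_KEYS = {
    "strCategory": "name",
    "strCategoryDescription": "description",
}


class TMDB_Category(BaseModel):
    name: str
    description: str

    def __init__(self, **data):
        # Rename any upstream keys before pydantic validates the data.
        for old_key, new_key in LEGACY_KEYS.items():
            if old_key in data:
                data[new_key] = data.pop(old_key)
        super().__init__(**data)


category = TMDB_Category(strCategory="Beef", strCategoryDescription="Beef is ...")
exported = category.dict()  # deprecated in favor of model_dump() on Pydantic v2
```

The model also still accepts its own field names, since keys not listed in the mapping pass through unchanged.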

If you need to use field aliases to do this but still use the name/description fields in your code, one option is to alter Hernán Alarcón's solution to use properties:

class TMDB_Category(BaseModel):
    strCategory: str = Field(alias="name")
    strCategoryDescription: str = Field(alias="description")
    class Config:
        allow_population_by_field_name = True
    @property
    def name(self):
        return self.strCategory
    @name.setter
    def name(self, value):
        self.strCategory = value
    @property
    def description(self):
        return self.strCategoryDescription
    @description.setter
    def description(self, value):
        self.strCategoryDescription = value

That's still a bit awkward, since the repr uses the "alias" names:

>>> TMDB_Category(name="name", description="description")
TMDB_Category(strCategory='name', strCategoryDescription='description')
Lucas Wiman