I have the following class

from dataclasses import dataclass
from dataclasses_json import dataclass_json

@dataclass_json
@dataclass
class Source:
    type: str = None
    label: str = None
    path: str = None

and the two subclasses:

@dataclass_json
@dataclass
class Csv(Source):
    csv_path: str = None
    delimiter: str = ';'

and

@dataclass_json
@dataclass
class Parquet(Source):
    parquet_path: str = None

Given the following dictionaries:

parquet = {'type': 'Parquet', 'label': 'events', 'path': '/.../test.parquet', 'parquet_path': '../../result.parquet'}
csv = {'type': 'Csv', 'label': 'events', 'path': '/.../test.csv', 'csv_path': '../../result.csv', 'delimiter': ','}

Now I would like to do something like

Source().from_dict(csv) 

and have the output be an instance of the class Csv or Parquet. I understand that if you instantiate the class Source you just "upload" the parameters with the method "from_dict", but is there any possibility of doing this through some kind of inheritance, without using a "constructor" that runs an if/elif/else over all possible 'types'?

PureConfig, a Scala library, creates different case classes when the attribute 'type' holds the name of the desired subclass. Is this possible in Python?

Patricio
  • What have you tried so far? Why don't you want to use if-else? Do you need to support other types? Is the ``type`` field always the name of the target class, or can these differ? – MisterMiyagi Apr 21 '20 at 09:18
  • Not necessarily a duplicate, but a more generic version of this same question: https://stackoverflow.com/questions/7273568/pick-a-subclass-based-on-a-parameter – Billy Apr 21 '20 at 09:30
  • Your classes are marked with the third-party decorator ``dataclass_json``, but your usage example does not use its functionality. Do you need a solution that works for any ``dataclass`` (loaded from a dict) or do you actually need the JSON functionality (loading from a JSON)? – MisterMiyagi Apr 21 '20 at 09:31

3 Answers

3

You can build a helper that picks and instantiates the appropriate subclass.

def from_data(data: dict, tp: type):
    """Create the subtype of ``tp`` for the given ``data``"""
    subtype = [
        stp for stp in tp.__subclasses__()  # look through all subclasses...
        if stp.__name__ == data['type']     # ...and select by type name
    ][0]
    return subtype(**data)  # instantiate the subtype

This can be called with your data and the base class from which to select:

>>> from_data(
...     {'type': 'Csv', 'label': 'events', 'path': '/.../test.csv', 'csv_path': '../../result.csv', 'delimiter':','},
...     Source,
... )
Csv(type='Csv', label='events', path='/.../test.csv', csv_path='../../result.csv', delimiter=',')

If you need to run this often, it is worth building a dict to optimise the subtype lookup. A simple means is to add a method to your base class, and store the lookup there:

@dataclass_json
@dataclass
class Source:
    type: str = None
    label: str = None
    path: str = None

    @classmethod
    def from_data(cls, data: dict):
        if not hasattr(cls, '_lookup'):
            cls._lookup = {stp.__name__: stp for stp in cls.__subclasses__()}
        return cls._lookup[data["type"]](**data)

This can be called directly on the base class:

>>> Source.from_data({'type': 'Csv', 'label': 'events', 'path': '/.../test.csv', 'csv_path': '../../result.csv', 'delimiter':','})
Csv(type='Csv', label='events', path='/.../test.csv', csv_path='../../result.csv', delimiter=',')
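
One thing to watch for: `__subclasses__()` returns only direct subclasses, so a class derived from `Csv` would not be found by the lookups above. Below is a minimal sketch of a recursive lookup. It uses plain `dataclass` only (the `dataclass_json` decorator is left out to keep the example self-contained), and `TabCsv`, `all_subclasses` and the `ValueError` handling are hypothetical additions for illustration, not part of the original code:

```python
from dataclasses import dataclass


@dataclass
class Source:
    type: str = None
    label: str = None
    path: str = None


@dataclass
class Csv(Source):
    csv_path: str = None
    delimiter: str = ';'


@dataclass
class TabCsv(Csv):  # hypothetical grandchild of Source, for illustration only
    delimiter: str = '\t'


def all_subclasses(tp: type):
    """Yield every direct and indirect subclass of ``tp``."""
    for stp in tp.__subclasses__():
        yield stp
        yield from all_subclasses(stp)


def from_data(data: dict, tp: type):
    """Create the (possibly indirect) subtype of ``tp`` named by ``data['type']``."""
    lookup = {stp.__name__: stp for stp in all_subclasses(tp)}
    try:
        subtype = lookup[data['type']]
    except KeyError:
        # Raised both for a missing 'type' key and for an unknown name.
        raise ValueError(f"unknown source type: {data.get('type')!r}") from None
    return subtype(**data)
```

Raising a `ValueError` with the offending name is just one possible choice; the original `[...][0]` lookup would raise a less descriptive `IndexError` instead.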
MisterMiyagi
  • Thank you very much. The only thing is that I do not see how this is compatible with @dataclass_json. It does not select the subclass. – Patricio Apr 29 '20 at 06:52
  • @Patricio Can you clarify what you mean by "compatible with @dataclass_json"? In how far do you think this is not compatible? Which subclass is not selected? – MisterMiyagi Apr 29 '20 at 08:02
  • What I mean is the, to be honest, very special case where you have another dataclass X in which one field, say 'sources', is of type List[Source]. In this case, when I apply the dictionary to X, it does not pick up all the variables of the source (just type, label and path, but not the others, such as parquet_path when it is of type 'Parquet'). – Patricio Apr 29 '20 at 09:30
  • @Patricio Sorry, I don't follow this description at all. The code shown does copy the entire ``data`` unmodified to the subtype; any variable but ``type`` is treated exactly the same. Do you want to *recursively* resolve some data, perhaps? Is ``X`` a ``Source`` as well? – MisterMiyagi Apr 29 '20 at 09:38
  • Yes, I want it recursively. What I mean is that X is a dataclass and one of its fields is of type List[Source]. If you still do not know what I mean, I will add an example in an edit of my question. – Patricio Apr 29 '20 at 09:54
  • @Patricio I think I know what you mean, but the further constraints are unclear to me. Does ``X`` have *only* the field ``source: List[Source]`` or others as well? Also, in how far is that a new approach instead of merely extending the approach shown here? If this new requirement shifts the focus of the question, consider opening a new one. – MisterMiyagi Apr 29 '20 at 10:04
  • Thank you very much. I made the new question: https://stackoverflow.com/questions/61541259/dictionary-to-dataclasses-with-inheritance-of-classes. Hope you can help me. – Patricio May 01 '20 at 11:08
3

This is a variation on my answer to this question.

@dataclass_json
@dataclass
class Source:
    type: str = None
    label: str = None
    path: str = None

    def __new__(cls, type=None, **kwargs):
        for subclass in cls.__subclasses__():
            if subclass.__name__ == type:
                break
        else:
            subclass = cls
        instance = super(Source, subclass).__new__(subclass)
        return instance

assert type(Source(**csv)) == Csv
assert type(Source(**parquet)) == Parquet
assert Csv(**csv) == Source(**csv)
assert Parquet(**parquet) == Source(**parquet)

You asked and I am happy to oblige. However, I'm questioning whether this is really what you need. I think it might be overkill for your situation. I originally figured this trick out so I could instantiate directly from data when...

  • my data was heterogeneous and I didn't know ahead of time which subclass was appropriate for each datum,
  • I didn't have control over the data, and
  • figuring out which subclass to use required some processing of the data, processing which I felt belonged inside the class (for logical reasons as well as to avoid polluting the scope in which the instantiating took place).

If those conditions apply to your situation, then I think this is a worthwhile approach. If not, the added complexity of mucking with __new__ -- a moderately advanced maneuver -- might not outweigh the savings in complexity in the code used to instantiate. There are probably simpler alternatives.

For example, it appears as though you already know which subclass you need; it's one of the fields in the data. If you put it there, presumably whatever logic you wrote to do so could be used to instantiate the appropriate subclass right then and there, bypassing the need for my solution. Alternatively, instead of storing the name of the subclass as a string, store the subclass itself. Then you could do this: data['type'](**data)
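
A minimal sketch of that last idea, assuming you control how the dictionary is built. Plain `dataclass` is used here without `dataclass_json`, and the `type` field is annotated as `object` because it now holds a class rather than a string; both choices are assumptions made for this illustration:

```python
from dataclasses import dataclass


@dataclass
class Source:
    type: object = None  # holds the subclass itself instead of its name
    label: str = None
    path: str = None


@dataclass
class Csv(Source):
    csv_path: str = None
    delimiter: str = ';'


# The producer of the data stores the class object directly...
data = {'type': Csv, 'label': 'events', 'path': '/.../test.csv',
        'csv_path': '../../result.csv', 'delimiter': ','}

# ...so instantiation needs no lookup at all.
obj = data['type'](**data)
```

This only works while the data stays inside Python; a class object cannot be written to JSON as-is, so data loaded from a file would still need a name-to-class lookup.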

It also occurs to me that maybe you don't need inheritance at all. Do Csv and Parquet store the same type of data, differing only in which file format they read it from? Then maybe you just need one class with from_csv and from_parquet methods. Alternatively, if one of the parameters is a filename, it would be easy to figure out which type of file parsing you need based on the filename extension. Normally I'd put this in __init__, but since you're using dataclass, I guess this would happen in __post_init__.
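
A rough sketch of the single-class variant, assuming the file format can be inferred from the extension of `path`. The `format` field and the use of `pathlib.PurePath` are hypothetical choices for this illustration, not something taken from your code:

```python
from dataclasses import dataclass, field
from pathlib import PurePath


@dataclass
class Source:
    label: str = None
    path: str = None
    delimiter: str = ';'  # only meaningful when the file is a CSV
    # Derived in __post_init__, so it is not a constructor argument.
    format: str = field(init=False, default=None)

    def __post_init__(self):
        # '/.../test.csv' has the suffix '.csv', which becomes 'csv';
        # a path without an extension leaves ``format`` as None.
        if self.path is not None:
            self.format = PurePath(self.path).suffix.lstrip('.').lower() or None


csv_source = Source(label='events', path='/.../test.csv', delimiter=',')
parquet_source = Source(label='events', path='/.../test.parquet')
```

Code that reads the file could then branch on `source.format`, which keeps the dispatch in one place without any subclasses.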

ibonyun
  • Thank you very much for your answer. Yesterday I figured out that my problem is actually a little more convoluted. I made another question for this: https://stackoverflow.com/questions/61541259/dictionary-to-dataclasses-with-inheritance-of-classes. – Patricio May 01 '20 at 11:07
  • @Patricio You say in your new question that you get stuck when you try to find the subclasses of `Source`. My solution shows you how to do this. MisterMiyagi's answer does too. Did you actually try either of our solutions? What happened when you did? How are they not solving your problem? – ibonyun May 01 '20 at 22:09
  • Yeah, both of your answers work for this problem. So thank you again. But I noticed that when Source is a subclass of another class, the inheritance behaviour of the library dataclass_json does not work correctly for my case. – Patricio May 02 '20 at 07:47
0

Do you need this behavior?

from dataclasses import dataclass
from typing import Optional, Union, List

from validated_dc import ValidatedDC


@dataclass
class Source(ValidatedDC):
    label: Optional[str] = None
    path: Optional[str] = None


@dataclass
class Csv(Source):
    csv_path: Optional[str] = None
    delimiter: str = ';'


@dataclass
class Parquet(Source):
    parquet_path: Optional[str] = None


@dataclass
class InputData(ValidatedDC):
    data: List[Union[Parquet, Csv]]


# Let's say you got a json-string and loaded it:
data = [
    {
        'label': 'events', 'path': '/.../test.parquet',
        'parquet_path': '../../result.parquet'
    },
    {
        'label': 'events', 'path': '/.../test.csv',
        'csv_path': '../../result.csv', 'delimiter': ','
    }

]


input_data = InputData(data=data)

for item in input_data.data:
    print(item)

# Parquet(label='events', path='/.../test.parquet', parquet_path='../../result.parquet')
# Csv(label='events', path='/.../test.csv', csv_path='../../result.csv', delimiter=',')

validated_dc: https://github.com/EvgeniyBurdin/validated_dc

Evgeniy_Burdin