Background:
I scrape data about upcoming property sales from two sources. Let's call one SaleAnnouncement and the other SellerMaintainedData. They share many of the same field names, although some data can only be found in one and not the other. If an item is coming up for sale, there is guaranteed to be a SaleAnnouncement, but not necessarily SellerMaintainedData. In fact, only about 10% of the "sellers" maintain their own site with relevant data. However, those that do always have more information, and that data is more up to date than the data in the announcement. Also, the "announcement" is free-form text that has to go through several processing steps before the relevant data is extracted, so its model has some fields to store data from intermediate processing steps (part of the reason I opted for two models instead of combining them into one). The "seller" data, on the other hand, is scraped in a neat tabular format.
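The precedence rule described above (seller data wins when present, the announcement is the guaranteed fallback, and fields unique to one source pass through) can be sketched in plain Python, independent of Django. The function name `merge_sources` and the dict-based records are hypothetical; note that this sketch treats only `None` as "missing", whereas the property in the question uses truthiness.

```python
from typing import Any, Mapping, Optional


def merge_sources(
    announcement: Mapping[str, Any],
    seller: Optional[Mapping[str, Any]] = None,
) -> dict:
    """Hypothetical helper: combine two scraped records field by field.

    The announcement is always present; the seller record is optional.
    For every field, a non-None seller value wins; otherwise the
    announcement value (if any) is kept. Fields found in only one
    source are carried through unchanged.
    """
    merged = dict(announcement)  # start from the guaranteed source (a copy)
    for field, value in (seller or {}).items():
        if value is not None:  # seller data is fresher, so it overrides
            merged[field] = value
    return merged


announcement = {"sale_datetime": "2024-05-01 10:00", "address": "12 Elm St", "parcel_number": "A-17"}
seller = {"sale_datetime": "2024-05-08 10:00", "address": None}
print(merge_sources(announcement, seller))
# {'sale_datetime': '2024-05-08 10:00', 'address': '12 Elm St', 'parcel_number': 'A-17'}
```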
Problem:
I would ultimately like to combine them into one SaleItem. I have implemented a model that is related to the previous two models and relies heavily on properties to decide which model each piece of data comes from. Something like:
    @property
    def sale_datetime(self):
        # Prefer the seller-maintained value; fall back to the announcement.
        if self.sellermaintaineddata and self.sellermaintaineddata.sale_datetime:
            return self.sellermaintaineddata.sale_datetime
        return self.latest_announcement and self.latest_announcement.sale_datetime
However, I obviously can't query on those properties, and filtering on them is my end goal when listing upcoming sales. Someone suggested a solution that involves a custom manager overriding the filter/exclude methods. That sounds promising, but I would have to duplicate all of the property logic in the model manager.
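One way to at least avoid writing a separate `@property` per field is a small descriptor that encodes the precedence once and is declared per field. This is a plain-Python sketch: `Prioritized` and the `SaleItem` stand-in are hypothetical, the source attribute names are borrowed from the question, and it has not been checked against Django's model metaclass. It also does not make the fields queryable by itself.

```python
from types import SimpleNamespace


class Prioritized:
    """Hypothetical descriptor: read a field from the first source that has it.

    `sources` is an ordered tuple of attribute names on the owning object;
    the first non-None value wins. The field name is taken from the class
    attribute the descriptor is assigned to.
    """

    def __init__(self, *sources):
        self.sources = sources

    def __set_name__(self, owner, name):
        self.field = name  # read the same-named attribute on each source

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class itself
        for source_name in self.sources:
            source = getattr(obj, source_name, None)
            value = getattr(source, self.field, None) if source is not None else None
            if value is not None:
                return value
        return None


class SaleItem:
    # Hypothetical stand-in for the combined model; attribute names follow the question.
    sale_datetime = Prioritized("sellermaintaineddata", "latest_announcement")
    address = Prioritized("sellermaintaineddata", "latest_announcement")
    parcel_number = Prioritized("sellermaintaineddata", "latest_announcement")

    def __init__(self, latest_announcement, sellermaintaineddata=None):
        self.latest_announcement = latest_announcement
        self.sellermaintaineddata = sellermaintaineddata


announcement = SimpleNamespace(sale_datetime="2024-05-01 10:00", address="12 Elm St", parcel_number="A-17")
seller = SimpleNamespace(sale_datetime="2024-05-08 10:00", address=None)  # no parcel_number at all
item = SaleItem(announcement, seller)
print(item.sale_datetime, item.address, item.parcel_number)
# 2024-05-08 10:00 12 Elm St A-17
```

Because the list of prioritized fields lives in one place, the same list could in principle also drive query-side expressions (for example Django's `Coalesce` from `django.db.models.functions`), rather than keeping that logic duplicated in a custom manager.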
Summary (for clarity):
I have:
    class SourceA(Model):
        sale_datetime = ...
        address = ...
        parcel_number = ...
        # other attrs...

    class SourceB(Model):
        sale_datetime = ...
        address = ...
        # no parcel number here
        # other attrs...
I want:
    class Combined(Model):
        sale_datetime = ...  # from SourceB if it exists, else from SourceA
        ...
I want a unified model where the fields common to SourceA and SourceB are prioritized: if SourceB exists, the value of a field comes from SourceB; otherwise it comes from SourceA. I would also like to query those fields, so maybe properties are not the best way to do this.
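To make the prioritized values queryable, the precedence has to live in the query itself. One possible approach is a `LEFT JOIN` plus `COALESCE`, here wrapped in a database view. The sketch below uses the standard-library `sqlite3` module with a hypothetical schema (`source_a`, `source_b`, and the `source_a_id` link column are assumptions, not the question's real tables). In Django, the same expression is available as `Coalesce` from `django.db.models.functions` inside an `.annotate()`, or a view like this could be mapped to an unmanaged model (`Meta.managed = False`).

```python
import sqlite3

# Hypothetical schema loosely following the question: every sale has a
# source_a (announcement) row; only some have a source_b (seller) row.
SCHEMA = """
CREATE TABLE source_a (
    id INTEGER PRIMARY KEY,
    sale_datetime TEXT,
    address TEXT,
    parcel_number TEXT
);
CREATE TABLE source_b (
    id INTEGER PRIMARY KEY,
    source_a_id INTEGER UNIQUE REFERENCES source_a(id),
    sale_datetime TEXT,
    address TEXT
);
-- One row per sale; source_b wins wherever it has a non-NULL value.
CREATE VIEW combined AS
SELECT a.id AS id,
       COALESCE(b.sale_datetime, a.sale_datetime) AS sale_datetime,
       COALESCE(b.address, a.address) AS address,
       a.parcel_number AS parcel_number
FROM source_a AS a
LEFT JOIN source_b AS b ON b.source_a_id = a.id;
"""


def build_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO source_a (id, sale_datetime, address, parcel_number) VALUES (?, ?, ?, ?)",
        [
            (1, "2024-05-01 10:00", "12 Elm St", "A-17"),
            (2, "2024-05-02 10:00", "9 Oak Ave", "B-03"),
        ],
    )
    # Only sale 1 has seller-maintained data; it moved the sale date but has no address.
    conn.execute(
        "INSERT INTO source_b (source_a_id, sale_datetime, address) VALUES (?, ?, ?)",
        (1, "2024-05-08 10:00", None),
    )
    return conn


conn = build_demo()
# The prioritized columns can now be filtered and ordered like ordinary ones.
rows = conn.execute(
    "SELECT id, sale_datetime, address FROM combined "
    "WHERE sale_datetime >= ? ORDER BY sale_datetime",
    ("2024-05-02",),
).fetchall()
print(rows)
# [(2, '2024-05-02 10:00', '9 Oak Ave'), (1, '2024-05-08 10:00', '12 Elm St')]
```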
Question:
Is there a better way? Should I consider restructuring my models (possibly combining the two), or is the custom manager solution the way to go?