Set-up
Using Scrapy, I am scraping housing ads. For each ad, I obtain a postal code.
I have a dictionary linking postal codes to districts:
postal_district = {'A': ['1011AB', '1011BD', '1011BG', '1011CE',
                         '1011CH', '1011CZ', '1011DB', '1011DD']}
The entire dictionary can be viewed here.
Each pair of consecutive postal codes in the list (positions 0 and 1, 2 and 3, and so on) forms a range: the first postal code is the minimum of the range and the second is the maximum.
For example, any postal code in '1011AB', '1011AC', ..., '1011AZ', '1011BA', ..., '1011BD' belongs to district 'A'.
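A minimal sketch of how the flat list can be read as (minimum, maximum) ranges. It assumes every district's list has an even number of codes and that all codes share the same format of four digits followed by two uppercase letters, so that plain string comparison orders them correctly. The helper names `to_ranges` and `in_any_range` are hypothetical.

```python
# Sample data from the question: district 'A' with four (min, max) ranges.
postal_district = {'A': ['1011AB', '1011BD', '1011BG', '1011CE',
                         '1011CH', '1011CZ', '1011DB', '1011DD']}


def to_ranges(codes):
    """Pair consecutive codes (0-1, 2-3, ...) into (low, high) tuples.

    Assumes len(codes) is even; with an odd-length list the last code
    is silently dropped here.
    """
    it = iter(codes)
    return list(zip(it, it))


def in_any_range(postcode, codes):
    # String comparison only works because every code has the same
    # fixed-width format, e.g. '1011AB' < '1011AZ' < '1011BA'.
    return any(low <= postcode <= high for low, high in to_ranges(codes))


print(to_ranges(postal_district['A']))
# [('1011AB', '1011BD'), ('1011BG', '1011CE'), ('1011CH', '1011CZ'), ('1011DB', '1011DD')]
print(in_any_range('1011AZ', postal_district['A']))  # True
print(in_any_range('1011BE', postal_district['A']))  # False: falls in the gap 1011BD..1011BG
```

Note that a code such as '1011BE' lies inside district A's overall span ('1011AB' to '1011DD') but in none of its ranges, so the gaps between ranges may belong to other districts.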
My goal is to match ads to districts via their postal code and the dictionary.
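As an alternative to looping over every district for each ad, and assuming the ranges of different districts never overlap, all ranges can be flattened into one list sorted by minimum and binary-searched with the standard library's `bisect` module. This is a swapped-in technique, not the approach from the linked answer; `build_index`, `lookup` and district `'Z'` are hypothetical.

```python
from bisect import bisect_right

postal_district = {
    'A': ['1011AB', '1011BD', '1011BG', '1011CE',
          '1011CH', '1011CZ', '1011DB', '1011DD'],
    'Z': ['1012AA', '1012ZZ'],  # hypothetical district, for illustration only
}


def build_index(mapping):
    """Return (lows, entries): range minimums sorted ascending, plus the
    matching (low, high, district) tuples in the same order."""
    entries = []
    for district, codes in mapping.items():
        # Pair codes 0-1, 2-3, ... into (low, high) ranges.
        for low, high in zip(codes[0::2], codes[1::2]):
            entries.append((low, high, district))
    entries.sort()
    return [entry[0] for entry in entries], entries


def lookup(postcode, lows, entries):
    # Index of the last range whose low <= postcode (assumes no overlaps).
    i = bisect_right(lows, postcode) - 1
    if i >= 0 and postcode <= entries[i][1]:
        return entries[i][2]
    return None


lows, entries = build_index(postal_district)
print(lookup('1011CK', lows, entries))  # A
print(lookup('1012MN', lows, entries))  # Z
print(lookup('1011AA', lows, entries))  # None: before the first range
```

The index is built once, after which each lookup costs O(log n) comparisons instead of scanning every district.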
Problem
I asked a preceding question here and chose to follow this answer to solve the issue.
So I am using the following code to match the ads to districts:
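def is_in_postcode_range(current_postcode, low, high):
    # True if low <= current_postcode <= high (plain string comparison)
    return low <= current_postcode <= high

def get_district_by_post_code(postcode):
    for district, codes in postal_district.items():
        first_code = codes[0]
        last_code = codes[-1]
        # Quick check against the district's overall span...
        if is_in_postcode_range(postcode, first_code, last_code):
            # ...then against each (min, max) pair: indices 0-1, 2-3, ...
            if any(is_in_postcode_range(postcode, codes[i], codes[i+1]) for i in range(0, len(codes), 2)):
                return district
            else:
                return None

# pc is the postal code scraped from the ad
district = get_district_by_post_code(pc)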
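Whichever `if` the `else: return None` is attached to, it runs inside the loop, so the function can return `None` before the remaining districts have been checked: either after the first district whose overall first-to-last span misses the code, or after a district whose span contains the code but none of whose ranges do (which happens when districts' ranges interleave). A minimal sketch that only gives up after every district has been checked; it also stops the pair loop at `len(codes) - 1`, so an odd-length list cannot raise `IndexError`. District `'B'` is hypothetical, added only to fill a gap inside district A's span.

```python
postal_district = {
    'A': ['1011AB', '1011BD', '1011BG', '1011CE',
          '1011CH', '1011CZ', '1011DB', '1011DD'],
    'B': ['1011BE', '1011BF'],  # hypothetical district filling a gap in 'A'
}


def is_in_postcode_range(current_postcode, low, high):
    return low <= current_postcode <= high


def get_district_by_post_code(postcode):
    for district, codes in postal_district.items():
        # Cheap pre-check on the district's overall span: skip, don't give up.
        if not is_in_postcode_range(postcode, codes[0], codes[-1]):
            continue
        # Check each (low, high) pair: indices 0-1, 2-3, ...
        if any(is_in_postcode_range(postcode, codes[i], codes[i + 1])
               for i in range(0, len(codes) - 1, 2)):
            return district
    # Only reached after every district has been checked.
    return None


print(get_district_by_post_code('1011BE'))  # B
print(get_district_by_post_code('1011CK'))  # A
print(get_district_by_post_code('1035CK'))  # None with this sample data
```

With the sample data, '1011BE' lies inside district A's span but in none of its ranges; a version that returns `None` at that point would never reach district B.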
This works for some postal codes, but many are not matched: 1035CK, 1072LL and 1059EC, to name a few.
What is wrong? Is it the dictionary or the code?
I've sorted the dictionary.
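One way to narrow down whether the dictionary or the code is at fault is to validate the dictionary on its own first. A minimal sketch, assuming the format described above: an even-length list per district, each code four digits followed by two uppercase letters, and ascending order within each list. `check_district_lists` is hypothetical, and district `'Bad'` is a made-up, deliberately malformed entry.

```python
import re

# Four digits followed by two uppercase letters, e.g. '1011AB' (assumed format).
POSTCODE_RE = re.compile(r'\d{4}[A-Z]{2}')

postal_district = {
    'A': ['1011AB', '1011BD', '1011BG', '1011CE',
          '1011CH', '1011CZ', '1011DB', '1011DD'],
    'Bad': ['1012AB', '1011 ZZ', '1012AA'],  # hypothetical, deliberately malformed
}


def check_district_lists(mapping):
    """Return a list of human-readable problems found in the mapping."""
    problems = []
    for district, codes in mapping.items():
        if len(codes) % 2:
            problems.append(f'{district}: odd number of codes ({len(codes)})')
        for code in codes:
            if not POSTCODE_RE.fullmatch(code):
                problems.append(f'{district}: badly formatted code {code!r}')
        if codes != sorted(codes):
            problems.append(f'{district}: codes are not in ascending order')
    return problems


for problem in check_district_lists(postal_district):
    print(problem)
```

The same `POSTCODE_RE.fullmatch` check can be applied to the scraped `pc` value: a code written as '1035 CK' (with a space) compares lower than '1035AA', because a space sorts before any uppercase letter.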