Set-up

Using Scrapy, I am scraping housing ads. Per housing ad, I obtain a postal code.

I have a dictionary linking postal codes to districts,

postal_district = {'A': ['1011AB', '1011BD', '1011BG', '1011CE',
                         '1011CH', '1011CZ', '1011DB', '1011DD']}

The entire dictionary can be viewed here.

Each consecutive pair of postal codes in the list forms a range: the first postal code is the minimum of the range, the second is the maximum.

E.g. any postal code in

'1011AB', '1011AC',...,'1011AZ', '1011BA',...,'1011BD'

belongs to district 'A'.
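The pairing described above can be sketched as follows. This is only an illustration: `code_ranges` and `in_any_range` are hypothetical helper names, and the sketch assumes every postal code is an upper-case string in the same fixed format (four digits, two letters), so that plain string comparison gives the intended order.

```python
# Dictionary excerpt taken from the question.
postal_district = {'A': ['1011AB', '1011BD', '1011BG', '1011CE',
                         '1011CH', '1011CZ', '1011DB', '1011DD']}


def code_ranges(codes):
    """Pair up consecutive codes: (codes[0], codes[1]), (codes[2], codes[3]), ..."""
    return list(zip(codes[0::2], codes[1::2]))


def in_any_range(postcode, codes):
    """True if postcode lies in at least one (low, high) range, bounds included."""
    return any(low <= postcode <= high for low, high in code_ranges(codes))


print(code_ranges(postal_district['A'])[0])          # ('1011AB', '1011BD')
print(in_any_range('1011AZ', postal_district['A']))  # True
print(in_any_range('1011BE', postal_district['A']))  # False: between two ranges
```

Note that `1011BE` sits between the first range (ending at `1011BD`) and the second (starting at `1011BG`), so it belongs to no range even though it lies between the first and last code of the list.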

My goal is to match ads to districts via their postal code and the dictionary.


Problem

I asked a related question earlier here and chose to follow this answer to solve the issue.

Thus, I am using the following code to match the ads to districts,

def is_in_postcode_range(current_postcode, min, max):
    return min <= current_postcode <= max

def get_district_by_post_code(postcode):
    for district, codes in postal_district.items():
        first_code = codes[0]
        last_code = codes[-1]
        if is_in_postcode_range(postcode, first_code, last_code):
            if any(is_in_postcode_range(postcode, codes[i], codes[i+1]) for i in range(0, len(codes), 2)):
                return district
            else:
                return None

district = get_district_by_post_code(pc)

For some postal codes this works, but many are not matched: 1035CK, 1072LL, and 1059EC, to name a few.


What is wrong? Is it the dictionary or the code?

I've sorted the dictionary.

LucSpan

1 Answer

This construct:

 if is_in_postcode_range(postcode, first_code, last_code):
     if any(is_in_postcode_range(postcode, codes[i], codes[i+1]) 
             for i in range(0, len(codes), 2)):
         return district
     else:
         return None

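assumes that the overall ranges of the districts (first code to last code) do not overlap. When a postal code falls inside one district's overall range but in none of its sub-ranges, the function returns None immediately and never checks the remaining districts. If the overall ranges can overlap, then you need to remove: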

else:
    return None
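With that `else` branch gone, the lookup might look like the sketch below. The two-district dictionary here is hypothetical data made up to reproduce the overlap (it is not the real dictionary), the parameters are renamed to `low` and `high` to avoid shadowing the built-ins `min` and `max`, and the outer first-to-last pre-check is dropped because the sub-range test already covers it.

```python
# Hypothetical data, NOT the real dictionary: district 'A' spans
# 1011AB..1072ZZ overall, which overlaps district 'B' (1035AA..1035ZZ).
postal_district = {
    'A': ['1011AB', '1011BD', '1072AA', '1072ZZ'],
    'B': ['1035AA', '1035ZZ'],
}


def is_in_postcode_range(current_postcode, low, high):
    return low <= current_postcode <= high


def get_district_by_post_code(postcode):
    for district, codes in postal_district.items():
        # Pair codes as (min, max). A miss does not return early,
        # so the loop moves on to the next district.
        if any(is_in_postcode_range(postcode, low, high)
               for low, high in zip(codes[0::2], codes[1::2])):
            return district
    return None  # reached only when no district matched


print(get_district_by_post_code('1035CK'))  # B
```

With the early `return None` still in place, `1035CK` would be rejected while checking district 'A' (it lies between `1011AB` and `1072ZZ` but in neither sub-range), and district 'B' would never be examined.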
Stephen Rauch