0

I have a dataset containing historical transaction records for real estate properties. Each property has an ID number. To check if the data is complete, for each property I am identifying a "transaction chain": I take the original buyer, and go through all intermediate buyer/seller combinations until I reach the final buyer of record. So for data that looks like this:

Buyer|Seller|propertyID
Bob|Jane|23
Tim|Bob|23
Karl|Tim|23

The transaction chain will look like: [Jane, Bob, Tim, Karl]

I am using three datasets to do this. The first contains the names of only the first buyer of each property. The second contains the names of all intermediate buyers and sellers, and the third contains only the final buyer for each property. I use three datasets so I can follow the process given by vikramls answer here.

In my version of the graph dictionary, each seller is a key to its corresponding buyer, and the oft-cited find_path function finds the path from first seller to last buyer. The problem is that the dataset is very large, so I get a maximum recursion depth reached error. I think I can solve this by nesting the graph dictionary inside another dictionary where they key is the property id number, and then searching for the path within ID groups. However, when I tried:

graph = {}
propertyIDgraph = {}

with open('buyersAndSellers.txt','r') as f:
    for row in f:
        propertyid, seller, buyer = row.strip('\n').split('|')
        graph.setdefault(seller, []).append(buyer)
        propertyIDgraph.setdefault(propertyid, []).append(graph)
f.close()

It assigned every buyer/seller combination to every property id. I would like it to assign the buyers and sellers to only their corresponding property ID.

Community
  • 1
  • 1
  • Do you want specifically to do this with dictionaries or would you be open to a library? I will point out that your current approach will fail given a circuit in the buying/selling, perhaps your business domain doesn't allow this. – bearrito Aug 04 '15 at 18:59
  • I am open to a library. I used a dictionary because it was the first thing I found information on when I researched paths between nodes in python – public_displYname Aug 04 '15 at 19:02
  • You probably have a circuit somewhere in the model (as mentioned)... e.g. Karl sold to Jane, who sold to Bob, who sold to Tim, who sold it back to Jane, who sold to Rick. When you get to Jane, you don't know whether to walk to Bob, or to Rick, so if you choose Bob, you'll continue walking around in a circle forever. Without knowing anything about your code at all, this would be the first thing to check. A graph might not be a good choice for this; better to just have a list that you append to, i.e. a defaultdict(list) might be a better model. – Corley Brigman Aug 04 '15 at 19:49
  • Yes circuits do exist. So I can use defaultdict(list) to create the dictionary but the find_path function would work as it is currently written? In other words the only thing that would need to change is the graph structure? – public_displYname Aug 04 '15 at 21:22

2 Answers2

0

You might attempt to something like the following. I adapted from the link at https://www.python.org/doc/essays/graphs/

Transaction = namedtuple('Transaction', ['Buyer', 'PropertyId'])

graph = {}
## maybe this is a db or a file
for data in datasource:
    graph[data.seller] = Transaction(data.buyer,data.property_id)

## returns something like
## graph = {'Jane': [Transaction('Bob',23)],
##        'Bob': [Transaction('Tim',23)],
##        'Time': [Transaction('Karl',23)]}
##

def find_transaction_path(graph,  original_seller,current_owner,target_property_id path=[]):
    assert(target_property_id is not None)

    path = path + [original_seller]
    if start == end:
        return path
    if not graph.has_key(original_seller):
        return None
    shortest = None
    for node in graph[start]:
        if node not in path and node.property_id == target_property_id:
            newpath = find_shortest_path(graph, node.Buyer, current_owner, path,target_property_id)
            if newpath:
                if not shortest or len(newpath) < len(shortest):
                    shortest = newpath
    return shortest
bearrito
  • 2,217
  • 1
  • 25
  • 36
  • Could you expand a little bit on how to get the information from the columns into that namedtuple and graph format without hardcoding it? Also, I don't see where the propertyID is used in the function, i.e. where does the function to say only iterate within the ID group? – public_displYname Aug 04 '15 at 21:06
-1

I wouldn't recommend append to a graph. It will append to every node. Better check if exists first than right after append it to the already existed object.

Try this:

graph = {}
propertyIDgraph = {}

with open('buyersAndSellers.txt','r') as f:
    for row in f:
        propertyid, seller, buyer = row.strip('\n').split('|')
        if seller in graph.iterkeys() :
            graph[seller] = graph[seller] + [buyer]
        else:
            graph[seller] = [buyer]
        if propertyid in propertyIDgraph.iterkeys():
            propertyIDgraph[propertyid] = propertyIDgraph[propertyid] + [graph]
        else:
            propertyIDgraph[propertyid] = [graph]
f.close()

Here a link that maybe will be usefull:

syntax for creating a dictionary into another dictionary in python

Community
  • 1
  • 1
Dinidiniz
  • 771
  • 9
  • 15
  • Thanks for your response. I am using python 3 so I changed .iterkeys() to .items(), but using this method still includes every buyer-seller pair in every property ID. – public_displYname Aug 04 '15 at 19:16
  • I think in python3 iterkeys() is just keys(). In python 2 is working fine, sorry, but i dont know for python3. – Dinidiniz Aug 04 '15 at 19:49