
I have a list of products containing many objects with properties such as id and image_url, as you can see below:

total_products

[{u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQCG1ObwtCgqxZIk&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F1000000.png&cfs=1&_nc_hash=AQAPdo31zo9WJk8j', u'id': u'1539966686030963', u'retailer_id': u'product-1000000'}, {u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQDyc-Yyic5QLOqH&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F0.png&cfs=1&_nc_hash=AQDhmhPJxFZEpMFX', u'id': u'993388404100117', u'retailer_id': u'product-0'}, {u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQAwTzrzAjdKFjmB&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F1000.png&cfs=1&_nc_hash=AQCMMJRJ_r7QB06I', u'id': u'642820939176165', u'retailer_id': u'product-1000'}, {u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQBHdbRqB7F6aMKM&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F1.png&cfs=1&_nc_hash=AQDx7P52g0NYBB-3', u'id': u'1411912028843607', u'retailer_id': u'product-1'}, {u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQB7aSPmk_j21umz&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F100000.png&cfs=1&_nc_hash=AQAPV5oe_ymaAcXr', u'id': u'942522339181104', u'retailer_id': u'product-100000'}, {u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQB69V2cgASUIci1&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F100.png&cfs=1&_nc_hash=AQAk3eZ4vqWYbOW4', u'id': u'1347112758661660', u'retailer_id': u'product-100'}, {u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQD44rjEUMk6Yp2H&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F1000001.png&cfs=1&_nc_hash=AQBT_0iB417B08ux', u'id': u'1354204821311003', u'retailer_id': u'product-1000001'}, {u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQB4ucqXEbo2DyC7&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F1000002.png&cfs=1&_nc_hash=AQAQ2vuj0WmuXSqw', u'id': u'1776841739008769', u'retailer_id': u'product-1000002'}, {u'image_url': u'https://external.xx.fbcdn.net/safe_image.php?d=AQBM75VZTNuxqaoq&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F10.png&cfs=1&_nc_hash=AQAUdkc6II5eu47D', u'id': u'1358784964179738', u'retailer_id': u'product-10'}, {u'image_url': 
u'https://external.xx.fbcdn.net/safe_image.php?d=AQAY0kmVnHXBbhHe&url=http%3A%2F%2Fgigya.jp%2Fdpa%2F10000.png&cfs=1&l&_nc_hash=AQCT1PHl5h1Rhc5r', u'id': u'1337513966312571', u'retailer_id': u'product-10000'}]

I am reading a CSV file which contains data like the following:

csv_file_data: [image: screenshot of the CSV file]

As you can see, the id in the CSV file and the retailer_id in total_products are the same for some products, so I want to change the image_link in the CSV file whenever id and retailer_id match.

To do so, I read the CSV file row by row, loop through all the products in total_products, and change image_link if a match is found.

Code:

import csv

def update_csv(file):
    print file
    reader = csv.DictReader(open(file))
    out_file_name = str(file).replace(".csv", "")
    writer = csv.DictWriter(open(out_file_name + "_updated.csv", "wb"), fieldnames=reader.fieldnames)
    writer.writeheader()
    for current_row in reader:
        # Scans the whole product list for every CSV row.
        for product in total_products:
            retailer_id = product['retailer_id']
            if current_row['id'] == retailer_id:
                current_row['image_link'] = "RajSharma"
                print "Match = " + str(retailer_id) + " in " + file
                break
        writer.writerow(current_row)

The problem with this approach is that when total_products contains more than 1,000 to 10,000 products, it takes too long to run.

Is there a faster way to check whether a retailer_id exists in total_products and, if so, change image_link?

Samuel Liew
RajSharma
  • Convert `total_products` into a dictionary, so you don't have to loop over it for every line. – Barmar Jan 19 '17 at 02:16
  • @Barmar can you give an example of how to convert and use it? – RajSharma Jan 19 '17 at 02:18
  • See http://stackoverflow.com/questions/12586179/convert-list-of-dictionaries-to-nested-dictionary – Barmar Jan 19 '17 at 02:27
  • After you convert it, just replace the loop with `found_product = products_dict[retailer_id]` – Barmar Jan 19 '17 at 02:28
  • @Barmar Converting to a dict isn't needed if the user just wants to check for values against a shared key. Converting to a set of just the IDs would suffice. Only if the user needed other information from the dicts would converting to a large dict be necessary. – the_constant Jan 19 '17 at 02:35
  • @Vincenzzzochi Good point, I thought he was pulling something from the objects. But a set is essentially just a dictionary with no values. – Barmar Jan 19 '17 at 02:37
  • I suggest that you profile your script, see where it's spending most of its time, and then try to optimize that portion. See [_How can you profile a script?_](http://stackoverflow.com/questions/582336/how-can-you-profile-a-script) – martineau Jan 19 '17 at 02:44
  • @martineau I'll try to profile script from next time onward. Thanks – RajSharma Jan 19 '17 at 02:57
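
The dictionary approach suggested in the comments might be sketched as follows. It is mainly useful when the matching product's other fields are needed, for example to copy its image_url into the row. The helper names `index_by_retailer_id` and `new_image_link` are hypothetical, and copying image_url is an assumption about the desired replacement value (the question itself writes a fixed string).

```python
def index_by_retailer_id(products):
    """Map each retailer_id to its product dict so lookups avoid a scan."""
    return {product["retailer_id"]: product for product in products}


def new_image_link(row, products_by_id):
    """Return the matching product's image_url, or the row's current image_link."""
    product = products_by_id.get(row["id"])  # None when there is no match
    if product is None:
        return row["image_link"]
    return product["image_url"]


# Illustrative data only; the URL is a placeholder, not taken from the question.
sample_products = [
    {"id": "993388404100117", "retailer_id": "product-0",
     "image_url": "https://example.com/0.png"},
]
products_by_id = index_by_retailer_id(sample_products)
sample_row = {"id": "product-0", "image_link": "old-link"}
sample_row["image_link"] = new_image_link(sample_row, products_by_id)
```

Using `.get` instead of `products_dict[retailer_id]` avoids a KeyError for CSV rows that have no matching product.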

1 Answer


First, create a set of ids from total_products:

product_ids = {product['retailer_id'] for product in total_products}

Then, check if the current_row['id'] is in the set:

for current_row in reader:
    if current_row['id'] in product_ids:
        current_row['image_link'] = 'RajSharma'
    writer.writerow(current_row)

A set is much faster to search through, and we only need the unique product IDs to check against. The test current_row['id'] in product_ids is a hash lookup that takes constant time on average, rather than a scan over every product, so the total work grows with the number of CSV rows plus the number of products instead of their product.
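
Putting the two steps together, a minimal sketch of the whole function might look like the following. It assumes Python 3 (text-mode files opened with newline=''), whereas the question's code is Python 2. The name `update_csv_fast` and the `out_path` and `new_link` parameters are hypothetical, introduced only for illustration.

```python
import csv


def update_csv_fast(in_path, out_path, total_products, new_link="RajSharma"):
    """Copy in_path to out_path, replacing image_link on rows whose id
    matches some product's retailer_id. Returns the number of matches."""
    # Build the set once; each membership test below is then a hash lookup.
    product_ids = {product["retailer_id"] for product in total_products}
    matches = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row["id"] in product_ids:
                row["image_link"] = new_link
                matches += 1
            # Every row is written, matched or not.
            writer.writerow(row)
    return matches
```

The `with` block also closes both files, which the original function leaves to the interpreter.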

the_constant