Stopping a looping script from returning duplicate entries

Question

I have code that prints data for each user from an XML file obtained from a website. The XML updates as more users interact with it throughout the day. My code currently loops, downloading this data every 5 minutes.

Every time the code is run it generates a list of users and their statistics. In the first 5 minutes it prints users: a,b,c

In the second 5 minutes it prints users: a,b,c,d,e

In the third 5 minutes it prints users: a,b,c,d,e,f,g

What I need the code to do is print, for the first 5 minutes: a,b,c; for the second 5 minutes: d,e; for the third 5 minutes: f,g

It should somehow recognise that some of the users have already been printed. Each user does have a unique user id, which I guess could be matched?

I enclose an example of my code, in case that helps.

import mechanize
import urllib
import json
import re
import random
import datetime
from sched import scheduler
from time import time, sleep

######Code to loop the script and set up scheduling time

s = scheduler(time, sleep)
random.seed()

def run_periodically(start, end, interval, func):
    event_time = start
    while event_time < end:
        s.enterabs(event_time, 0, func, ())
        event_time += interval + random.randrange(-5, 45)
    s.run()


###### Code to get the data required from the URL desired
def getData():  
    post_url = "URL OF INTEREST"
    browser = mechanize.Browser()
    browser.set_handle_robots(False)
    browser.addheaders = [('User-agent', 'Firefox')]

######These are the parameters you've got from checking with the aforementioned tools
    parameters = {'page' : '1',
              'rp' : '250',
              'sortname' : 'roi',
              'sortorder' : 'desc'
             }
#####Encode the parameters
    data = urllib.urlencode(parameters)
    trans_array = browser.open(post_url,data).read().decode('UTF-8')

    xmlload1 = json.loads(trans_array)
    pattern1 = re.compile('>&nbsp;&nbsp;(.*)<')
    pattern2 = re.compile('/control/profile/view/(.*)\' title=')
    pattern3 = re.compile('<span style=\'font-size:12px;\'>(.*)<\/span>')


#########################################################################
##### The request sent from here all the way down including comments#####
#########################################################################


##### Iterate over every row so the number of rows in the XML file does not need to be hard-coded,
##### which lets the row count change as the list grows (required for the looping function to run uninterrupted)

    for row in xmlload1['rows']:
        cell = row["cell"]

##### defining the Keys (key is the area from which data is pulled in the XML) for use in the pattern finding/regex

        user_delimiter = cell['username']
        selection_delimiter = cell['race_horse']


        # strikeratecalc2 is not defined in this excerpt; it must be computed before this check
        if strikeratecalc2 < 12:
            continue

##### REMAINDER OF THE REGEX DELIMITATIONS
        username_delimiter_results = re.findall(pattern1, user_delimiter)[0]
        userid_delimiter_results = (re.findall(pattern2, user_delimiter)[0])
        user_selection = re.findall(pattern3, selection_delimiter)[0]



##### Printing the results of the code at hand

        print "user id = ",userid_delimiter_results
        print "username = ",username_delimiter_results
        print "user selection = ",user_selection
        print ""





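# Run once immediately, then hand over to the scheduler.
# These calls sit at module level; indented inside getData() they would make it call itself endlessly.
getData()


run_periodically(time()+5, time()+1000000, 3000, getData)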

Please be nice with comments, I have been coding for a cumulative 11 days now, so also excuse any major errors in the code I am using, although it is working so far.

Kind regards

AEA

knutole · Accepted Answer · 2013-06-06T00:10:16.260

5

I guess you could simply store the unique IDs somewhere (like a file or db - Redis is my absolute favorite) and then check each new ID against them before printing.
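As a rough illustration of the file option, here is a minimal sketch that keeps the seen IDs in a Python set and saves them to a JSON file between runs. The file name `seen_user_ids.json` and the helper names `load_seen`, `save_seen` and `only_new` are made up for this example and are not part of the question's code; the sketch runs under both Python 2 and Python 3.

```python
import json
import os

SEEN_FILE = "seen_user_ids.json"  # hypothetical file name, pick any path


def load_seen(path=SEEN_FILE):
    """Load the ids stored by earlier runs, or an empty set on the first run."""
    if not os.path.exists(path):
        return set()
    with open(path) as handle:
        return set(json.load(handle))


def save_seen(seen, path=SEEN_FILE):
    """Write the ids back to disk so they survive a restart of the script."""
    with open(path, "w") as handle:
        json.dump(sorted(seen), handle)


def only_new(user_ids, seen):
    """Return the ids not in `seen` yet, in order, and mark them as seen."""
    fresh = []
    for user_id in user_ids:
        if user_id not in seen:
            seen.add(user_id)
            fresh.append(user_id)
    return fresh


seen = set()
print(only_new(["a", "b", "c"], seen))            # first run
print(only_new(["a", "b", "c", "d", "e"], seen))  # second run
```

In the question's loop, the same check would go just before the print statements, using the extracted user id as the value to look up.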

For storing with Redis, you could do something like this:

# redis
import redis
pwd = 'l33t'
r = redis.StrictRedis(host='localhost', port=6379, db=1, password=pwd)  

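# unique_id is the user id extracted in getData() (userid_delimiter_results in the question)

# record an id
r.sadd('user_ids', unique_id) # this is a set, with no duplicates; returns 1 if the id was new, 0 if already stored

# check for an existing id
r.sismember('user_ids', unique_id) # truthy if the id is already in the set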

See http://redis.io/commands#set and https://github.com/andymccurdy/redis-py. You need both Redis and redis-py; installing them takes a couple of minutes.
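To tie the set commands back to the question's loop, one possible sketch relies on the return value of `sadd`, which is the number of members actually added, so a single call both records the id and tells you whether it was new. The helper `filter_new_users` and the `FakeSetStore` stand-in are hypothetical names invented for this example so it can run without a Redis server; with a real server you would pass the `r` client instead.

```python
class FakeSetStore(object):
    """In-memory stand-in that mimics the sadd() call of a redis-py client."""

    def __init__(self):
        self._sets = {}

    def sadd(self, key, value):
        # Like Redis SADD with one value: 1 if newly added, 0 if already present.
        members = self._sets.setdefault(key, set())
        if value in members:
            return 0
        members.add(value)
        return 1


def filter_new_users(client, rows, key="user_ids"):
    """Keep only the (user_id, username) pairs whose id was not stored before."""
    new_rows = []
    for user_id, username in rows:
        if client.sadd(key, user_id):  # truthy only the first time an id is seen
            new_rows.append((user_id, username))
    return new_rows


store = FakeSetStore()  # with Redis running, use the StrictRedis client instead
first = filter_new_users(store, [("1", "a"), ("2", "b"), ("3", "c")])
second = filter_new_users(store, [("1", "a"), ("2", "b"), ("3", "c"), ("4", "d")])
print(first)
print(second)
```

Doing the add and the check in one call also avoids a separate `sismember` round trip for every user.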

edited Jun 06 '13 at 00:10

answered Jun 06 '13 at 00:05


1

I have another project coming up which will involve the use of a database, so if I start using one I want it to be the easiest one tbh. According to this thread MongoDB is easier to code? http://stackoverflow.com/questions/5400163/when-to-redis-when-to-mongodb (I literally started coding 11 days ago, started from "hello world"!) What do you reckon, Redis or Mongo? – AEA Jun 06 '13 at 00:37
1

I've been through both of them actually - and I recommend Redis by a long shot. It's insanely fast and very easy to use. I use it for everything now, and I'm a relative n00b. – knutole Jun 06 '13 at 01:15
