-1

This is a logical question, no programming syntax help needed, just the big picture stuff:


What I currently have

I wrote a Python script that downloads the "Who's Online" list of players of a game. The list is posted on a website. Python downloads the list, removes all the extra HTML and leaves just a pretty list of users, one per line.

It then saves this list to a file. Every 1 minute it downloads the newest "Who's Online" list from the game website, and it then checks the new list against the old list.

If there is a new user that it didn't see before, it adds it to the list, but doesn't remove anyone. So the list continues to grow for 1hour, then python renames the list to "Users_hour_log_timestamp" and starts fresh.

It also creates a list of users for a Day_log, week_log, month_log etc.

Python code

# IMPORT WEB BROWSER
import mechanize

# IMPORT HTML PARSER
from bs4 import BeautifulSoup, NavigableString, Tag

# FOR TIME CALCS
from datetime import datetime, timedelta, date

# FOR RENAMING FILES
import os

# FOR SLEEP
import time

# FOR 1 MONTH FROM NOW CALCULATOR
from dateutil.relativedelta import relativedelta

# CALC ONE MONTH FROM TODAY
ONE_MONTH_LATER = date.today() + relativedelta(months=+1)

# START WEEK TIMER
START_TIME_WEEK = datetime.now()

# SET VARIABLE TO DETERMINE FIRST TIME RUNNING A DAY DUMP
FIRST_TIME = 'yes'

# TIME DELTAS
HOURS_DAY = timedelta(hours=24)
HOURS_WEEK = timedelta(hours=168)


# NEVER ENDING LOOP THAT SLEEPS 1 MINUTE BETWEEN CYCLES
while True:

    # EVERY MINUTE, UPDATE WHAT TODAY IS
    today = datetime.today()

    # DATE TIME FOR TIMESTAMPING LOGS
    dt = today.strftime('%Y-%m-%d_%H.%M.%S_')

    # THIS CHECKS TODAYS DATE TO BE COMPARED TO THE ONE_MONTH_LATER VAR
    MONTH_TODAY = today.strftime('%Y-%m-%d')

    # GET CURRENT TIME
    CURRENT_TIME = datetime.now()

    # USE CURRENT TIME TO CALCULATE IF 1 WEEK HAS PASSED
    TIME_PASSED_WEEK = CURRENT_TIME - START_TIME_WEEK

    # GET THE CURRENT HOUR-MINUTE, THIS IS USED TO DUMP DAY LOGS AT 11:59 PM (23-59)
    TIME_HOUR = today.strftime('%H-%M')

    # GET CURRENT MINUTE, THIS IS USED TO DUMP HOUR LOGS AT 59 (END OF EACH HOUR)
    TIME_MINUTE = today.strftime('%M')
    print "START of LOOP FIRST TIME = %s" % FIRST_TIME
    # IF IT IS MIN 59 OF ANY HOUR, DUMP THE HOUR LOG
    if TIME_MINUTE == '59':
        newname = dt + 'WAR2RU_USERS_HOUR_LOG.txt'
        os.rename('users-all-hour.txt', newname)
        # CLEAR THE LOG TO START NEW HOUR LOG
        clear = open("users-all-hour.txt", "w")
        clear.close()
        print "HOUR LOG DUMPED"

    # IF ITS THE FIRST TIME RUNNING, DAY LOG HASNT DUMPED, RESET WEEK TIMER AT MIDNIGHT AND DUMP DAY LOG
    if TIME_HOUR == '23-59' and FIRST_TIME == 'yes':
        print "First time day dump:"
        # RENAME CURRENT DAY LOG TO TIME STAMPED DAY LOG
        newname = dt + 'WAR2RU_USERS_DAY_LOG.txt'
        print "newname = %s" % newname
        os.rename('users-all-day.txt', newname)
        # CLEAR THE LOG TO START NEW DAY
        clear = open("users-all-day.txt", "w")
        clear.close()
        print "DAY LOG DUMPED"
        # RESET VAR
        FIRST_TIME = 'no'
        # RESET THE WEEK LOG
        clear = open("users-all-week.txt", "w")
        clear.close
        # RESET THE DAY LOG
        clear = open("users-all-day.txt", "w")
        clear.close()
        # START THE WEEK TIMER SO IT BEGINS A NEW WEEK
        START_TIME_WEEK = datetime.now()

    # IF ITS 12am, NOT FIRST TIME RUNNING, AND DAY LOG HASNT BEEN DUMPED YET
    if TIME_HOUR == '23-59'and FIRST_TIME == 'no':
        print "new day, first time is NO"
        newname = dt + 'WAR2RU_USERS_DAY_LOG.txt'
        os.rename('users-all-day.txt', newname)
        # CLEAR THE LOG TO START NEW DAY
        clear = open("users-all-day.txt", "w")
        clear.close()
        print "DAY LOG DUMPED"
        # RESET THE FIRST_TIME VAR
        FIRST_TIME = 'no'
    print "END of LOOP FIRST TIME = %s" % FIRST_TIME

    if TIME_PASSED_WEEK > HOURS_WEEK:
        newname = dt + 'WAR2RU_USERS_WEEK_LOG.txt'
        os.rename('users-all-week.txt', newname)
        # CLEAR THE LOG TO START NEW DAY
        clear = open("users-all-week.txt", "w")
        clear.close()
        print "WEEK LOG DUMPED"
        START_TIME_WEEK = datetime.now()

    if MONTH_TODAY == ONE_MONTH_LATER:
        newname = dt + '_WAR2RU_USERS_MONTH_LOG.txt'
        os.rename('users-all-month.txt', newname)
        # CLEAR THE LOG TO START NEW DAY
        clear = open("users-all-month.txt", "w")
        clear.close()
        print "MONTH LOG DUMPED"
        ONE_MONTH_LATER = date.today() + relativedelta(months=+1)


    br = mechanize.Browser()
    # Browser options
    br.set_handle_equiv(True)
    br.set_handle_redirect(True)
    br.set_handle_referer(True)
    br.set_handle_robots(False)

    # Follows refresh 0 but not hangs on refresh > 0
    br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)

    # Want debugging messages?
    # br.set_debug_http(True)
    # br.set_debug_redirects(True)
    # br.set_debug_responses(True)
    try:
        # User-Agent
        br.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; AS; rv:11.0) like Gecko')]
        r = br.open('http://en.war2.ru/wp-content/themes/war2.ru/lib/server-stats.php')
        page = r.read()
        soup = BeautifulSoup(page)
        user_list = soup.find_all('div', class_='status-list')
    except:
        print "ERROR: Connection failed. Retrying ..."
        continue
    ############### DAY LOG #######################
    # START WITH ALL USERS LIST, ADD CURRENT USERS, THEN DE DUP OVER TO ALL USERS LIST

    with open("users-all-day.txt", "r") as f:
        lines = f.readlines()
        with open("users-this-minute-day.txt", "w") as f1:
            f1.writelines(lines)
            f1.close()
        f.close()

    with open("users-all-week.txt", "r") as f:
        lines = f.readlines()
        with open("users-this-minute-week.txt", "w") as f1:
            f1.writelines(lines)
            f1.close()
        f.close()

    with open("users-all-month.txt", "r") as f:
        lines = f.readlines()
        with open("users-this-minute-month.txt", "w") as f1:
            f1.writelines(lines)
            f1.close()
        f.close()

    with open("users-all.txt", "r") as f:
        lines = f.readlines()
        with open("users-this-minute-all.txt", "w") as f1:
            f1.writelines(lines)
            f1.close()
    f.close()

    with open("users-all-hour.txt", "r") as f:
        lines = f.readlines()
        with open("users-this-minute-hour.txt", "w") as f1:
            f1.writelines(lines)
            f1.close()
    f.close()

    # GET USERS FROM RU SITE
    for br in user_list:
        users = br.findAll('br')

        for br in users:
            next = br.nextSibling
            if not (next and isinstance(next,NavigableString)):
                continue
            next2 = next.nextSibling
            if next2 and isinstance(next2,Tag) and next2.name == 'br':
                text = str(next.encode('latin1')).strip()
                if text:
                    output = next[1:].encode('latin1')
                    # ADD A USER TO users-this-minute-day.txt
                    file = open("users-this-minute-day.txt", "at")
                    print >>file, str(output)
                    file.close

                    # ADD A USER TO users-this-minute-day.txt
                    file = open("users-this-minute-week.txt", "at")
                    print >>file, str(output)
                    file.close

                    # ADD A USER TO users-this-minute-day.txt
                    file = open("users-this-minute-month.txt", "at")
                    print >>file, str(output)
                    file.close

                    # ADD A USER TO users-this-minute-day.txt
                    file = open("users-this-minute-all.txt", "at")
                    print >>file, str(output)
                    file.close

                    # ADD A USER TO users-this-minute-hour.txt
                    file = open("users-this-minute-hour.txt", "at")
                    print >>file, str(output)
                    file.close

        lines_seen = set() # holds lines already seen
        outfile = open("users-all-day.txt", "w")
        for line in open("users-this-minute-day.txt", "r"):
            if line not in lines_seen: # not a duplicate
                lines_seen.add(line)
        outfile.writelines(sorted(lines_seen))
        outfile.close() 


        lines_seen = set() # holds lines already seen
        outfile = open("users-all-week.txt", "w")
        for line in open("users-this-minute-week.txt", "r"):
            if line not in lines_seen: # not a duplicate
                lines_seen.add(line)
        outfile.writelines(sorted(lines_seen))
        outfile.close() 

        lines_seen = set() # holds lines already seen
        outfile = open("users-all-month.txt", "w")
        for line in open("users-this-minute-month.txt", "r"):
            if line not in lines_seen: # not a duplicate
                lines_seen.add(line)
        outfile.writelines(sorted(lines_seen))
        outfile.close() 

        lines_seen = set() # holds lines already seen
        outfile = open("users-all.txt", "w")
        for line in open("users-this-minute-all.txt", "r"):
            if line not in lines_seen: # not a duplicate
                lines_seen.add(line)
        outfile.writelines(sorted(lines_seen))
        outfile.close() 

        lines_seen = set() # holds lines already seen
        outfile = open("users-all-hour.txt", "w")
        for line in open("users-this-minute-hour.txt", "r"):
            if line not in lines_seen: # not a duplicate
                lines_seen.add(line)
        outfile.writelines(sorted(lines_seen))
        outfile.close() 

        print "Sleeping for 1 minute ..."
        time.sleep(60)

Now this is all nice for manually searching through and creating a big list of game usernames, but now I want it to be more sophisticated.


Desired Functionality

These are the kind of questions I need a database to be able to answer:

  • BE ABLE TO ANSWER A QUESTION LIKE "Was Player X online from 5pm-8pm on Nov 22 2015?"

  • BE ABLE TO DRAW A CHART OF THE TIME OF DAY/DAY OF WEEK WITH MOST PLAYERS ONLINE

  • BE ABLE TO HAVE A CALENDAR FOR EACH PLAYER SHOWING WHAT DAYS/HOURS THAT MONTH THAT THEY PLAYED

  • BE ABLE TO HAVE A LEADERBOARD OF PLAYERS ONLINE THE MOST

In order to do this I know I need to start with a database and stop using text files.

That is where the problem is for me. I cannot get my head around what a database for this would look like.

It would obviously have a column for each user, but then how would I keep track of what minutes, hours, and days that user was online?

I have completely rewritten this because I am trying to convey this idea so that others can understand. Please let me know what you think

Thanks!

Archer
  • 43
  • 8
  • 1
    Your description of the problem is extremely unclear. – developerwjk Nov 23 '15 at 22:06
  • Are you interfacing to a database or just reading textfiles? Additionally, please provide a sample input and output. – albert Nov 23 '15 at 22:06
  • whats unclear about it – Archer Nov 23 '15 at 22:06
  • "whats unclear about it" ... everything – developerwjk Nov 23 '15 at 22:07
  • @albert the python script grabs the users online and dumps them into text files, renaming for users_hourlog and users_daylog etc. I want to get this all to be database driven however, but I cant figure out what the database would look like – Archer Nov 23 '15 at 22:07
  • I think you question is going to be downvoted/closed as it is too broad. Show the data you have as input and the transformation you have applied already, together with the desired functionality you'd like your program to offer. Please read [this](http://stackoverflow.com/help/mcve) – Pynchia Nov 23 '15 at 22:08
  • I have completely rewritten. Please help me out @pynchia ! Thanks – Archer Nov 23 '15 at 22:20
  • @albert Rewrote everything! – Archer Nov 23 '15 at 22:23
  • 1
    use textfile... screw databases. – Flash Thunder Nov 23 '15 at 22:26
  • Impossible to use text files for this much info I would think - its not flexible like a database, I wouldnt be able to answer questions like "Was Player X online between 5pm and 8pm on Nov 22?" .... So the problem remains for me - what would the structure look like for a database that holds the answer to a question like this – Archer Nov 23 '15 at 22:30
  • please show sample lines from the files – Pynchia Nov 23 '15 at 22:36
  • Like I said, Its not a code problem its a logic problem. What would the database look like - how would it be able to record who was online at what time and date - what would the columns be? What would be the point in posting my Python code that records the "Who's Online" lists to text files – Archer Nov 23 '15 at 22:38
  • @pynchia posted python code, but like I said I think you're missing the point because the python code has nothing to do with the question. – Archer Nov 23 '15 at 22:45
  • I agree and I think we're all fully aware it's an architectural problem. Still, knowing what the input is helps. – Pynchia Nov 23 '15 at 22:48
  • Guys, the ONLY thing my brain can come up with - does each day have to be its own table with 24 hours inside of it, and a username that marks each hour as 1 or 0. ? It seems so unscalable and heavy. There has to be a better way – Archer Nov 23 '15 at 23:07

2 Answers2

0

Preliminary Answer, I have to look into this more

Database structure for holding statistics by day, week, month, year

Basically the answer is to just put into the database timestamps everytime someone is online like

So say its just me and someguy online right now ->

user_name, event, timestamp
archer, online, 11-23-2015 5:04pm
someguy, online, 11-23-2015 5:04pm
archer, online, 11-23-2015 5:05pm
someguy, online, 11-23-2015 5:05pm
archer, online, 11-23-2015 5:06pm
someguy, online, 11-23-2015 5:06pm

Now I can make a chart of when I was online somehow by searching through all my timestamps of my user_name

Community
  • 1
  • 1
Archer
  • 43
  • 8
0

In contrast to your suggestion of having a timestamp for every minute that user is online, this is what I thought:
For every minute your code checks the new list against the old list. So when a new user is added to the list add a new record (or row) to the database with the username, the logon timestamp, and when they go offline (removed from the list) add a logoff timestamp. Alternatively have the record entered after they logoff. i.e. Record when the user first appeared on the list, when they stopped appearing on the list, and then adding their username and the two timestamps to the database.

This way you can search for a user and it will tell you what sessions the user had online.

What database platform will you be using? (I don't know much about python but you said this is a logic question.) If you used MySQL I would be able to help further.

EDIT: The database would have these columns

  • ID (this is for unique record identification as the username will be entered multiple times)
  • Username (there will be multiple records for each user, each will be a session)
  • Session start time (login)
  • Session end time (logout)
Locke Donohoe
  • 482
  • 1
  • 7
  • 22
  • Thanks Locke this is the best answer so far. I am using MySQL, my only confusion is - how would I update that specific row to mark the user as logged out? Because there would be multiple rows with the same username - 1 for each 'login/logout' session. I know I would use the ID to identify which row to logout, but how would I keep track of which ID to logout? Thanks alot for your help!!!! – Archer Nov 24 '15 at 14:14
  • If every time they logout you update the row, the logout column for the last session should have data in it. Sort the table by rows with that Username in which the logout column is empty. There should only be one for each user max as there should/would only be one active session. Therefor by searching like this you will only receive one result back and can use the ID grabbed from that to input back into that record. – Locke Donohoe Nov 27 '15 at 00:34