
I am reading data from a text file in the following format.

1 1 5
1 3 3
1 5 4
2 1 5
2 4 3
3 1 2
3 3 4
3 4 3 
3 5 4

The first column is the coachId, the second column is the playerId, and the last column is the score that coach gave that player. Say there are 3 coaches and 5 players, and the data we are given is incomplete. We have to implement a recommender system that generates the missing scores for each player from each coach. I have already done that part. Now I want to generate an output file with the missing scores filled in. Here is my logic.

data = np.loadtxt('player.txt')
coaches = data.T[0]
players = data.T[1]
scores = data.T[2]


a = 0
total = 3 * 5 #total fields to fill is num of player times num of coaches

while a < total:
   b = 0
   while b < 3:  #for each coach
      # check if a score was given
      # if a score is given, don't do anything
      # if a score is not given, get a new score and write it to the file

I feel like this approach might take a long time if I have lots of coaches and players. Is there a better way to do this?

user1010101
  • I suspect you'd be better served by running a "GetAllMissingScores" type function first, then filling in those scores separately, than a multi-responsibility 1-by-1 loop. How you set that up would likely be preference (there may be an optimized solution, but I'm not a python guy) – Sitric Dec 11 '15 at 19:04
  • http://stackoverflow.com/questions/18689235/numpy-array-replace-nan-values-with-average-of-columns – dot.Py Dec 11 '15 at 19:19
  • @Pardoido This helps how? – BlackJack Dec 11 '15 at 21:49

1 Answer


You are separating values that belong together into three separate lists, which just makes them harder to access. Also, if you want to extend the file, you don't need the score values already in it, only the information about which coach and player combinations are already there. This can be stored in a set, which makes it efficient to test whether a combination is already present.
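As a minimal sketch of the set idea, assuming the file looks like the sample in the question (the `SAMPLE` string and the helper name `read_scored_pairs` are made up for this illustration):

```python
# Sample records from the question: coach_id, player_id, score.
SAMPLE = """\
1 1 5
1 3 3
1 5 4
2 1 5
2 4 3
3 1 2
3 3 4
3 4 3
3 5 4
"""


def read_scored_pairs(lines):
    """Collect the (coach_id, player_id) pairs that already have a score."""
    scored = set()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # The score itself (third field) is not needed for the lookup.
        scored.add((int(fields[0]), int(fields[1])))
    return scored


already_scored = read_scored_pairs(SAMPLE.splitlines())
print((1, 3) in already_scored)  # True: coach 1 scored player 3
print((1, 2) in already_scored)  # False: this score is missing
```

A membership test on a set takes roughly constant time on average, so the check stays cheap even with many coaches and players.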

The outer loop seems to run until a hits the total number of records? The inner loop is then executed for each record and for each coach, so with three coaches it runs three times the total number of records. That doesn't make much sense.
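Instead of nesting a per-coach loop inside a per-record loop, each (coach, player) combination only needs to be visited once. A rough sketch, assuming IDs start at 1 as in the sample data (the function name `find_missing_pairs` is hypothetical, and the set is typed in by hand here):

```python
from itertools import product

coach_count = 3
player_count = 5

# Pairs already present in the sample file from the question.
already_scored = {
    (1, 1), (1, 3), (1, 5),
    (2, 1), (2, 4),
    (3, 1), (3, 3), (3, 4), (3, 5),
}


def find_missing_pairs(scored, coach_count, player_count):
    """Return every (coach_id, player_id) pair that has no score yet."""
    # product() yields each combination exactly once:
    # coach_count * player_count iterations in total.
    all_pairs = product(range(1, coach_count + 1), range(1, player_count + 1))
    return [pair for pair in all_pairs if pair not in scored]


missing = find_missing_pairs(already_scored, coach_count, player_count)
print(missing)  # [(1, 2), (1, 4), (2, 2), (2, 3), (2, 5), (3, 2)]
```

Collecting the missing pairs first also keeps the "what is missing" step separate from the "compute a score" step.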

Here is an approach that needs a get_score_somehow() filled in:

#!/usr/bin/env python
# coding: utf8
from __future__ import absolute_import, division, print_function
from itertools import product


def main():
    filename = 'test.txt'
    coach_count = 3
    player_count = 5

    already_scored = set()
    with open(filename) as lines:
        for line in lines:
            coach_id, player_id, _ = map(int, line.split())
            already_scored.add((coach_id, player_id))

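    # Append ('a') so the existing records are kept; 'w' would truncate the file.
    with open(filename, 'a') as score_file:
        # IDs in the file start at 1, so iterate from 1 to count inclusive.
        for coach_id, player_id in product(
            range(1, coach_count + 1), range(1, player_count + 1)
        ):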
            if (coach_id, player_id) not in already_scored:
                score = get_score_somehow(coach_id, player_id)
                record = [coach_id, player_id, score]
                score_file.write(' '.join(map(str, record)) + '\n')


if __name__ == '__main__':
    main()
BlackJack