
I have three arrays as listed below:

  1. users: contains the ids of 50,000 users (all distinct)
  2. pusers: contains the ids of users who own posts (ids can repeat, since one user can own many posts) [50,000 values]
  3. score: contains the score corresponding to each value in pusers [50,000 values]

Now I want to populate another array, PScore, as follows: for each user in users, I need to find every occurrence of that user's id in pusers, fetch the corresponding scores, and add them to PScore at the index corresponding to that user.

Example,

if users[5] = 23224
and pusers[6] = pusers[97] = 23224 
then PScore[5] += score[6]+score[97]

Items of note:

  • score is aligned with pusers (e.g., pusers[5] has score[5])
  • PScore is expected to be aligned with users (e.g., the cumulative score of users[5] is PScore[5])
  • The ultimate aim is to assign the cumulative score of a user's posts to the user who owns them.
  • Users who don't own any posts are assigned a score of 0.

Can anyone help me do this? I have tried several approaches, but whenever I run them the output screen stays blank until I press Ctrl+Z to get out.

I went through all of the following posts but I couldn't use them effectively for my scenario.

I am new to this forum and I'm a beginner in Python too. Any help is going to be really useful to me.

Additional Information

  • I'm working on a small project using StackOverflow data.
  • I'm using the Orange tool, and I'm still learning both the tool and Python.

OK, I understand that something is wrong with my approach. Should I not be using lists for this scenario? Can anyone please tell me how I should proceed?

A sample of the data I have arrived at is shown below.

PUsers  Score
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
13  0
77  1
77  4
77  3
77  0
77  2
77  2
77  3
102     2
105     0
108     2
108     2
117     2

Users
-1
1
2
3
4
5
8
9
10
11
13
16
17
19
20
22
23
24
25
26
27
29
30
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
48
49
50

All I want is the total score associated with each user. Once again, the pusers list contains repetitions while the users list contains unique values. I need the totals stored so that PScore[6], for example, refers to the total score associated with users[6].

I hope this answers the queries.

Thanks in advance.

Anu145
  • Is this homework? Do you have efficiency constraints? What exactly have you tried? – ford prefect Nov 04 '13 at 14:07
  • Post your code so we can help – Tim Tisdall Nov 04 '13 at 14:07
  • Note: you are most likely talking about *lists*; arrays are a different datatype in Python, located in the dedicated `array` module, which can only hold homogeneous primitive values. – Martijn Pieters Nov 04 '13 at 14:12
  • @Anu145 it would help us if you could add some sample data to your post. Obviously you can't paste in data for 50000 users but you could make up a small amount of data (`users`, `pusers`, `score`) that represents, say, 10 users. If you could add the data as actual Python code (e.g. `users = [123, 123, 456]`) so that it has the same form as your real data that would be awesome. – YXD Nov 04 '13 at 14:39
  • Given the numbers (50K items per list) I assume this is not homework... But then using lists seems really really wrong, you should have some database to take care of your data. – bruno desthuilliers Nov 04 '13 at 14:41
  • These kinds of data are perfectly suited to a relational database. Finding the information you need from an RDB would be trivial. Take a look at [sqlite](http://docs.python.org/2/library/sqlite3.html) – ratatoskr Nov 04 '13 at 14:57
  • BTW, Ctrl-Z doesn't close a program; it suspends it and leaves it in the background. If you want to kill the program, use Ctrl-C. – Tim Tisdall Nov 04 '13 at 15:08
  • I have edited the post. Kindly check and help me – Anu145 Nov 04 '13 at 16:08
  • By the way, these three lists are not the real data. The raw data is a CSV file. @bruno desthuilliers I have used the data from the CSV file to arrive at these three lists. So do you mean that I should store my temporary results (users, pusers, score and PScore in this case) in a database too? – Anu145 Nov 04 '13 at 16:13

2 Answers


From how you described your arrays, and since you're using Python, this looks like a perfect candidate for dictionaries.

Instead of having one array for post owners and another array for post scores, you should be able to make a dictionary that maps a user id to a score. When you're taking in data, look in the dictionary to see if the user already exists. If so, add the score to the current score. If not, make a new entry. When you've looped through all the data, you should have a dictionary that maps from user id to total score.

http://docs.python.org/2/tutorial/datastructures.html#dictionaries
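As a rough sketch of that idea (not necessarily how your real code is laid out): the lists below are a handful of made-up rows in the same shape as the question's sample, and `totals` is just a name I picked for the dictionary.

```python
# Small made-up sample in the same shape as the question's lists.
users = [-1, 1, 2, 13, 77, 102, 105, 108]
pusers = [-1, -1, 13, 77, 77, 77, 102, 108, 108]
score = [0, 0, 0, 1, 4, 3, 2, 2, 2]

# Map each post owner's id to the running total of their post scores.
totals = {}
for uid, s in zip(pusers, score):
    if uid in totals:
        totals[uid] += s   # user already seen: add to the current score
    else:
        totals[uid] = s    # first post for this user: make a new entry

# Line the totals up with users; users without posts get 0.
PScore = [totals.get(uid, 0) for uid in users]
print(PScore)
```

With this sample, `PScore[4]` holds the total for `users[4]` (id 77), and ids that never appear in `pusers` end up with 0.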

Dillon Welch
  • This sounds like exactly what I want. I'll try it and get back to you. Thanks!! – Anu145 Nov 05 '13 at 01:58
  • Thank you so much, your idea worked. Now my code looks simpler and more readable; it is less complex and serves the purpose too. Thank you so much for suggesting dictionaries! – Anu145 Nov 05 '13 at 07:08

I think your algorithm is either wrong or broken. Try to compute its complexity. If it's N^2 or more you are likely using an inefficient algorithm. O(N^2) with 50,000 elements should take a few seconds. O(N^3) will probably take minutes. If you're sure of your approach, try running it with some small fake data to figure out whether it does the right thing or whether you accidentally added an infinite loop.

You can easily get it working in linear time with dictionaries.
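Something along these lines might do it. This is only a sketch: `cumulative_scores` and `index_of` are names I made up, and the small lists at the bottom are fake data for checking the result by hand. The point is that each id lookup is a dictionary access rather than a scan over `users`, so the whole thing makes one pass over each list.

```python
def cumulative_scores(users, pusers, score):
    """Return a list aligned with users holding each user's total score."""
    # One pass over users: remember where each id sits in the list.
    index_of = {}
    for i, uid in enumerate(users):
        index_of[uid] = i

    pscore = [0] * len(users)      # users without posts stay at 0
    # One pass over the posts: a dict lookup replaces a scan of users.
    for uid, s in zip(pusers, score):
        i = index_of.get(uid)
        if i is not None:          # ignore owners missing from users
            pscore[i] += s
    return pscore


# Small fake data: user 23224 owns two posts, user 30 owns none.
result = cumulative_scores([10, 23224, 30], [23224, 10, 23224, 99], [5, 1, 7, 3])
print(result)
```

A nested loop over `users` and `pusers`, or calling `users.index(...)` for every post, would do on the order of 50,000 × 50,000 comparisons instead.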

Sorin
  • Yes, my code takes minutes to run. I'll try using dictionaries and get back to you. Thanks!! – Anu145 Nov 05 '13 at 02:00
  • Thanks for making me think in terms of complexity. Now that I use dictionaries, my code is simpler and also takes less time to run. Thanks!! – Anu145 Nov 05 '13 at 07:09