
I have a list retrieved from a database:

[{
  'name': 'John',
  'score': 30
}, {
  'name': 'Jan',
  'score': 23
}, {
  'name': 'Mike',
  'score': 34
}]

Can NumPy compute the sum of the scores (without looping through the items one by one with `for ... in`)?

Razik
Js Lim
  • This is not a duplicate: my case is a list of dictionaries. I just want to add up the `score` values, and I did mention "without looping through one by one", like `numpy.sum(list)`. – Js Lim Aug 15 '16 at 04:34
  • Oh, OK, got it. Retracted! May I suggest updating the heading to mention that as well? It's kind of misleading; the "without loop" part is only at the end. – Iceman Aug 15 '16 at 04:38
  • The title is misleading. This isn't a column; it's a dictionary value. – hpaulj Aug 15 '16 at 05:50

4 Answers


You can do this by performing a sum on a list comprehension that collects all the "scores":

sum( [x['score'] for x in MyListOfDictionaries] )

(P.S. NumPy is not necessary here.)


Edit: as pointed out by @sebastian in the comments, the brackets around the list comprehension aren't necessary, because the expression is the only argument passed to the function, i.e.:

sum(x['score'] for x in MyListOfDictionaries)

This is known as a generator expression. It can be more memory-efficient, because it avoids building the intermediate list of scores before summing it.
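Here is a quick check of both forms against the question's data. The variable name `records` is just for illustration. The last line shows a related detail: once `sum` also receives a start value, the generator expression is no longer the sole argument and needs its own parentheses.

```python
# Sample data from the question (variable name chosen for illustration).
records = [
    {'name': 'John', 'score': 30},
    {'name': 'Jan', 'score': 23},
    {'name': 'Mike', 'score': 34},
]

# List comprehension: builds a temporary list [30, 23, 34], then sums it.
total_list = sum([x['score'] for x in records])

# Generator expression: values are produced one at a time, no temporary list.
total_gen = sum(x['score'] for x in records)

# With a second argument (the start value), the generator expression
# must be wrapped in its own parentheses.
total_with_start = sum((x['score'] for x in records), 0)

print(total_list, total_gen, total_with_start)  # 87 87 87
```

Leaving out the inner parentheses in the last call (`sum(x['score'] for x in records, 0)`) is a `SyntaxError`.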

Tasos Papastylianou
  • There's no need for the outer `[]` brackets, though: you can call `sum` on a generator as well, avoiding the creation of a temporary list containing all the scores. – sebastian Aug 15 '16 at 07:20
  • Thanks, good catch. I'm used to Julia, where you can't omit them, so I keep doing this :p I'll update the answer with your comment. :) – Tasos Papastylianou Aug 15 '16 at 11:47
In [1963]: ll=[{
  ...:   'name': 'John',
  ...:   'score': 30
  ...: }, {
  ...
  ...: }]

First, the obvious iterative solution:

In [1965]: sum([d['score'] for d in ll])
Out[1965]: 87

I can turn it into an object array with:

In [1966]: np.array(ll)
Out[1966]: 
array([{'score': 30, 'name': 'John'}, {'score': 23, 'name': 'Jan'},
       {'score': 34, 'name': 'Mike'}], dtype=object)

Applying sum directly to that won't help, though, because the elements are dictionaries and dictionaries can't be added together. But:

In [1967]: from operator import itemgetter
In [1970]: np.frompyfunc(itemgetter('score'),1,1)(ll).sum()
Out[1970]: 87
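The session above can be reproduced as a standalone script. This sketch shows both halves of the point: summing the object array of dicts directly fails, while the `frompyfunc`-wrapped `itemgetter` first produces an object array of scores that can be summed. The `direct_sum_failed` flag exists only to make the failure observable.

```python
import numpy as np
from operator import itemgetter

ll = [
    {'name': 'John', 'score': 30},
    {'name': 'Jan', 'score': 23},
    {'name': 'Mike', 'score': 34},
]

arr = np.array(ll)  # 1-D array of dtype=object holding the three dicts

direct_sum_failed = False
try:
    arr.sum()  # reduces with +, i.e. dict + dict
except TypeError as exc:
    direct_sum_failed = True
    print("direct sum fails:", exc)

# frompyfunc turns itemgetter('score') into a ufunc-like callable that
# returns an object array of the extracted scores, which can then be summed.
get_score = np.frompyfunc(itemgetter('score'), 1, 1)
total = get_score(ll).sum()
print(total)  # 87
```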

See my recent answer https://stackoverflow.com/a/38936480/901925 for more on how to access attributes of objects in an array.

frompyfunc doesn't really get rid of the iteration; it just wraps it in a user-friendly manner. And itemgetter is still doing item['score'] for each dictionary in the list.

This use of itemgetter is basically the same as:

In [1974]: list(map(itemgetter('score'), ll))
Out[1974]: [30, 23, 34]

List comprehensions, map, and frompyfunc are all ways of iterating through the list and getting the score value from each dictionary.
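One more option along the same lines, not used in the answer above, is `np.fromiter`. It is still a Python-level loop over the dictionaries, but it fills a numeric array directly instead of an object array. Later operations on the scores (sum, mean, comparisons) then run as ordinary NumPy array operations. Passing `count=len(ll)` lets NumPy allocate the array once up front.

```python
import numpy as np

ll = [
    {'name': 'John', 'score': 30},
    {'name': 'Jan', 'score': 23},
    {'name': 'Mike', 'score': 34},
]

# The generator still visits each dict in Python, but the result is an
# int64 array rather than a list or an object array.
score_arr = np.fromiter((d['score'] for d in ll), dtype=np.int64, count=len(ll))

print(score_arr)        # [30 23 34]
print(score_arr.sum())  # 87
```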

pandas can turn this whole list into a DataFrame, but don't be fooled by its ease of use: it's doing all of this, and more, under the covers.

hpaulj

NumPy is a library for processing numerical arrays. If you specifically want to use NumPy and its performance, use numbers instead of names for the columns: convert your collection to a numeric matrix and do your calculations with NumPy.
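A minimal sketch of that idea follows. The column layout is an assumption for illustration: `SCORE = 0` is a hypothetical index constant, and the string field `name` is left out because a numeric matrix can't hold it. Building the matrix is still a one-time Python loop; after that, column arithmetic runs in NumPy.

```python
import numpy as np

ll = [
    {'name': 'John', 'score': 30},
    {'name': 'Jan', 'score': 23},
    {'name': 'Mike', 'score': 34},
]

# Hypothetical column layout: column 0 holds the score.
SCORE = 0

# One-time conversion of the records into a numeric matrix of shape (3, 1).
matrix = np.array([[d['score']] for d in ll], dtype=np.int64)

# Columns are addressed by number rather than by key.
total = matrix[:, SCORE].sum()
print(total)  # 87
```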

I suggest you try the pandas library: its DataFrame type was created to hold and process collections like yours (similar to data frames in R or tables in MATLAB), i.e. tables with columns and rows. It has a sum method that solves your problem.
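A minimal sketch of the pandas route with the question's data is shown below (it assumes pandas is installed). Each dictionary becomes a row and the keys become column labels, so the scores can be summed by column name.

```python
import pandas as pd

ll = [
    {'name': 'John', 'score': 30},
    {'name': 'Jan', 'score': 23},
    {'name': 'Mike', 'score': 34},
]

# Each dict becomes a row; the keys 'name' and 'score' become column labels.
df = pd.DataFrame(ll)

# Column-wise sum of the 'score' column.
total = df['score'].sum()
print(total)  # 87
```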

I'd guess this isn't the only thing you want to do with your data, and if speed is important, I'd recommend using this library.

Here are related Stack Overflow questions that show some of the library's capabilities:

Charlie
lst = [{
  'name': 'John',
  'score': 30
}, {
  'name': 'Jan',
  'score': 23
}, {
  'name': 'Mike',
  'score': 34
}]

sum(map(lambda x: x['score'], lst))
xblymmx