
I want to split an array (list) into two arrays depending on whether each object has a 'confirmation' key. Is there a faster way than the simple for loop I used? The array has a lot of elements, so I am concerned about performance.

Before

[
    {
      'id': 1
    },
    {
      'id': 2
    },
    {
      'id': 3,
      'confirmation': 20
    },
    {
      'id': 4,
      'confirmation': 10
    }
]

After

[{'id': 3, 'confirmation': 20}, {'id': 4, 'confirmation': 10}]

[{'id': 1}, {'id': 2}]

Implementation using for loop

$ python3
Python 3.4.3 (default, Nov 17 2016, 01:08:31) 

dict1 = {"id":1}
dict2 = {"id":2}
dict3 = {"id":3, "confirmation":20}
dict4 = {"id":4, "confirmation":10}
list = [dict1, dict2, dict3, dict4]

list_with_confirmation = []
list_without_confirmation = []
for d in list:
  if 'confirmation' in d:
    list_with_confirmation.append(d)
  else:
    list_without_confirmation.append(d)

print(list_with_confirmation)
print(list_without_confirmation)

Update 1

These are the timings (in seconds) on our real data. (3) is the fastest.

(1) 0.148394346

(2) 0.105772018

(3) 0.0339076519

_list = search()

logger.warning(time.time())  # 1504691716.5748231

list_with_confirmation = []
list_without_confirmation = []
for d in _list:
  if 'confirmation' in d:
    list_with_confirmation.append(d)
  else:
    list_without_confirmation.append(d)

logger.warning(len(list_with_confirmation))  # 69427
logger.warning(time.time())  # 1504691716.7232175 (0.148394346) --- (1)

list_with_confirmation = [d for d in _list if 'confirmation' in d]
list_without_confirmation = [d for d in _list if 'confirmation' not in d]

logger.warning(len(list_with_confirmation))  # 69427
logger.warning(time.time())  # 1504691716.8289895 (0.105772018) --- (2)

lists = ([], [])
[lists['confirmation' in d].append(d) for d in _list]

logger.warning(len(lists[1]))  # 69427
logger.warning(time.time())  # 1504691716.8628972 (0.0339076519) --- (3)

I couldn't figure out how to use timeit in my environment, sorry, so this is only a rough benchmark.
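For reference, here is a minimal sketch of how the three approaches could be compared with the standard-library `timeit` module instead of `time.time()` around a single run. Assumptions: `make_data` and the `split_*` function names are hypothetical, the 100000 synthetic dicts stand in for the real `search()` result, each function returns its lists rather than logging so no I/O is timed, and (3) is written as a plain loop rather than a list comprehension used only for its side effects. Your numbers will depend on your machine and data.

```python
import timeit


def make_data(n):
    # Hypothetical test data: every odd-numbered dict has a 'confirmation' key.
    return [{'id': i, 'confirmation': i} if i % 2 else {'id': i}
            for i in range(n)]


def split_loop(items):
    # (1) Single pass with an explicit if/else.
    with_conf, without_conf = [], []
    for d in items:
        if 'confirmation' in d:
            with_conf.append(d)
        else:
            without_conf.append(d)
    return with_conf, without_conf


def split_two_comprehensions(items):
    # (2) Two passes, one list comprehension per output list.
    with_conf = [d for d in items if 'confirmation' in d]
    without_conf = [d for d in items if 'confirmation' not in d]
    return with_conf, without_conf


def split_bool_index(items):
    # (3) Single pass; the bool from the membership test (False -> 0,
    # True -> 1) picks which list to append to.
    lists = ([], [])
    for d in items:
        lists['confirmation' in d].append(d)
    return lists[1], lists[0]


if __name__ == '__main__':
    data = make_data(100000)
    for func in (split_loop, split_two_comprehensions, split_bool_index):
        # repeat() returns one total per batch of `number` calls; the
        # minimum is usually the least noisy estimate.
        best = min(timeit.repeat(lambda: func(data), number=5, repeat=3))
        print(func.__name__, best / 5)  # seconds per call
```

Taking the minimum of several repeats, rather than a single wall-clock difference, reduces the influence of other activity on the machine.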

zono
  • If you're looking for performance, there's probably not much better than a loop. It's cheap, it's efficient, and there won't be unwanted temporary copies. – cs95 Sep 06 '17 at 08:25
  • You could do it with two list comprehensions if you think that's clearer (like `list_with_confirmation = [d for d in list if 'confirmation' in d]` and `list_without_confirmation = [d for d in list if 'confirmation' not in d]`), although obviously that would require two iterations over `list` instead of one (which, to be honest, unless the list is really big, may not make a significant difference). – jdehesa Sep 06 '17 at 08:33

3 Answers


List comprehension might be slightly faster:

list_with_confirmation = [d for d in list if "confirmation" in d]
list_without_confirmation = [d for d in list if "confirmation" not in d]

Refer to Why is list comprehension so faster?

chngzm
  • `[d for d in list if "confirmation" in d]` is enough. In Python 2, `.keys()` unnecessarily creates a list, which is much less efficient to search through. – Chinmay Kanchi Sep 06 '17 at 08:35

A loop is probably the fastest way, but you could try another approach:

lists = ([], [])
for d in source_list: 
    lists['confirmation' in d].append(d)

or even:

lists = ([], [])
[lists['confirmation' in d].append(d) for d in source_list]

This way lists[0] will be "without confirmation" and lists[1] will be "with confirmation". Do your own benchmarks.
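The indexing works because `'confirmation' in d` evaluates to a `bool`, and `bool` is a subclass of `int`, so `False` selects position 0 of the tuple and `True` position 1. A minimal runnable sketch with the question's sample dicts (the unpacking into `without_confirmation, with_confirmation` is only an illustrative naming choice):

```python
source_list = [
    {'id': 1},
    {'id': 2},
    {'id': 3, 'confirmation': 20},
    {'id': 4, 'confirmation': 10},
]

# True/False are ints (1/0), so they can be used as tuple indices.
lists = ([], [])
for d in source_list:
    lists['confirmation' in d].append(d)

# Index 0 collected the False cases, index 1 the True cases.
without_confirmation, with_confirmation = lists
print([d['id'] for d in with_confirmation])     # [3, 4]
print([d['id'] for d in without_confirmation])  # [1, 2]
```

The list-comprehension form does the same appends but also builds and discards a list of `None` values (one per element), so the plain loop is the more idiomatic of the two.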

Side note: don't use `list` as a variable name, as it shadows the built-in `list` type (constructor).
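A short illustration of what goes wrong: once the name `list` is rebound to a list object, calling `list(...)` in that scope fails, because the name no longer refers to the built-in type. The flag `shadowing_broke_list` is just a hypothetical name used to record the outcome.

```python
import builtins

list = [{'id': 1}, {'id': 2}]  # shadows the built-in list type

try:
    list(range(3))  # 'list' now names the object above, not the type
    shadowing_broke_list = False
except TypeError as exc:
    shadowing_broke_list = True
    print('TypeError:', exc)  # TypeError: 'list' object is not callable

# The built-in is still reachable through the builtins module...
print(builtins.list(range(3)))  # [0, 1, 2]

# ...and deleting the shadowing global restores normal name lookup.
del list
print(list(range(3)))  # [0, 1, 2]
```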

Błotosmętek

If you execute the code below:

import timeit

dict1 = {"id": 1}
dict2 = {"id": 2}
dict3 = {"id": 3, "confirmation": 20}
dict4 = {"id": 4, "confirmation": 10}
_list = [dict1, dict2, dict3, dict4]


def fun(_list):
    list_with_confirmation = []
    list_without_confirmation = []
    for d in _list:
        if 'confirmation' in d:
            list_with_confirmation.append(d)
        else:
            list_without_confirmation.append(d)

    print(list_with_confirmation)
    print(list_without_confirmation)


def my_fun(_list):
    list_with_confirmation = [d for d in _list if 'confirmation' in d]
    list_without_confirmation = [d for d in _list if 'confirmation' not in d]
    print(list_with_confirmation)
    print(list_without_confirmation)


if __name__ == '__main__':
    print(timeit.timeit("fun(_list)", setup="from __main__ import fun, _list", number=1))
    print(timeit.timeit("my_fun(_list)", setup="from __main__ import my_fun, _list", number=1))

You get output like the following:

[{'confirmation': 20, 'id': 3}, {'confirmation': 10, 'id': 4}]
[{'id': 1}, {'id': 2}]
5.41210174561e-05
[{'confirmation': 20, 'id': 3}, {'confirmation': 10, 'id': 4}]
[{'id': 1}, {'id': 2}]
2.40802764893e-05

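On this four-element input the list comprehension version ran faster. Note that both timed functions also print their results and each is timed only once (`number=1`), so these figures are rough. For more reference you can see: blog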

Nirmi