19

I am given a list l and I want to do assignment:

l[index] = val

But there might be a case when the list is too small.

So, I want to ensure I have space for the new value. Sometimes I need to fill the new space with empty strings '', and sometimes with other objects (like empty lists [], False or None).

For this task I use the following procedure:

def ResizeList(l, size, fill_with=None):
    l += [fill_with]*(size-len(l))

(note: it works even if size-len(l)<=0) (note: as I am interested in reserving space, I intentionally DO NOT truncate it to a shorter list)

Like that:

ResizeList(l, index+1)
l[index] = val

(When filling with other object, it's like : ResizeList(l, index+1, []) )

Are there more pythonic ways of doing that? Are there some built-ins or library functions for doing this?

I am using mostly Python-3.x, but know-how about Python-2x is useful and welcome.

Clarification: Please, do not tell me about dict , cause I need list

For those who would like me to be more specific:

The problem statement states it's about list type. Using dict here is not an option or solution. There are reasons for that, specifically related to the domain (I am doing a prototype of an experiment that has to show some asymptotic behaviour, not - as probably you're used to - a prototype of a program. If it would be "just a prototype of a program", then I agree with using a dict and the other comments). I have the following assumptions:

  • I have many many lists (need to care about memory and performance overhead)
  • due to workflow and need of prototype, I cannot call a handcoded C/C++ extension
  • during computation the final list size is unknown
  • we know that in the and the lists will be dense
  • list cells are written and overwritten in an unknown order

Those are just a few reasons why I have stressed that I need a list and not a dict. For those interested in more details or who would like to discuss about dict, checkout how we discuss in comments HERE

Community
  • 1
  • 1
Grzegorz Wierzowiecki
  • 10,545
  • 9
  • 50
  • 88
  • Why do you need to assign to an index that may not exist? – Ignacio Vazquez-Abrams Jan 13 '12 at 11:28
  • 3
    Why not? Matlab/Octave use this convention for matrix assignment and I believe that it's not the only use case possible. – Kos Jan 13 '12 at 11:41
  • @IgnacioVazquez-Abrams My lists are **dense**, quite short and their **elements are created not in order**. Finally, in each list only few cells are missing. To be more specific: algorithm is preparing results in lists, which are going to be rows for `csvwriter.writerow()`, but cells are not computed in order, let's say "random order". So when filling n'th cell, there is no problem with creation of n-1 "empty cells", cause they will be mostly filled in later computation. (Btw. @Kos , good intuition with Matlab/Octave. Here are computations as well) – Grzegorz Wierzowiecki Jan 13 '12 at 11:47
  • Are you __sure__ that using a dict instead of a list will have a significant impact on performance? This sounds like premature optimization to me. – James Jan 13 '12 at 13:41
  • 2
    @James It's question of design, not "premature optimisation". For more info, check-out [Big O notation](http://en.wikipedia.org/wiki/Big_O_notation) and differences in asymptotic complexity of `list` and `dict` in [Python's Wiki - TimeComplexity page](http://wiki.python.org/moin/TimeComplexity) with asymptotic complexities in BigO notation. (As I've wrote : checkout more details in discussion with Rik P.) – Grzegorz Wierzowiecki Jan 13 '12 at 13:57
  • 1
    @GrzegorzWierzowiecki: The existing differences does not support your case that using a dict would have an adverse effect. Python dicts are *very* efficient. You are mostly getting and setting, which in both cases are O(1). – Lennart Regebro Jan 13 '12 at 14:22
  • "Using dict here is not an option or solution." False. "many many of lists" works with dictionaries, also. "during computation final list size is unknown" that means "dictionary". "lists will be dense". Doesn't matter. "cells are written and overwritten in unknown order". Doesn't matter. Please consider a dict, since it does everything you want. – S.Lott Jan 13 '12 at 20:19
  • @S.Lott If you would like to argue, please [checkout how we discuss in comments HERE](http://stackoverflow.com/a/8849876/544721) as it's written in problem statement. – Grzegorz Wierzowiecki Jan 14 '12 at 11:39
  • 3
    @LennartRegebro you are telling me about "Average Case". As I am proving asymptotic behaviour in *worst case*, please check out that *"Amortized Worst Case"* costs for dict all are O(n), cause this is relevant to my domain. – Grzegorz Wierzowiecki Jan 14 '12 at 11:45
  • @GrzegorzWierzowiecki: Worst case is not usually triggered by asymptotic behaviour, but by specific sequences. In a dict worst case is if all your keys have the same hash, for example. That's not asymptotic. – Lennart Regebro Jan 14 '12 at 11:54
  • 1
    @LennartRegebro There are asymptotic analysis of worst case. But you are right, I could check out propability of hash collisions. But as memory overhead is another issue, that's why I decided python lists as simply better for my case. – Grzegorz Wierzowiecki Jan 14 '12 at 12:00
  • 1
    @GrzegorzWierzowiecki: Since you're engaged in premature optimization and ignoring well-considered advice from others, I would suggest that we're not "arguing". We're trying to provide data you can act on. We're trying to solve the problem you appear to have. Your best choice is to (1) avoid arguing, (2) take the answers you get and (3) plot the best course you can through the information you get. Trying to convince us we're *wrong* is doomed. You're going to get multiple points of view on this. Please accept that in the spirit it's offered. – S.Lott Jan 14 '12 at 13:55
  • 1
    Thanks @S.Lott for advices. Actually I could agree with most of statements, but provocative drawing specific situation is also educative. I've learned a lot from all comments and believe other readers will, as well. +1 for sake of discussion. Btw. I wonder what's better in such situation. I've followed Rik's advice about stressing out that I need lists. On the other hand, I see it makes other kind of comments. How do you think. What should I write if I am curious about `list`s approach? `dict` was obvious to me, so I asked about lists. If I would like to used `dict` I would not ask at all. – Grzegorz Wierzowiecki Jan 14 '12 at 15:48
  • 1
    You can ask about lists, but it's list asking about making a silk purse from a pig's ear. You can ask, but it is not sensible to try. You can ask and demand that people answer; and when they answer that you should not try you can continue to demand bad answers. Most folks don't want to give bad answers. So they suggest that lists are a bad idea. You're free to ask, however. Please be polite, however, since we're trying to explain that lists really are a bad idea. No matter how much you demand we think they're a good idea. – S.Lott Jan 16 '12 at 02:58
  • Thanks @S.Lott for very nice explanation. I'd love to make SO place with better answers, as you and other folks. What do you think about changing "Clarification: Please, do not tell me about dict " whole paragraph into sth alike "Even in most most cases 'dict' are better idea, I'd like to try lists. Others - to this on your own risk" - or sth like that? Maybe better idea? – Grzegorz Wierzowiecki Jan 16 '12 at 16:00
  • _"...trying to explain that lists really are a bad idea..."_ It's either lists or trolls, one of those is definitely a bad idea. – Evgeni Sergeev Jul 05 '15 at 03:54

4 Answers4

8

If you're sure that a list -- and not, say, a dict -- is the best data structure for your use case, I propose the following class:

class rlist(list):
  def __init__(self, default):
    self._default = default
  def __setitem__(self, key, value):
    if key >= len(self):
      self += [self._default] * (key - len(self) + 1)
    super(rlist, self).__setitem__(key, value)

l = rlist(0)
print(l)
l[10] = 20
print(l)
l[5] = 14
print(l)

This class checks whether the index being assigned to is beyond the current length of the list, and automatically expands the list as required.

The code is compatible with both Python 2 and 3 (tested with 2.6.5 and 3.1.2).

This class could be handy if the structure is dense and you need to find the element by index as quickly as possible. If the structure is sparse, you should probably consider using a dictionary.

NPE
  • 486,780
  • 108
  • 951
  • 1,012
  • Nice approach - it "hides" problem, and it's ok to me. – Grzegorz Wierzowiecki Jan 13 '12 at 11:37
  • I voted down for this approach because it is WRONG to subclass built-in types in Python, no matter what the purpose is. – Zaur Nasibov Jan 13 '12 at 11:41
  • 1
    @BasicWolf: Care to elaborate? You're saying that all the work that went into PEP 253 + implementation was a bad idea? http://www.python.org/dev/peps/pep-0253/ – NPE Jan 13 '12 at 11:43
  • @BasicWolf - I am not familiar (yet) with whole python style guid. Could you be more specific, what wrong in such approach? – Grzegorz Wierzowiecki Jan 13 '12 at 11:44
  • 6
    @BasicWolf why is subclassing built-in types wrong? It's very explicitly allowed for new-style classes e.g., http://python-history.blogspot.com/2010/06/new-style-classes.html – Andrew Jaffe Jan 13 '12 at 11:45
  • 6
    @BasicWolf if everyone listened to that wisdom, we may never have gotten `DefaultDict`. – yurisich Jan 13 '12 at 12:14
  • 1
    @aix It would be even nicer if you did: 'def __init__(iterable=(), default=None)` to make the prototype a superset of the normal list type. – Duncan Jan 13 '12 at 12:25
  • I agree with that above code can be tuned-up and encourage you to do that :). – Grzegorz Wierzowiecki Jan 13 '12 at 12:28
  • 2
    @BasicWolf So `collections.OrderedDict`, `collections.Counter`, `collections.namedtuple` are all WRONG are they, or does the standard library have a unique right to subclass builtin types? (unlike `defaultdict` which is C coded the ones I listed all involve Python subclasses of builtins). – Duncan Jan 13 '12 at 12:31
  • I've just tried comparing the performance of `rlist` and `dict` using `timeit`. I populated each with the same set of 5 small objects (including an empty `rlist`/`dict`), in the same arbitrary order. Populating `dict` was about 8 times faster than populating `rlist`. Accessing the elements of `dict` in ascending order (of the index) was about 50% faster than `rlist`. So `dict` seems a lot more efficient. Of course, the results could be different with longer lists. – James Jan 13 '12 at 14:04
  • @aix: This is what I would do as well, except that I'd use try/except instead of testing if index > len() everytime. – Lennart Regebro Jan 13 '12 at 14:11
  • @James: Yes, but if you use a dict you'd have to generate a list from it once you are to use it. That would also take time. – Lennart Regebro Jan 13 '12 at 14:13
  • @LennartRegebro: I don't see why you would have to turn the dict into a list. – James Jan 13 '12 at 14:29
  • @James: No, I misunderstood him and made an assumption that I now realize isn't stated anywhere. Indeed you may not need to do that. – Lennart Regebro Jan 13 '12 at 14:52
  • @Droogan have you ever went through the code of e.g. OrderedDict or any other collection that subclasses a built-in type? Speaking of OrderedDict: almost every method of original dict is overriden. The problem of most programmers that try to subclass built-in types is that they override couple of methods and the rest remains intact. Even this answer lacks e.g. specific `__getitem__`, splicing and negative indices logic. – Zaur Nasibov Jan 13 '12 at 15:20
  • @AndrewJaffe Since a lot of the code is in C, trying to subclass it in python can lead to weird behavior e.g. `list.sort` temporarily removes the underlying array from the python wrapper to work with it in C. If you want to subclass `list`, usually `collections.UserList` is what you want. – Simply Beautiful Art Mar 28 '22 at 18:43
6

I came up with something that uses itertool.repeat().

import itertools

def assign(lst, idx, value, fill=None):
    diff = len(lst) - idx
    if diff >= 0:
        lst[idx] = value
    else:
        lst.extend(itertools.repeat(fill, -diff))
        lst.append(value)

That have the following behaviour:

>>> l = [0, 1, 2, 3, 4]
>>> assign(l, 2, 'new')
>>> l
[0, 1, 'new', 3, 4]
>>> assign(l, 8, 'new')
>>> l
[0, 1, 'new', 3, 4, None, None, None, 'new']
>>> assign(l, 10, 'new', fill=[])
>>> l
[0, 1, 'new', 3, 4, None, None, None, 'new', [], 'new']

Does this work for you?

Edit: Since the question was updated I've updated the answer.

Rob Bednark
  • 25,981
  • 23
  • 80
  • 125
Rik Poggi
  • 28,332
  • 6
  • 65
  • 82
  • Not only "add". If there are present, I want to overwrite. – Grzegorz Wierzowiecki Jan 13 '12 at 11:38
  • I'm not sure why you would to that, If you want to assign at each number a specific value you may want to use a dictionary. Would you provide an example of input and output of what you're trying to do? – Rik Poggi Jan 13 '12 at 11:44
  • For all your curiosity, I've just wrote reasons in comment under main post. :) – Grzegorz Wierzowiecki Jan 13 '12 at 11:49
  • I've read your comment and updated my answer, still I don't fully understand what you're trying to do. – Rik Poggi Jan 13 '12 at 12:07
  • You need to trust in my problem statement. I know my domain and assumptions. Let me be more specific, to clarify a little bit more : many `dict`s would be too big overhead in my case. I have many of them. If I would know what will be final size, I would just make matrix with fixed with. Actually I don't know during computation what will be the final length of each row. I do research in automatic analysis of computation complexity and I don't know final size like we don't know rules about [Colatz conjecture](http://en.wikipedia.org/wiki/Collatz_conjecture). – Grzegorz Wierzowiecki Jan 13 '12 at 12:24
  • 1
    I understand better! Would you please update your question stating that the use of `dict` is not an option and why? That will help in finding an answer. – Rik Poggi Jan 13 '12 at 12:44
  • 1
    Anyway if you have performance issue have you tried building an extension in C? Or is it an option to build a custom object (maybe implemented in C) that will take care of everything for you? – Rik Poggi Jan 13 '12 at 12:44
  • Thanks Rik P. for advice about statement. What about extension in C, I am coding most of my work in C/C++ due to performance issues. But I want sometime to save at least a little bit of time to do prototype in Python or other language :). Propably I will move this part some day into C/C++ but this time I want just to prototype. (Another this is that using C/C++ coded extensions, somehow, kills collaboration with other's, as it's easiest way to send someone pure Python script with message , "Hey, check it out". In such cases, C coded extensions are like obfuscation) – Grzegorz Wierzowiecki Jan 13 '12 at 12:49
  • 1
    Now it fits problem statement :). Thanks. – Grzegorz Wierzowiecki Jan 13 '12 at 14:00
  • 1
    @GrzegorzWierzowiecki: Although the dict is indeed bigger than the list, unless you have a million lists that is unlikely to make a noticeable difference. You *are* prematurely optimizing. :-) – Lennart Regebro Jan 13 '12 at 14:25
  • @LennartRegebro I have hundreds of millions of lists :). The more I can fit - the better result I get. I have 48 GB of RAM in machine. You *are* not thinking [Asymtotically](http://en.wikipedia.org/wiki/Asymptotic_analysis) and not taking into account all performance issues (like *cache*). What I do now, is Scientific Prototyping before reimplementation to C/C++. If you don't believe about behaviour of `list` and `dict` on many accesses , checkout : [Wiki Python - TimeComplexity](http://wiki.python.org/moin/TimeComplexity) – Grzegorz Wierzowiecki Jan 13 '12 at 14:40
  • @LennartRegebro On the other hand, it happens to do premature optimisation and it's important to into stuck into endless optimisations ;) -> "+1" – Grzegorz Wierzowiecki Jan 13 '12 at 14:45
  • 1
    @GrzegorzWierzowiecki: I already checked it out and explained to you that you were wrong about that. If you have hundred of millions of lists, you want to stop having that. On the other hand, if it's just prototyping, you don't really have hundreds of millions of lists. ;-) – Lennart Regebro Jan 13 '12 at 14:50
  • @LennartRegebro Such big amount of lists **is** prototype. Such big amount is need, to show some properties in data.In some cases you can not show properties in *"ten examples"* (check out [Four colour theorem : Proof by Computer](http://en.wikipedia.org/wiki/Four_color_theorem) as example, where there was a need of such big amount of examples,that program assistance was a need). In case of good results,there will be motivation of reimplementation into C/C++ distributed computing for much much bigger data.(Even thou - "+1" for sake of discussion:) I like criticism,it helps in self-improvement) – Grzegorz Wierzowiecki Jan 13 '12 at 15:40
  • 1
    @GrzegorzWierzowiecki: Well you seem to improve in ways of avoiding the criticism. :-) But sure, for a prototype using inefficient data structures are acceptable. But then your argument against the dict also falls. – Lennart Regebro Jan 13 '12 at 17:35
  • 1
    @GrzegorzWierzowiecki: Glad to be of help! Why don't you open another question providing a general overview of what you're doing? Maybe the community can be of some help! – Rik Poggi Jan 13 '12 at 18:49
  • @LennartRegebro Your argument is right, when you make prototype of program. In such case, my arguments would fail-I agree.Here it's about prototype of experiment.I appreciate and understand your argumentation and criticism.I've fallen into premature optimisation many times and I know learning never stops. However, this time I need a few things to show some properties of experiment(**including it's asymptotic behaviour**),but I would need to go into details to explain.To sum it up: 1.This time it's about prototype of experiment, not program 2.I appreciate all arguments and Thank you for them – Grzegorz Wierzowiecki Jan 14 '12 at 11:14
  • @RikPoggi Rik, as far as I understand from F&Q, here is place for specific, current problems, that might be useful to others. Broad and vague questions are unliked here, and what I do would need such kind of elaborate. Maybe it's good idea to make, one day, devblog/research blog and put such ideas and notes there. – Grzegorz Wierzowiecki Jan 14 '12 at 11:29
  • @RikPoggi If you like, you might like to checkout, just a few issues I fall into: [Equivalence of two basic blocks @ TCS.SE](http://cstheory.stackexchange.com/questions/7999/equivalence-of-two-basic-blocks) or other questions on: [Security.SE](http://security.stackexchange.com/users/4077/?tab=questions), [binary instrumentation](http://stackoverflow.com/q/7968993), [gpgpu](http://stackoverflow.com/q/4635749),[Compiler Backend Programming](http://stackoverflow.com/q/8776903), [CPU management @ Unix.SE](http://unix.stackexchange.com/q/23106/), etc... :) – Grzegorz Wierzowiecki Jan 14 '12 at 11:34
  • 2
    @GrzegorzWierzowiecki: can we try to talk in chat http://chat.stackoverflow.com/rooms/6695/discussion-between-rik-p-and-grzegorz-wierzowiecki – Rik Poggi Jan 14 '12 at 12:28
4

Perhaps this does what you want:

def resize(l, newsize, filling=None):                                                                                  
    if newsize > len(l):                                                                                 
        l.extend([filling for x in xrange(len(l), newsize)])                                                 
    else:                                                                                                
        del l[newsize:]                  
Mischa Arefiev
  • 5,227
  • 4
  • 26
  • 34
2

Try this:

def ResizeList(some_list, length, null_item = None): 
    return some_list + [null_item 
                        for item in range(length - len(lst))]
Rob Bednark
  • 25,981
  • 23
  • 80
  • 125
Joel Cornett
  • 24,192
  • 9
  • 66
  • 88