3

I have already seen Create an empty list in python with certain size - Stack Overflow; but I just wanted to confirm - consider this MWE:

data = ( ( "x1", ( (3, "a"), (1, "b"),  (5, "c") )  ), ( "x2", ( (2, "a"), (4, "b") )  ) )

outputA = []

for ix in data:
  print ix[0] # x1, x2
  for isnip in ix[1]:
    outputA.append(isnip)

print outputA
# [(3, 'a'), (1, 'b'), (5, 'c'), (2, 'a'), (4, 'b')]

outputB = []

for ix in data:
  print ix[0] # x1, x2
  for isnip in ix[1]:
    outputB.insert(isnip[0], isnip)

print outputB
# [(3, 'a'), (1, 'b'), (2, 'a'), (5, 'c'), (4, 'b')]

outputC = [None] * (5+1) #[]
for ix in data:
  print ix[0] # x1, x2
  for isnip in ix[1]:
    outputC[isnip[0]] = isnip

print outputC
# [None, (1, 'b'), (2, 'a'), (3, 'a'), (4, 'b'), (5, 'c')]

I have data where there are 2D tuples (actually, in my real case, dicts, but nevermind that), whose first element is an ordering index; they are unsorted, and I need them sorted. However, they are at all possible levels of nesting (I have simplified data above for an easier example; in my real situation they can be nested even further), so I cannot easily issue a "sorted" command.

So I thought about inserting elements - as you can see, I cannot get .insert() to preserve the order. So I thought then about explicit assignment - and that works, but only if the list is sized beforehand; and to find the size, I still would have to go through an extra recursion, just to discover what the maximum index is.

Thus, I would like to insert at exact location (not "before" like .insert() does) of a list, but without explicitly sizing the list beforehand - is there any way that this can be achieved?


EDIT: Here is something more like my actual data, showing (hopefully) why it would be difficult to sort it:

data = ( ( "x1", ( (3, "a"), (1, "b"),  (5, "c") )  ), ( "x2", ( "x3", ( (2, "a"), (4, "b") ) )  ), ("x100", 1 ) )

outputA = []

for ix in data:
  #print "[0]", ix[0], "[1]", ix[1] # x1, x2, x100
  try:
    for isnip in ix[1]:
      #print "isnip", isnip[0], "-", isnip[1]
      if int(isnip[0]) == isnip[0]:
        outputA.append(isnip)
      else:
        raise Exception("not good")
  except:
    try:
      for isnip in ix[1][1]:
        #print "isnip", isnip[0], "-", isnip[1]
        if int(isnip[0]) == isnip[0]:
          outputA.append(isnip)
    except:
      #print "skipping this"
      pass

print outputA
# [(3, 'a'), (1, 'b'), (5, 'c'), (2, 'a'), (4, 'b')]

outputB = []

for ix in data:
  try:
    for isnip in ix[1]:
      if int(isnip[0]) == isnip[0]:
        outputB.insert(isnip[0]+1, isnip)
      else:
        raise Exception("not good")
  except:
    try:
      for isnip in ix[1][1]:
        #print "isnip", isnip[0], "-", isnip[1]
        if int(isnip[0]) == isnip[0]:
          outputB.insert(isnip[0]+1, isnip)
    except:
      #print "skipping this"
      pass

print outputB
# [(3, 'a'), (1, 'b'), (5, 'c'), (2, 'a'), (4, 'b')]
Community
  • 1
  • 1
sdaau
  • 36,975
  • 46
  • 198
  • 278
  • 2
    So your question is how to flatten the nested list into an unnested one? Because then you can easily sort it. Otherwise I fail to understand the question I guess. Sounds like what you're asking is not really a good way to solve your problem – Niklas B. Mar 12 '14 at 21:46
  • Hi @NiklasB. - somewhat, but not exactly; I'd need _only_ the leaf "nodes" () in my real case, (there is another criteria determining if they are usable or not); my question is if there is an approach where I can insert elements in an array at an exact location without sizing it beforehand. Cheers! – sdaau Mar 12 '14 at 21:47
  • 1
    Maybe I'm being thick, but if you can `insert` it and it's almost right but off by 1, why can't you `insert(isnip[0]+1)`? I fully admit I'm not quite grokking your data set, so feel free to correct me. – Two-Bit Alchemist Mar 12 '14 at 21:51
  • 1
    @sdaau: Your question *seems* to be how to get the sorted list of leaves, not how to insert into arrays. The latter is just your approach of solving the former problem, if I understand you correctly. – Niklas B. Mar 12 '14 at 22:07
  • Thanks @Two-BitAlchemist - but it doesn't work; I just made an edit to the OP, showing what the resuts of `insert(isnip[0]+1)` - not sorted. Cheers! – sdaau Mar 12 '14 at 22:07
  • @NiklasB. - in a way yes; but see the edit to the post, I added a somewhat more representative data; I cannot really see how to easily sort that - hints would be appreciated. Cheers! – sdaau Mar 12 '14 at 22:08
  • 1
    What is `("x100", 1 )` supposed to mean? It's totally inconsistent. By the way, my answer already covers your update. – Niklas B. Mar 12 '14 at 22:13
  • @NiklasB. - I didn't mention anywhere the data was consistent; that is one of the reasons why I have difficulties thinking about it in terms of nodes and leaves, and why I'd rather just insert at position (without previous sizing), given that I know the ints should be indexes. – sdaau Mar 12 '14 at 22:20
  • 1
    @sdaau: That's not what I mean. I mean the *structure* is inconsistent. Independent of how you plan to come up with the final array, you must understand the structure in any case, and in this case the structure seems to be ill-defined – Niklas B. Mar 12 '14 at 22:24

1 Answers1

3

Think about your data as a tree:

data = ( "x", (
          ( "x1", (
             (3, "a"),
             (1, "b"),  
             (5, "c"))), 
          ( "x2", ( 
             (2, "a"), 
             (4, "b")))))

I added a root node to bring it into a consistent format. What constitues a leaf in that tree?

def isleaf(x):      
    return not isinstance(x[1], tuple)

Now you can just run a simple depth-first search to get the leaves in preorder:

def dfs(x):
    if isleaf(x):
        yield x
        return
    for y in x[1]: 
        yield from dfs(y)

Example:

>>> list(dfs(data))
[(3, 'a'), (1, 'b'), (5, 'c'), (2, 'a'), (4, 'b')]
>>> sorted(dfs(data), key=lambda x: x[0])
[(1, 'b'), (2, 'a'), (3, 'a'), (4, 'b'), (5, 'c')]

This can be extended to any other tree-like data.

UPDATE: If you absolutely must avoid the sorting step for some reason, you can just collect the results in a dict and construct the array afterwards.

d = {}

def dfs(x):
    if isleaf(x):
        d[x[0]] = x
        return
    for y in x[1]: 
        dfs(y)
dfs(data)

res = [None] * (max(d) + 1)
for i, v in d.items():
    res[i] = v
Niklas B.
  • 92,950
  • 18
  • 194
  • 224
  • Thanks for that, @NiklasB. - good to know about `dfs`, will have to wrap my head around that eventually. But re: your update - that is exactly what I'm doing in my `OutputC` in OP; and the problem is that to find `highest_index`, which I don't know apriori, I must run a complete recursion of the tree (which is huge in reality); that is what I would have liked to avoid. And it could have been avoided, if I could have inserted at a location of a list exactly, without previous sizing. – sdaau Mar 12 '14 at 22:16
  • 1
    @sdaau: Well you can't have both. Either you find the highest index first, setup the array at the beginning and then do a second traversal OR dynamically grow the array during the recursion. Both is easy. – Niklas B. Mar 12 '14 at 22:18
  • Many thanks @NiklasB. - "you can't have both" is what finally answers it for me `:)` I can see the edit with the dynamic sizing, that looks like it will help. Cheers! – sdaau Mar 12 '14 at 22:22
  • 1
    @sdaau: That was more of a short moment of clouded mind, actually it's much wiser to use a hashmap (dict). – Niklas B. Mar 12 '14 at 22:22