
I have the following Python code that uses zip(), and it seems to truncate my data unexpectedly.

inc_data = [[u'Period Ending', u'Dec 31, 2012', u'Dec 31, 2011', u'Dec 31, 2010'],
            [u'Total Revenue\n', u'104,507,100\n', u'106,916,100\n', u'99,870,100\n'],
            [u'Cost of Revenue\n',u'56,000,000\n']
            ]

inc_data2 = zip(*inc_data)
for i in inc_data2:
    print i

It only prints:

(u'Period Ending', u'Total Revenue\n', u'Cost of Revenue\n')
(u'Dec 31, 2012', u'104,507,100\n', u'56,000,000\n')

I want it to print the following instead. Apparently I need to pad the shorter lists with u'' fillers by hand so that zip() doesn't truncate inc_data, but I don't know how to code that.

(u'Period Ending', u'Total Revenue\n', u'Cost of Revenue\n')
(u'Dec 31, 2012', u'104,507,100\n', u'56,000,000\n')
(u'Dec 31, 2011', u'106,916,100\n', u'')
(u'Dec 31, 2010', u'99,870,100\n', u'')

Schematically, inc_data above looks like this:

inc_data = [ [x],
             [y],
             [z] ]   

How do I make x, y, and z the same length, where that length is the maximum of len(x), len(y), and len(z)?

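For the by-hand padding asked about above, a minimal sketch is to find the length of the longest row and extend every shorter row with '' fillers before zipping. It is written for Python 3, where plain string literals are already Unicode, so the u prefixes are dropped; `pad_rows` is a hypothetical helper name, not a library function:

```python
# Python 3 sketch: pad each row to the longest row's length, then transpose.
inc_data = [['Period Ending', 'Dec 31, 2012', 'Dec 31, 2011', 'Dec 31, 2010'],
            ['Total Revenue\n', '104,507,100\n', '106,916,100\n', '99,870,100\n'],
            ['Cost of Revenue\n', '56,000,000\n']]


def pad_rows(rows, filler=''):
    """Return new lists, each extended with `filler` up to the longest row's length."""
    width = max(map(len, rows), default=0)  # default=0 handles an empty input
    return [row + [filler] * (width - len(row)) for row in rows]


padded = pad_rows(inc_data)
for column in zip(*padded):  # every row has the same length now, so nothing is dropped
    print(column)
```

Because the padded rows all have the same length, the built-in zip() keeps all four columns. The original inc_data lists are left untouched, since `row + [...]` builds new lists.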

Sorry for the lengthy and wordy explanation of the problem. Could you help me, or point me to a similar question that has already been answered, if one exists? Many thanks!

vt2424253
  • Use [itertools.izip_longest](http://docs.python.org/2/library/itertools.html#itertools.izip_longest) with `fillvalue=u''` – falsetru Oct 26 '13 at 05:21
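To see what the `fillvalue` argument in the comment above changes, here is a small comparison on made-up toy lists. It is written for Python 3, where the function is spelled `zip_longest`:

```python
from itertools import zip_longest  # Python 2 spells this izip_longest

first = ['a', 'b']
second = ['x', 'y', 'z']

print(list(zip(first, second)))                        # [('a', 'x'), ('b', 'y')]
print(list(zip_longest(first, second)))                # [('a', 'x'), ('b', 'y'), (None, 'z')]
print(list(zip_longest(first, second, fillvalue='')))  # [('a', 'x'), ('b', 'y'), ('', 'z')]
```

Plain zip() stops at the shortest input; zip_longest runs to the longest one and fills the gaps with None unless `fillvalue` says otherwise.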

2 Answers


Use izip_longest:

from itertools import izip_longest

inc_data = [[u'Period Ending', u'Dec 31, 2012', u'Dec 31, 2011', u'Dec 31, 2010'],
            [u'Total Revenue\n', u'104,507,100\n', u'106,916,100\n', u'99,870,100\n'],
            [u'Cost of Revenue\n',u'56,000,000\n']
            ]

print list(izip_longest(*inc_data, fillvalue=u'')) 


# [(u'Period Ending', u'Total Revenue\n', u'Cost of Revenue\n'),
#  (u'Dec 31, 2012', u'104,507,100\n', u'56,000,000\n'),
#  (u'Dec 31, 2011', u'106,916,100\n', u''),
#  (u'Dec 31, 2010', u'99,870,100\n', u'')]
dawg
  • izip_longest works perfectly. Once the data is transposed with izip_longest, how can I delete the first row so that I don't get (u'Period Ending', u'Total Revenue\n', u'Cost of Revenue\n')? I will be storing the data in an sqlite3 table. Many thanks! – vt2424253 Oct 26 '13 at 06:10
  • Use a [slice](http://stackoverflow.com/questions/509211/pythons-slice-notation): `[[1,2],[3,4],[5,6]][1:]` evaluates to `[[3, 4], [5, 6]]`. The slice `seq[1:]` drops the first element, which here is the header tuple. So you could do, for example: `list(izip_longest(*inc_data, fillvalue=u''))[1:]` – dawg Oct 26 '13 at 23:55
  • Thanks. Trial and error yielded the results you gave me. I didn't know it was called slicing, so I wasn't able to search for that technique. Thanks for the tip. – vt2424253 Oct 30 '13 at 00:45
  • In Python 3, the function is called `zip_longest`. – Soren Bjornstad Aug 20 '17 at 23:01
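Putting the comment thread together for the sqlite3 use case: transpose with zip_longest, drop the header tuple with [1:], and insert the remaining rows. This is a Python 3 sketch under assumptions that are not in the thread: the table name `income`, its column names, stripping the trailing '\n' from each cell, and the in-memory database are all hypothetical choices for illustration.

```python
import sqlite3
from itertools import zip_longest

inc_data = [['Period Ending', 'Dec 31, 2012', 'Dec 31, 2011', 'Dec 31, 2010'],
            ['Total Revenue\n', '104,507,100\n', '106,916,100\n', '99,870,100\n'],
            ['Cost of Revenue\n', '56,000,000\n']]

# Transpose, padding short rows with '', then slice off the header tuple.
rows = list(zip_longest(*inc_data, fillvalue=''))[1:]

# Hypothetical cleanup: remove the trailing '\n' from every cell.
rows = [tuple(cell.strip() for cell in row) for row in rows]

conn = sqlite3.connect(':memory:')  # throwaway in-memory database for illustration
conn.execute('CREATE TABLE income '
             '(period_ending TEXT, total_revenue TEXT, cost_of_revenue TEXT)')
conn.executemany('INSERT INTO income VALUES (?, ?, ?)', rows)
conn.commit()

for record in conn.execute('SELECT * FROM income ORDER BY rowid'):
    print(record)
```

The revenue figures are stored as TEXT with their commas intact; converting them to integers before inserting would be a separate, optional step.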

Python 2.x:

Use izip_longest:

from itertools import izip_longest

inc_data = [[u'Period Ending', u'Dec 31, 2012', u'Dec 31, 2011', u'Dec 31, 2010'],
            [u'Total Revenue\n', u'104,507,100\n', u'106,916,100\n', u'99,870,100\n'],
            [u'Cost of Revenue\n',u'56,000,000\n']
            ]
  
print list(izip_longest(*inc_data, fillvalue=u'')) 


# [(u'Period Ending', u'Total Revenue\n', u'Cost of Revenue\n'),
#  (u'Dec 31, 2012', u'104,507,100\n', u'56,000,000\n'),
#  (u'Dec 31, 2011', u'106,916,100\n', u''),
#  (u'Dec 31, 2010', u'99,870,100\n', u'')]

Python 3.x:

izip_longest() has been renamed to zip_longest(). Use zip_longest:

from itertools import zip_longest

inc_data = [[u'Period Ending', u'Dec 31, 2012', u'Dec 31, 2011', u'Dec 31, 2010'],
            [u'Total Revenue\n', u'104,507,100\n', u'106,916,100\n', u'99,870,100\n'],
            [u'Cost of Revenue\n',u'56,000,000\n']
            ]
  
print(list(zip_longest(*inc_data, fillvalue=u'')))


# [('Period Ending', 'Total Revenue\n', 'Cost of Revenue\n'),
#  ('Dec 31, 2012', '104,507,100\n', '56,000,000\n'),
#  ('Dec 31, 2011', '106,916,100\n', ''),
#  ('Dec 31, 2010', '99,870,100\n', '')]
Lucifer
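If one script has to run under both Python 2 and Python 3, a common idiom is to try the Python 3 import first and fall back to the Python 2 name under an alias. This is a general sketch, not something either answer above shows:

```python
# Import whichever name this Python version provides.
try:
    from itertools import zip_longest  # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2

inc_data = [[u'Period Ending', u'Dec 31, 2012', u'Dec 31, 2011', u'Dec 31, 2010'],
            [u'Total Revenue\n', u'104,507,100\n', u'106,916,100\n', u'99,870,100\n'],
            [u'Cost of Revenue\n', u'56,000,000\n']]

# print(row) with a single argument works in both Python 2 and Python 3.
for row in zip_longest(*inc_data, fillvalue=u''):
    print(row)
```

The u'' literals are accepted by Python 3.3 and later, so the same data definition works under both versions.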