29

I'm new to scripting. I have a table (Table1.txt) and I need to create another table that has Table1's rows arranged in columns and vice versa. I have found solutions to this problem for Perl and SQL but not for Python.

I just started learning Python two days ago, so this is as far as I got:

import csv
import sys

with open(sys.argv[1], "rt") as inputfile:
   readinput = csv.reader(inputfile, delimiter='\t')
   with open("output.csv", 'wt') as outputfile:
      writer = csv.writer(outputfile, delimiter="\t")
      for row in readinput:
            values = [row[0], row[1], row[2], row[3]]
            writer.writerow([values])

This just reproduces the columns as columns. What I would have liked to do now is to write the last line as writer.writecol([values]) but it seems that there is no command like that and I haven't found another way of writing rows as columns.

martineau
Frank
  • http://stackoverflow.com/questions/4937491/matrix-transpose-in-python – ThePracticalOne May 08 '12 at 22:12
  • There's no such thing as "writing columns". You can only write rows. The simplest thing to do is read *all* the data from the input file, transpose the rows and columns, then write *all* the transposed data out to the output file. spinning_plate's answer is conceptually easier to follow for someone new to Python, though not as efficient as it could be. Ashwini's answer is much more concise and much quicker, but requires a bit more Python-specific knowledge. – John Y May 08 '12 at 22:12

5 Answers

38

@Ashwini's answer is perfect. The magic happens in

zip(*lis)

Let me explain why this works: zip takes (in the simplest case) two lists and "zips" them: zip([1,2,3], [4,5,6]) yields the pairs (1,4), (2,5), (3,6). In Python 3, zip returns an iterator, so wrap the call in list() to see them as a list. So if you consider the outer list to be a matrix and the inner tuples to be the rows, that's a transposition (i.e., we turned the rows into columns).

Now, zip is a function of arbitrary arity, so it can take more than two arguments:

# Our matrix is:
# 1 2 3
# 4 5 6
# 7 8 9

# list() is needed in Python 3, where zip returns an iterator
list(zip([1,2,3], [4,5,6], [7,8,9]))

# Result: [(1, 4, 7), (2, 5, 8), (3, 6, 9)]

# Now it is
# 1 4 7
# 2 5 8
# 3 6 9

The problem we're facing is that in your case, we don't know how many arguments we want to pass to zip. But at least we already know the arguments: they are the elements of lis! lis is a list, and each element of that list is a list as well (corresponding to one line of numbers in your input file). The * is just Python's way of telling a function "please use the elements of whatever follows as your arguments and not the thing itself!"

So

lis = [[1,2,3], [4,5,6]]
zip(*lis)

is exactly the same as

zip([1,2,3], [4,5,6])
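
To check that equivalence yourself, here is a minimal sketch. The sample data is made up for illustration, and list() is only there because zip returns an iterator in Python 3:

```python
# Hypothetical sample data; `lis` mirrors the name used in this answer.
lis = [[1, 2, 3], [4, 5, 6]]

# The star unpacks the two inner lists into two separate arguments.
unpacked = list(zip(*lis))

# The same call with the arguments written out by hand.
explicit = list(zip([1, 2, 3], [4, 5, 6]))

print(unpacked)              # [(1, 4), (2, 5), (3, 6)]
print(unpacked == explicit)  # True
```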

Congrats, now you're a Python pro! ;-)

Manuel Ebert
32

The solution in general to transpose a sequence of iterables is: zip(*original_list)
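
One caveat worth knowing: zip stops at the shortest input, so a table whose rows have different lengths loses cells when transposed this way. The sketch below uses made-up data and shows itertools.zip_longest as one possible alternative; the empty-string fill value is just an assumption about what a missing cell should look like.

```python
from itertools import zip_longest

# Hypothetical ragged table: the second row is one cell short.
rows = [["1", "2", "3"], ["4", "5"]]

# Plain zip silently drops the third column.
truncated = list(zip(*rows))

# zip_longest keeps every column and pads the gaps.
padded = list(zip_longest(*rows, fillvalue=""))

print(truncated)  # [('1', '4'), ('2', '5')]
print(padded)     # [('1', '4'), ('2', '5'), ('3', '')]
```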

sample input:

1   2   3   4   5
6   7   8   9   10
11  12  13  14  15

program:

with open('in.txt') as f:
  lis = [x.split() for x in f]

for x in zip(*lis):
  for y in x:
    print(y+'\t', end='')
  print('\n')

output:

1   6   11  

2   7   12  

3   8   13  

4   9   14  

5   10  15
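
For the csv-based setup in the question, the same zip(*rows) idea can be combined with csv.reader and csv.writer. This is only a sketch: transpose_tsv is a hypothetical helper name, the file names are placeholders, and the demo writes to a temporary directory so it can run anywhere.

```python
import csv
import os
import tempfile


def transpose_tsv(in_path, out_path):
    """Read a tab-separated file, swap rows and columns, write the result."""
    with open(in_path, newline="") as infile:
        rows = list(csv.reader(infile, delimiter="\t"))
    with open(out_path, "w", newline="") as outfile:
        # writerows accepts any iterable of rows, including the zip iterator
        csv.writer(outfile, delimiter="\t").writerows(zip(*rows))


# Small demonstration with a two-row, three-column table
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "Table1.txt")
    dst = os.path.join(tmp, "output.txt")
    with open(src, "w", newline="") as f:
        f.write("1\t2\t3\n4\t5\t6\n")
    transpose_tsv(src, dst)
    with open(dst, newline="") as f:
        result = f.read()

print(result)
```

The whole input is read into memory before writing, which is fine for small tables but worth keeping in mind for very large files.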
Ashwini Chaudhary
  • Thanks, Ashwini! This zip business seems to be the best way ahead for someone like me who will need to do these transpositions all the time now... Very much appreciated! – Frank May 09 '12 at 17:06
  • I just did it with my list and it worked. By the way, why don't you have to use the csv.reader to read in the txt file? I thought that was a necessary prerequisite for the computer to read the data... – Frank May 09 '12 at 17:54
  • @Frank I am not very familiar with the csv module; that's why I used an example with columns separated by tabs (as you used `delimiter='\t'`). – Ashwini Chaudhary May 09 '12 at 17:59
  • @AshwiniChaudhary Sorry, I'm new to Python. Can you please tell me what **end=''** is doing in the print statement? When I run this code, it shows an error. Thanks – colourtheweb Oct 21 '15 at 15:35
  • @colourtheweb You must be using Python 2 then, `print()` is a function in Python 3 and accepts an optional argument called `end`. The default value of end is `'\n'`. – Ashwini Chaudhary Oct 21 '15 at 19:46
  • @AshwiniChaudhary yes you are right, I'm using 2.7.1 – colourtheweb Oct 23 '15 at 11:30
  • What's the meaning of `*` in front of `list` in `zip(*list)`? – Jeff B Dec 31 '16 at 05:38
24

Since we are talking about columns, rows and transposes, it is perhaps worth mentioning numpy:

>>> import numpy as np
>>> x = np.array([[1,2,3],[4,5,6],[7,8,9],[10,11,12]])
>>> x
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
>>> x.T
array([[ 1,  4,  7, 10],
       [ 2,  5,  8, 11],
       [ 3,  6,  9, 12]])
Akavall
2

Just to build on @Akavall's answer: if you want to read from a file, transpose, and then save again, just do:

from numpy import genfromtxt, savetxt
data = genfromtxt('in.txt')
savetxt('out.txt',data.T)

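data.T in the third line is where the data gets transposed. Note that genfromtxt parses every field as a float by default, and savetxt writes floats in scientific notation unless you pass a fmt argument.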

elyase
1

Here's one way to do it. Assume for simplicity that you just want to print out the objects in order:

  # let's read all the data into a big 2D list
  buffer = []
  for row in readinput:
      values = [row[0], row[1], row[2], row[3]]
      buffer.append(values)

  # what you have in your code
  for i in range(len(buffer)):
      for j in range(len(buffer[0])):
          print(buffer[i][j])

  # this is called a transpose; buffer[i][j] reads row then column,
  #    so switch i and j around to do the opposite
  for i in range(len(buffer[0])):
      for j in range(len(buffer)):
          print(buffer[j][i])

Since you need a list to pass to writer.writerow, you could do this:

  for i in range(len(buffer[0])):
      writer.writerow([buffer[j][i] for j in range(len(buffer))])
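
Pulled together, the index-swapping idea above might look like the following self-contained sketch. The transpose function and the sample data are made up for illustration, and it assumes every row has the same number of cells.

```python
def transpose(table):
    """Return the transpose of a rectangular list of rows using explicit loops."""
    if not table:
        return []
    transposed = []
    for i in range(len(table[0])):    # one output row per input column
        new_row = []
        for j in range(len(table)):   # walk down the input rows
            new_row.append(table[j][i])
        transposed.append(new_row)
    return transposed


# Hypothetical 2 x 4 table standing in for the rows read from the file
buffer = [["a", "b", "c", "d"], ["e", "f", "g", "h"]]
result = transpose(buffer)

print(result)  # [['a', 'e'], ['b', 'f'], ['c', 'g'], ['d', 'h']]
```

Each inner list of result can then be passed to writer.writerow one at a time.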
dfb
  • The `writer` in the `csv` module doesn't have a `write` method. You have to write a row at a time with `writerow`, or write all the rows at once with `writerows`. – John Y May 08 '12 at 22:07
  • @JohnY - thanks, hopefully this clears things up. My thought is using zip isn't going to really help this person if they are new to scripting. – dfb May 08 '12 at 22:14
  • I agree that `zip` is not exactly a beginner feature, especially when combined with the asterisk in Tadeck's comment on the other answer. Whether it's helpful to bring up `zip` right off the bat depends on whether the person wants to actually learn Python, or just get his task done as quickly as possible. (Lots of nonprogrammers do all their "coding" by Googling for recipes, and in some cases, this approach works for them.) – John Y May 08 '12 at 22:20
  • Agreed, writing out another way to perform the 'zip' functionality is also a stepping stone to actually learning the language as well. There are plenty of programmers that know the language (i.e., the language specific shortcuts) without knowing how to implement them themselves.... – dfb May 08 '12 at 22:22
  • `values = [row[0], row[1], row[2], row[3]]` is better written as `values = row[:4]` – John Machin May 09 '12 at 05:03
  • Thanks to all of you... I was not just trawling for solutions, but I'm trying to learn, so comparing the two ways (zip versus buffer transpose) is very instructive. Thumbs up! – Frank May 09 '12 at 17:11