Swap two columns - awk, sed, python, perl

Question

I've got data in a large file (280 columns wide, 7 million lines long!) and I need to swap the first two columns. I think I could do this with some kind of awk for loop, to print $2, $1, then a range to the end of the line, but I don't know how to do the range part, and I can't print $2, $1, $3...$280! Most of the column swap answers I've seen here are specific to small files with a manageable number of columns, so I need something that doesn't depend on specifying every column number.

The file is tab delimited:

Affy-id chr 0 pos NA06984 NA06985 NA06986 NA06989

score 126 · Accepted Answer · answered Aug 15 '12 at 10:35

You can do this by swapping values of the first two fields:

awk ' { t = $1; $1 = $2; $2 = t; print; } ' input_file
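
For comparison, here is a minimal Python sketch of the same swap, assuming the tab-separated layout from the question (the name `swap_first_two` is made up for illustration). One difference worth knowing: when awk assigns to a field, it rebuilds the whole line with OFS, which is a single space by default, so the awk one-liner above replaces the tabs with spaces unless OFS is set. This sketch splits and re-joins on the same separator, so the tabs survive:

```python
def swap_first_two(line: str, sep: str = "\t") -> str:
    """Swap the first two sep-separated fields of a line; all other fields stay as they are."""
    body = line.rstrip("\n")      # split without the newline so it can't stick to a field
    ending = line[len(body):]     # the newline (if any), re-attached unchanged at the end
    fields = body.split(sep)
    if len(fields) >= 2:          # a line with a single field has nothing to swap
        fields[0], fields[1] = fields[1], fields[0]
    return sep.join(fields) + ending
```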

perreal

That is so neat and elegant, thank you! I was hoping there would be a one-liner out there. – Charley Farley Aug 15 '12 at 10:43
This answer is problematic with different sizes of columns and their separators. A more extensible answer is here: http://unix.stackexchange.com/a/31596/16920 – Léo Léopold Hertz 준영 Jul 01 '15 at 14:35
Actually it's not problematic with a different number of columns, only with mixed separators: for example, if you have both tabs and spaces in your file and you only want to separate fields by tabs, you need the `BEGIN{FS="\t"}` trick. – caesarsol Jul 08 '15 at 10:55
When using `-F '\t'`, the tabs are replaced by spaces in the final output. Is there a way to preserve them? – Atcold Nov 02 '15 at 15:49
OK, one has to specify `OFS=$'\t'`, as pointed out by the [answer](http://stackoverflow.com/a/29532554/2247039) below. @perreal, perhaps it's worth updating the answer with the additional parameter? – Atcold Nov 02 '15 at 15:51
If you use `awk '{ print $2, $1 }'`, it's the same :D – A.Villegas Feb 04 '19 at 17:34
This was really helpful! Thanks for sharing! – imesh May 23 '19 at 06:13
@A.Villegas That only prints the first two columns. – jinawee Jan 09 '20 at 11:13
I'm surprised this answer is so upvoted. Certainly not "so neat and elegant", as wrongly stated above. As @A.Villegas pointed out, `awk '{ print $2, $1 }'` is neat and elegant, not an unnecessary variable swap. Sorry, had to downvote. – cornuz Mar 04 '21 at 10:23
@cornuz, no problem. Note that your suggestion only prints 2 columns; the OP wants to print all columns, not just the first two. – perreal Mar 04 '21 at 21:27
@perreal, my bad, I hadn't noticed that. I would remove my downvote, but it doesn't allow me anymore. Sorry, next time I'll be more careful. – cornuz Mar 04 '21 at 22:10
@Atcold for sure! `| sed 's/ /\t/g'` (GNU sed) – user145837 Sep 29 '21 at 20:08

score 26 · Answer 2 · answered Apr 09 '15 at 07:37

I tried perreal's answer with Cygwin on a Windows system with a tab-separated file. It didn't work, because awk's default separator is whitespace on input and a single space on output.

If you encounter the same problem, try this instead:

awk -F $'\t' ' { t = $1; $1 = $2; $2 = t; print; } ' OFS=$'\t' input_file

The input separator is defined by -F $'\t' and the output separator by OFS=$'\t'.

To write the result to a file, redirect the output:

awk -F $'\t' ' { t = $1; $1 = $2; $2 = t; print; } ' OFS=$'\t' input_file > output_file
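
To see why both separators matter, here is a small Python sketch on a made-up record. Whitespace splitting (roughly what awk does with its default FS) merges runs of tabs and spaces, so empty fields disappear and fields containing spaces get cut apart; splitting on the tab alone keeps every field, and re-joining with a tab gives back the original line:

```python
# Made-up tab-separated record: the third field is empty, the fourth contains a space
line = "Affy-id\tchr\t\tpos 1\tNA06984"

ws_fields = line.split()        # like awk's default FS: any run of spaces/tabs is one separator
tab_fields = line.split("\t")   # like awk -F '\t': every single tab is a separator

print(ws_fields)   # ['Affy-id', 'chr', 'pos', '1', 'NA06984']
print(tab_fields)  # ['Affy-id', 'chr', '', 'pos 1', 'NA06984']

# Only the tab-split fields can be joined back (like OFS=$'\t') into the original line
print("\t".join(tab_fields) == line)  # True
print("\t".join(ws_fields) == line)   # False
```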

emi-le

Super! I was missing the `OFS=$'\t'` parameter! – Atcold Nov 02 '15 at 15:52
This approach can end up with tabs at the start of the line, which might not be the intended outcome. – Kenny Powers Jun 26 '17 at 13:37

score 12 · Answer 3 · answered Dec 08 '16 at 10:43

Try this, which is more relevant to your question:

awk '{printf("%s\t%s\n", $2, $1)}' inputfile

Pradyumna Sagar

This only prints the first two columns. Slightly more compact is `awk '{print $2 "\t" $1}' inputfile`. – Fuujuhi Jun 27 '18 at 06:33

score 6 · Answer 4 · answered Aug 15 '12 at 11:08

This might work for you (GNU sed):

sed -i 's/^\([^\t]*\t\)\([^\t]*\t\)/\2\1/' file
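
The sed expression captures "the first field plus its tab" and "the second field plus its tab", then writes the two groups back in reverse order; everything after the second tab is left alone. A minimal Python sketch of the same substitution with `re.sub` (the helper name `sed_style_swap` is made up). Like the sed command, it leaves a line unchanged when the second field is not followed by a tab, e.g. in a file with exactly two columns:

```python
import re

# Same pattern as the sed command: group 1 = first field plus its tab,
# group 2 = second field plus its tab; the rest of the line is not touched
SWAP_RE = re.compile(r"^([^\t]*\t)([^\t]*\t)")


def sed_style_swap(line: str) -> str:
    """Swap the first two tab-terminated fields; other lines come back unchanged."""
    return SWAP_RE.sub(r"\2\1", line, count=1)
```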

potong

Perfect solution for us vim users. – awm Nov 29 '21 at 12:36

score 3 · Answer 5 · answered Aug 15 '12 at 10:42

Have you tried using the cut command? E.g.

cat myhugefile | cut -c10-20,1-9,21- > myrearrangedhugefile

Robbie Dee

I haven't, but I'll remember that for future use! – Charley Farley Aug 15 '12 at 10:44
`-c` selects characters, so this does not exchange columns. – blehman Dec 27 '13 at 19:44
It will swap columns in the output file - try it for yourself – Robbie Dee Jan 20 '14 at 22:03
How can we do it without knowing the character count? `cat myhugefile | cut -f2,1` gives the same output as `cat myhugefile | cut -f1,2` – Hady Elsahar Feb 02 '14 at 00:38
You can output each column to an intermediate file. Something like: `cut -f2 myhugefile > piece1 ; cut -f1 myhugefile > piece2 ; cut -f3- myhugefile > piece3 ; paste piece1 piece2 piece3 > myrearrangedhugefile ; rm piece1 piece2 piece3` – Robbie Dee Feb 03 '14 at 17:17

score 3 · Answer 6 · edited Jun 01 '15 at 23:07

This is also easy in perl:

perl -pe 's/^(\S+)\t(\S+)/$2\t$1/;' file > outputfile
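
One caveat with `\S+`: it only matches fields that are non-empty and contain no spaces. A quick Python check with `re.sub` on made-up sample lines (Python's regex syntax agrees with Perl's for these patterns) shows the difference from a tab-only pattern such as `[^\t]*`:

```python
import re

PERL_STYLE = r"^(\S+)\t(\S+)"      # pattern from the perl one-liner
TAB_ONLY = r"^([^\t]*)\t([^\t]*)"  # fields may contain spaces or be empty

# Made-up lines: the first field contains a space / the first field is empty
samples = ["id 1\tchr\tpos", "\tchr\tpos"]

perl_results = [re.sub(PERL_STYLE, r"\2\t\1", s, count=1) for s in samples]
tab_results = [re.sub(TAB_ONLY, r"\2\t\1", s, count=1) for s in samples]

print(perl_results)  # neither line matches, so both come back unswapped
print(tab_results)   # both lines are swapped
```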

edited by kenorb

answered Jun 01 '15 at 23:01

Aaron Lawson

score 2 · Answer 7 · answered Jul 02 '15 at 06:15

You could do this in Perl:

perl -F\\t -nlae 'print join("\t", @F[1,0,2..$#F])' inputfile

The -F switch specifies the delimiter. In most shells you need to precede the backslash with another backslash to escape it. In Perl 5.20 and later, -F implicitly sets both -a and -n, so those switches can be dropped.

For your problem you wouldn't need -l, because the last column stays last in the output. But in a different situation, if the last column has to appear between other columns, its trailing newline must be removed first. The -l switch takes care of this: it chomps each input line and adds the newline back when printing.
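
The same newline pitfall shows up in any language. In this small Python sketch (made-up data), the last field of a line read from a file still carries its trailing `\n`; moving that field to the front without stripping the newline first, which is the job `-l` does in Perl, splits the record across two lines:

```python
line = "a\tb\tc\n"  # a line as read from a file, newline included

# Without stripping: the newline travels with the last field
fields = line.split("\t")
broken = "\t".join([fields[2], fields[0], fields[1]])

# Strip first (what Perl's -l does), then add the newline back at the end
fields = line.rstrip("\n").split("\t")
fixed = "\t".join([fields[2], fields[0], fields[1]]) + "\n"

print(repr(broken))  # 'c\n\ta\tb'
print(repr(fixed))   # 'c\ta\tb\n'
```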

The "\t" in join can be changed to anything else to produce a different delimiter in the output.

2..$#F specifies a range from index 2 (the third column, since indices start at 0) to the last index, $#F. As you might have guessed, inside the square brackets you can put any single column or range of columns in the desired order.
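
The same "indices in the desired order" idea carries over to Python. In this sketch (helper names made up), `perl_style_order(n)` builds the zero-based equivalent of `@F[1,0,2..$#F]` for a row with n columns, and `reorder` picks the fields in that order:

```python
from typing import List, Sequence


def reorder(line: str, order: Sequence[int], sep: str = "\t") -> str:
    """Return the fields of a line (newline stripped) joined in the given zero-based order."""
    fields = line.rstrip("\n").split(sep)
    return sep.join(fields[i] for i in order)


def perl_style_order(n_fields: int) -> List[int]:
    """Indices equivalent to Perl's @F[1,0,2..$#F] for a row with n_fields columns."""
    return [1, 0] + list(range(2, n_fields))  # $#F is the last index, i.e. n_fields - 1
```

Any other permutation works the same way, e.g. `reorder(line, [2, 0, 1])` moves the third column to the front.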

score 2 · Answer 8 · answered Feb 05 '20 at 14:25

There's no need to call anything but your shell:

bash> while read -r col1 col2 rest; do
        echo "$col2" "$col1" "$rest"
      done <input_file

Test:

bash> echo "first second a b c d e f g" |
      while read -r col1 col2 rest; do
        echo "$col2" "$col1" "$rest"
      done
second first a b c d e f g
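
For comparison, `read col1 col2 rest` splits off the first two whitespace-separated words and leaves everything else in `rest`. Python's `str.split` with `maxsplit=2` behaves much the same way. This rough sketch (helper name made up) also shows that the first two columns end up separated by a single space, as with `echo`, while tabs inside the remainder are kept (as they are when `$rest` is quoted):

```python
def shell_style_swap(line: str) -> str:
    """Roughly mimic `read col1 col2 rest` followed by echo "$col2" "$col1" "$rest"."""
    # maxsplit=2: split off two whitespace-separated words, keep the rest of the line intact
    parts = line.rstrip("\n").split(maxsplit=2)
    if len(parts) >= 2:
        parts[0], parts[1] = parts[1], parts[0]
    return " ".join(parts)  # echo separates its arguments with single spaces
```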

score 0 · Answer 9 · answered Jan 20 '20 at 16:22

You could also use "inlined" Python (a Python script inside a shell script), but only if you want to do more scripting in Bash before or after it. Otherwise it is unnecessarily complex.

Content of script file process.sh:

#!/bin/bash

# inline Python script (the quoted 'EOSCR' delimiter stops Bash from expanding anything inside it)
read -r -d '' PYSCR << 'EOSCR'
from __future__ import print_function
import codecs
import sys

encoding = "utf-8"
fn_in = sys.argv[1]
fn_out = sys.argv[2]

# print("Input:", fn_in)
# print("Output:", fn_out)

with codecs.open(fn_in, "r", encoding) as fp_in, \
        codecs.open(fn_out, "w", encoding) as fp_out:
    for line in fp_in:
        # split into two columns and rest
        col1, col2, rest = line.split("\t", 2)
        # swap columns in output
        fp_out.write("{}\t{}\t{}".format(col2, col1, rest))
EOSCR

# ---------------------
# do setup work?
# e.g. list files for processing; here the file names simply come from the script's arguments
inputfile="$1"
outputfile="$2"

# call python script with params
python3 -c "$PYSCR" "$inputfile" "$outputfile"

# do some more processing
# e.g. rename outputfile to inputfile, ...

If you only need to swap the columns in a single file, you can also just write a standalone Python script with the file names defined statically, or simply use one of the answers above.
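
For that single-file case, a minimal standalone sketch might look like this (the function names and the `input.tsv`/`output.tsv` file names are placeholders). It splits each line at most twice, so the remaining 278 columns are never split apart; it sets the line ending aside before swapping so it cannot stick to a field; and it leaves lines with fewer than two fields unchanged, whereas unpacking `line.split("\t", 2)` into three names, as in the inline version, raises a `ValueError` on lines with fewer than three fields:

```python
from typing import Iterable, Iterator


def swap_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each line with its first two tab-separated fields swapped."""
    for line in lines:
        body = line.rstrip("\r\n")   # set the line ending aside...
        ending = line[len(body):]    # ...and re-attach it unchanged afterwards
        parts = body.split("\t", 2)  # col1, col2 and the untouched rest
        if len(parts) >= 2:
            parts[0], parts[1] = parts[1], parts[0]
        yield "\t".join(parts) + ending


def swap_file(in_path: str, out_path: str) -> None:
    """Stream in_path to out_path one line at a time, so memory use stays flat."""
    # newline="" turns off newline translation, so line endings pass through as-is
    with open(in_path, encoding="utf-8", newline="") as src, \
            open(out_path, "w", encoding="utf-8", newline="") as dst:
        dst.writelines(swap_lines(src))
```

A call such as `swap_file("input.tsv", "output.tsv")` at the bottom of the script would then process the whole file.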

score 0 · Answer 10 · answered Dec 09 '22 at 02:15

Swapping in awk without a temporary variable:

echo '777777744444444464449: 317 647 14423 262927714037  :   0x2A29D5A1BAA7A95541' |
mawk '1; ($1 = $2 substr(_, ($2 = $1)^_))^_' FS=':' OFS=':'

777777744444444464449: 317 647 14423 262927714037  :   0x2A29D5A1BAA7A95541
 317 647 14423 262927714037  :777777744444444464449:   0x2A29D5A1BAA7A95541
