Duplicate first column of multiple text files in bash

Question

I have multiple text files, each containing two columns, and I would like to duplicate the first column of each file in bash so that each file ends up with three columns.

File:

Output file:

sP100227 sP100227 1
sP100267 sP100267 1
sP100291 sP100291 1
sP100493 sP100493 1

I tried:

txt=path/to/*.txt
echo "$(paste <(cut -f1-2 $txt) > "$txt"

You can't use the same file as input and output. The output redirection truncates the file. – Barmar Dec 04 '20 at 20:57

score 4 · Answer 1 · answered Dec 04 '20 at 20:57

Could you please try the following? Written and tested with the shown samples in GNU awk. This adds a field only to lines that have exactly 2 fields.

awk 'NF==2{$1=$1 OFS $1} 1' Input_file

If you don't care about the number of fields and simply want the value of the 1st field twice, try the following.

awk '{$1=$1 OFS $1} 1' Input_file

Or, if your Input_file only ever has 2 fields, there is no need to rewrite the whole line; simply print the fields as follows.

awk '{print $1,$1,$2}' Input_file

To save the output back into Input_file, append > temp && mv temp Input_file to any of the above solutions (after testing).
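Since the question is about several files, the temp-file step can be wrapped in a loop. This is only a minimal sketch, not something from the tested commands above: the demo directory, the file names a.txt and b.txt, and the sample rows are made up for illustration, and it assumes every file has exactly two whitespace-separated fields.

```shell
# Hypothetical demo directory with two sample files.
demo_dir=$(mktemp -d)
printf 'sP100227 1\nsP100267 1\n' > "$demo_dir/a.txt"
printf 'sP100291 1\nsP100493 1\n' > "$demo_dir/b.txt"

for f in "$demo_dir"/*.txt; do
    # Temp file next to the original, so mv stays on the same filesystem.
    tmp=$(mktemp "$f.XXXXXX") || exit 1
    # Replace the original only if awk succeeded.
    if awk '{print $1, $1, $2}' "$f" > "$tmp"; then
        mv "$tmp" "$f"
    else
        rm -f "$tmp"
    fi
done

cat "$demo_dir/a.txt"
```

Note that mktemp creates the temporary file with restrictive permissions, so after mv the file may have different permissions than the original.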

score 3 · Answer 2 · answered Dec 04 '20 at 21:04

Use a temp file, with cut -f1 and paste, like so:

paste <(cut -f1 in_file) in_file > tmp_file
mv tmp_file in_file

Alternatively, use a Perl one-liner, like so:

perl -i.bak -lane 'print join "\t", $F[0], $_;' in_file

The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
-a : Split $_ into array @F on whitespace or on the regex specified in -F option.
-i.bak : Edit input files in-place (overwrite the input file). Before overwriting, save a backup copy of the original file by appending to its name the extension .bak.

SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches

score 2 · Accepted Answer · answered Dec 04 '20 at 21:00

The default delimiter in cut and paste is TAB, but your file looks to be space-separated.

You can't use the same file as input and output redirection, because when the shell opens the file for output it truncates it, so there's nothing for the program to read. Write to a new file and then rename it.
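To see the truncation for yourself, here is a small sketch on a throwaway file (the file comes from mktemp and the sample row is made up, so nothing real is touched):

```shell
# Throwaway file with one sample row.
trunc_demo=$(mktemp)
printf 'sP100227 1\n' > "$trunc_demo"

# The shell opens (and empties) the output file before awk starts,
# so awk finds nothing to read and writes nothing.
awk '{print $1, $0}' "$trunc_demo" > "$trunc_demo"

if [ -s "$trunc_demo" ]; then
    echo "file still has content"
else
    echo "file is now empty"
fi
```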

Your paste command is only being given one input file. And there's no need to use echo.

paste -d' ' <(cut -d' ' -f1 "$txt") "$txt" > "$txt.new" && mv "$txt.new" "$txt"

You can do this more easily using awk.

awk '{print $1, $0}' "$txt" > "$txt.new" && mv "$txt.new" "$txt"

GNU awk has an in-place extension, so you can use that if you like. See "Save modifications in place with awk".
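A rough sketch of the in-place variant, assuming gawk 4.1 or later is installed (other awk implementations do not ship the inplace extension). The demo file and its rows are made up for illustration.

```shell
# Throwaway sample file.
inplace_demo=$(mktemp)
printf 'sP100227 1\nsP100267 1\n' > "$inplace_demo"

if command -v gawk >/dev/null 2>&1; then
    # -i inplace loads the extension, so print writes back into the file.
    gawk -i inplace '{print $1, $0}' "$inplace_demo"
else
    echo "gawk not found; use the temp-file approach instead" >&2
fi

cat "$inplace_demo"
```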


`for txt in file1 file2 ...` – Barmar Dec 04 '20 at 21:29

Vercingatorix · Answer 4 · 2020-12-04T21:26:09.813

Try sed -Ei 's/\s*(\S+)\s+/\1 \1 /1' $txt if your fields are separated by runs of one or more whitespace characters. This uses the stream editor (sed) to replace (s///1) the first string of non-space characters (\S+) followed by a string of whitespace characters (\s+) with that string repeated, with intervening spaces (\1 \1 ). Any leading whitespace (\s*) is dropped, and the rest of the line is kept. The -E option tells sed to use extended regular expressions (+, and ( instead of \(). The -i option edits the file in place, replacing it with the output. Note that \s, \S, and -i without a backup suffix are GNU sed features.

You could also use awk: awk '{ printf "%s %s\n",$1,$0 }'. This takes the first whitespace-delimited field ($1) and follows it with a space and the whole line ($0), followed by a newline. This is a little clearer than sed, but it does not edit the file in place.

If you can guarantee the fields are delimited by exactly one space, with no leading spaces, you can use paste -d' ' <(cut -d' ' -f1 "$txt") "$txt" > "$txt.new" && mv "$txt.new" "$txt". The -d' ' sets the delimiter to a space for both cut and paste. You know this, but for others: -f1 means extract the first -d-delimited field. The mv command replaces the input with the output, and the && makes sure that happens only if paste succeeded.
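The reason for that guarantee is that cut treats every single space as a field boundary, whereas awk skips leading whitespace and collapses runs of it. A quick sketch with a made-up line that has leading and repeated spaces:

```shell
# Made-up sample line: two leading spaces, three spaces between the fields.
line='  sP100227   1'

# cut sees an empty first field (the text before the first space).
cut_field=$(printf '%s\n' "$line" | cut -d' ' -f1)
# awk splits on runs of whitespace and ignores the leading ones.
awk_field=$(printf '%s\n' "$line" | awk '{print $1}')

printf 'cut: [%s]\nawk: [%s]\n' "$cut_field" "$awk_field"
```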
