
I have two files, file1.dat and file2.dat. I want to replace the values in the first column of file1.dat with the values from file2.dat, line by line and keeping the leading `A`, without changing the file format or the rest of the data.

I tried this awk command, but the problem is that it changes the file format and the entire first column gets replaced:

awk 'NR==FNR{a[NR]=$0;next}{$1=a[FNR]}1' file1.dat file2.dat > result.dat

file1.dat (input):

A123456789      1      C      HIE   1   48.343 23.545 32.02 1.00 0.00        H
A875678235      3      C      PHE   1   48.343 23.545 32.02 1.00 0.00        C
A907654234      4      N      ALA   1   48.343 23.545 32.02 1.00 0.00        N
A907863544      5      B      VAL   1   48.343 23.545 32.02 1.00 0.00        B

file2.dat (input):

987654321
567890123
098765432
890765348

Desired output:

A987654321      1       C     HIE   1  48.343 23.545 32.02 1.00 0.00         H
A567890123      3       C     PHE   1  48.343 23.545 32.02 1.00 0.00         C
A098765432      4       N     ALA   1  48.343 23.545 32.02 1.00 0.00         N
A890765348      5       B     VAL   1  48.343 23.545 32.02 1.00 0.00         B
Freddy
  • The challenge is reading a line at a time from two different files. It would probably be easier if you could use a programming language like Perl, C, or Java. One option with bash is `read -r`: https://unix.stackexchange.com/questions/26601/how-to-read-from-two-input-files-using-while-loop – FoggyDay Apr 26 '20 at 17:26
  • I see you added the picture because you want to use color to make it easier to understand, but you need to post it as text as well. – Quasímodo Apr 26 '20 at 17:39
  • Welcome to Stack Overflow. SO is a question and answer page for professional and enthusiastic programmers. Add your own code to your question. You are expected to show at least the amount of research you have put into solving this question yourself. – Cyrus Apr 26 '20 at 17:52
  • It seems to me that the simple change `$1=a[FNR]` => `$1="A"a[FNR]` should help. – rpoleski Apr 27 '20 at 07:07
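FoggyDay's comment suggests reading the two files a line at a time from the shell. A minimal sketch of that idea, assuming POSIX sh, files with the same number of lines, and no leading whitespace on the file1.dat lines; `splice_first_field` is a hypothetical helper name, not something from the question:

```shell
#!/bin/sh
# Hypothetical helper: splice_first_field DATA_FILE VALUES_FILE
# Reads DATA_FILE on stdin and VALUES_FILE on fd 3 in lockstep, and writes
# each data line with field 1 replaced by <first char of field 1><value>.
# Assumes no leading blanks on data lines; a final line without a trailing
# newline is not read by `read` and is therefore skipped.
splice_first_field() {
  while IFS= read -r line && IFS= read -r value <&3; do
    first=${line%%[[:space:]]*}     # first field of the data line
    rest=${line#"$first"}           # everything after it, spacing intact
    lead=${first%"${first#?}"}      # first character of field 1, e.g. "A"
    printf '%s%s%s\n' "$lead" "$value" "$rest"
  done < "$1" 3< "$2"
}
```

Usage would be `splice_first_field file1.dat file2.dat > result.dat`. Because each line is cut with parameter expansion instead of being re-split into fields, the spacing after the first field passes through unchanged. A shell `while read` loop is typically much slower than awk or sed on large files, so the answers below are likely a better fit for big inputs.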

3 Answers


If you want to keep the first character of column 1 (the A) from the first file, and assuming it's okay to use tabs to separate the fields:

awk -v OFS='\t' '
  NR==FNR{ a[FNR]=$1; next }
  { $1=substr($1,1,1) a[FNR] }1
' file2.dat file1.dat > result.dat
Freddy
  • Awesome, so much simpler than I thought – Quasímodo Apr 26 '20 at 18:08
  • It's not so much different to the original script :) – Freddy Apr 26 '20 at 18:09
  • It's just printing the second file's values, concatenated with the first character of the field from file1. – Hemanth Tanna Apr 26 '20 at 19:16
  • @HemanthTanna Did you switch the arguments, i.e. `file1.dat file2.dat` instead of `file2.dat file1.dat`? – Freddy Apr 26 '20 at 19:28
  • To be clear, that's not working `with out changing the file format for each line`, it's replacing all chains of white space in the input with single tab chars. If the output matched the input then it'd be because all white space in the input happened to be individual tab chars, but at least some of the sample input text posted, such as `48.343 23.545 32.02 1.00 0.00`, is clearly blank-separated, not tab-separated. – Ed Morton Apr 26 '20 at 20:32
  • Yes, `assuming it's okay to use tabs to separate the fields`. – Freddy Apr 26 '20 at 20:38
  • That might be a reasonable approach if you also set FS to `\t`: if you do that and all field-separating spaces are `\t`s, then no harm is done even if there are blanks within fields. But if you don't do that and there are blanks in the file (as it looks like from the posted example), then you'd be changing the formatting by setting OFS alone to `\t`. – Ed Morton Apr 26 '20 at 21:27
  • @HemanthTanna from [your comment](https://stackoverflow.com/questions/61444739/is-there-any-shell-script-to-replace-values-of-file1-dat-to-file2-dat-with-out-c#comment108696830_61445616) it sounds like you have carriage returns (`CR` aka `\r` aka `control-M`) in your input, see https://stackoverflow.com/q/45772525/1745001. Swapping the order of the input files wouldn't produce the symptoms you describe but control-Ms would. – Ed Morton Apr 26 '20 at 21:32
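Combining the two suggestions in the comments above (set FS as well as OFS to a tab, and watch for carriage returns), a sketch of Freddy's command might look like this. It assumes file1 really is tab-separated, which the posted sample may not be; `tab_splice` is a hypothetical wrapper name:

```shell
#!/bin/sh
# Hypothetical wrapper: tab_splice VALUES_FILE DATA_FILE
# Same logic as Freddy's awk answer, plus two changes from the comments:
#  - FS is a tab too, so rebuilding $0 cannot collapse blanks inside fields
#  - a trailing carriage return (DOS line ending) is stripped from each line
# Assumes the data file is genuinely tab-separated.
tab_splice() {
  awk -v FS='\t' -v OFS='\t' '
    { sub(/\r$/, "") }                  # drop a trailing \r, if present
    NR == FNR { a[FNR] = $1; next }     # values file: one value per line
    { $1 = substr($1, 1, 1) a[FNR] } 1  # keep first char of field 1
  ' "$1" "$2"
}
```

Usage would be `tab_splice file2.dat file1.dat > result.dat`, with the values file first as in the original answer. The output has Unix line endings even when the input had DOS ones, because the `\r` is removed before the record is printed.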

This might work for you (GNU parallel):

parallel  echo {=1 's/^(.)\S+/$1$arg[2]/' =} :::: file1 ::::+ file2

Join the two input files using the ::::+ operator and replace the rest of the first field (everything after its first character) with the file2 argument.

Alternative using cat & sed:

cat -n file2 | sed -E 's#\t(.*)#s/[0-9]+/\1/#' | sed -Ef - file1

Prepend line numbers to the values in file2, then replace the tab introduced by cat -n, together with the following value, with a sed substitution that replaces the first run of digits with that value; the line number left in front becomes that command's address. This generated script is piped into a second invocation of sed, which reads it from stdin (`-f -`) and applies it to file1. The overall result is that the first number on each line of file1 is replaced by the number on the same line of file2.
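To see what the first half of the cat & sed pipeline produces, here is a small sketch using the first two values from file2 in the question. It assumes GNU sed (which accepts `\t` in a regex, as the answer's command already does); `gen_sed_script` is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: turn one-value-per-line input on stdin into a sed
# script of the form "<line number>s/[0-9]+/<value>/" (GNU sed assumed).
gen_sed_script() {
  cat -n | sed -E 's#\t(.*)#s/[0-9]+/\1/#'
}

# Demo with the first two values of file2 from the question.
printf '987654321\n567890123\n' | gen_sed_script
# prints:
#      1s/[0-9]+/987654321/
#      2s/[0-9]+/567890123/
```

The full pipeline then becomes `gen_sed_script < file2 | sed -Ef - file1`. GNU sed ignores the blanks that cat -n puts before each right-aligned number, so `1`, `2`, ... act as line addresses and each substitution only touches its own line of file1.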

potong

These will both work with whatever spaces you have in your input as they don't change any of those spaces or make any assumptions about what they are:

$ paste file2 file1 | sed 's/\([^\t]*\)\t\(.\)[^[:space:]]*/\2\1/'
A987654321      1      C      HIE   1   48.343 23.545 32.02 1.00 0.00        H
A567890123      3      C      PHE   1   48.343 23.545 32.02 1.00 0.00        C
A098765432      4      N      ALA   1   48.343 23.545 32.02 1.00 0.00        N
A890765348      5      B      VAL   1   48.343 23.545 32.02 1.00 0.00        B
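To make the sed step easier to follow, here is a single line of the kind paste produces (a file2 value, a tab, then the untouched file1 line) run through the same substitution. It assumes GNU sed, which recognizes `\t` inside a bracket expression; the sample text is shortened from the question:

```shell
#!/bin/sh
# paste joins the files line by line with a tab, so sed sees e.g.
#   987654321<TAB>A123456789      1      C
# \1 = the file2 value, \2 = first char of file1's field 1 (the "A");
# the rest of field 1 is dropped and everything after it is left as-is.
printf '987654321\tA123456789      1      C\n' |
  sed 's/\([^\t]*\)\t\(.\)[^[:space:]]*/\2\1/'
# prints: A987654321      1      C
```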

or if you prefer an awk solution:

$ awk 'NR==FNR{a[NR]=$1;next} {print substr($0,1,1) a[FNR] substr($0,length($1)+1)}' file2 file1
A987654321      1      C      HIE   1   48.343 23.545 32.02 1.00 0.00        H
A567890123      3      C      PHE   1   48.343 23.545 32.02 1.00 0.00        C
A098765432      4      N      ALA   1   48.343 23.545 32.02 1.00 0.00        N
A890765348      5      B      VAL   1   48.343 23.545 32.02 1.00 0.00        B

The problem you were having is that any time you modify a field (e.g. $1), awk reconstructs the record, which, with the default FS and OFS, replaces all contiguous chains of white space with a single blank char. If you modify the record ($0) instead of any specific field, that doesn't happen.
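A quick way to see this difference is to feed one line from the question through both styles of awk program. A sketch, with the replacement value hard-coded for illustration:

```shell
#!/bin/sh
line='A123456789      1      C'

# Assigning to $1 makes awk rebuild $0 joined with OFS (one blank by default).
echo "$line" | awk '{ $1 = "A987654321" } 1'
# prints: A987654321 1 C

# Building the output from $0 with substr() leaves the original spacing alone.
echo "$line" | awk '{ print "A987654321" substr($0, length($1) + 1) }'
# prints: A987654321      1      C
```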

Ed Morton