I have Table 1, which has thousands of rows that look like this:
chr1 4399801 4400245 peak_12659 719 . 32.37675 -1 1.92924 222
1 444
chr1 2495548 2495992 peak_11970 542 . 36.95443 -1 2.58372 222
1 444
chr1 3572002 3572264 peak_901 1000 . 148.62292 -1 3.94096 145
1 262
I want to remove the empty cell at the start of every other row (the blank under chr1 in the rows above) and append each of those rows to the preceding row, so that the final table looks like this:
Table 2:
chr1 4399801 4400245 peak_12659 719 . 32.37675 -1 1.92924 222 1 444
chr1 2495548 2495992 peak_11970 542 . 36.95443 -1 2.58372 222 1 444
chr1 3572002 3572264 peak_901 1000 . 148.62292 -1 3.94096 145 1 262
How can I accomplish this?
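One possible approach is sketched below. It assumes the file strictly alternates between a main line and a continuation line, that fields are tab-separated, and that each continuation line starts with a tab (its empty first cell). The awk program strips any trailing carriage return, holds each odd-numbered line, removes the leading tab from the following line, and prints the two joined by a single tab. The function name merge_pairs and the demo record are illustrative only.

```shell
#!/bin/sh
# merge_pairs is a hypothetical helper name. It joins every odd-numbered line
# with the line after it, assuming tab-separated fields and a continuation
# line whose first field is empty (it starts with a tab).
merge_pairs() {
  awk 'BEGIN { OFS = "\t" }
       { sub(/\r$/, "") }                  # drop a DOS carriage return, if present
       NR % 2 == 1 { prev = $0; next }     # first line of a pair: hold it
       { sub(/^\t/, ""); print prev, $0 }  # remove the empty leading field, join with one tab
       END { if (NR % 2 == 1) print prev }' "$@"  # an unpaired last line is printed unchanged
}

# Demo on one record pair from Table 1, with DOS line endings added:
printf 'chr1\t4399801\t4400245\tpeak_12659\t719\t.\t32.37675\t-1\t1.92924\t222\r\n\t1\t444\r\n' |
  merge_pairs
```

To run it on the real file: merge_pairs Table1.txt > Table2.txt. If the file has an odd number of lines, the last line is printed unchanged rather than dropped.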
Edit: In response to @Cyrus, I had a hard time finding answers to this question, but I came across this thread (not exactly what I'm trying to accomplish) and tried the following:
awk '{printf "%s%s",$0,(NR%2?FS:RS)}' Table1.txt > Table2.txt
This command did not merge the alternating rows correctly, and in some cases it combined cells instead (see screenshot).
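A likely explanation for the printf attempt, offered as a guess since the screenshot is not available: awk's default FS is a single space, so each pair is joined with a space rather than a tab, and if the file has DOS line endings, the carriage return left at the end of each odd line is printed in the middle of the output row, which can make a terminal draw the second half over the first. The adjusted sketch below keeps the same printf idea but sets FS to a tab, strips the carriage return, and removes the empty leading cell of each even line. join_printf is a hypothetical name.

```shell
#!/bin/sh
# Sketch: the printf one-liner, adjusted (join_printf is a hypothetical name).
join_printf() {
  awk 'BEGIN { FS = "\t" }                      # join with a tab, not the default space
       { sub(/\r$/, "") }                       # strip a DOS carriage return, if any
       NR % 2 == 0 { sub(/^\t/, "") }           # drop the empty first cell of a continuation line
       { printf "%s%s", $0, (NR % 2 ? FS : RS) }' "$@"
}

# Demo on one record pair from Table 1, with DOS line endings added:
printf 'chr1\t3572002\t3572264\tpeak_901\t1000\t.\t148.62292\t-1\t3.94096\t145\r\n\t1\t262\r\n' |
  join_printf
```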
I also tried:
xargs -n2 < Table1.txt > Table2.txt
In the output, each row contains two merged cells (see Screenshot 2).
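xargs splits its input on blanks and newlines rather than on lines, so -n2 hands two whitespace-separated words at a time to its default command, echo, which would explain two cells per output row. A line-oriented alternative is paste, sketched here with the hypothetical function name merge_with_paste: paste - - reads two lines per output line, and -d '\0' (an empty delimiter in POSIX paste) joins them directly. That relies on each continuation line already starting with a tab. tr -d '\r' removes carriage returns first.

```shell
#!/bin/sh
# merge_with_paste is a hypothetical helper name; it reads stdin.
# tr deletes carriage returns, then paste joins each pair of lines with
# an empty delimiter (the second line already begins with a tab).
merge_with_paste() {
  tr -d '\r' | paste -d '\0' - -
}

# Demo on one record pair from Table 1, with DOS line endings added:
printf 'chr1\t2495548\t2495992\tpeak_11970\t542\t.\t36.95443\t-1\t2.58372\t222\r\n\t1\t444\r\n' |
  merge_with_paste
```

On the real file: merge_with_paste < Table1.txt > Table2.txt. Note that tr -d '\r' deletes every carriage return, not only those at line ends; that should be harmless for this data.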
Edit 2: In response to @markp-fuso, I tried the command you suggested, but my output looks like this (see Screenshot 3).
Edit 3: Sorry for the screenshots. Here is the output from head -4 Table1.txt | od -c:
0000000 c h r 1 \t 4 3 9 9 8 0 1 \t 4 4 0
0000020 0 2 4 5 \t p e a k _ 1 2 6 5 9 \t
0000040 7 1 9 \t . \t 3 2 . 3 7 6 7 5 \t -
0000060 1 \t 1 . 9 2 9 2 4 \t 2 2 2 \n \t 1
0000100 \t 4 4 4 \n c h r 1 \t 2 4 9 5 5 4
0000120 8 \t 2 4 9 5 9 9 2 \t p e a k _ 1
0000140 1 9 7 0 \t 5 4 2 \t . \t 3 6 . 9 5
0000160 4 4 3 \t - 1 \t 2 . 5 8 3 7 2 \t 2
0000200 2 2 \n \t 1 \t 4 4 4 \n
0000212
Edit 4: @markp-fuso, here is the output from head -4 Table2.txt | od -c:
0000000 c h r 1 \t 4 3 9 9 8 0 1 \t 4 4 0
0000020 0 2 4 5 \t p e a k _ 1 2 6 5 9 \t
0000040 7 1 9 \t . \t 3 2 . 3 7 6 7 5 \t -
0000060 1 \t 1 . 9 2 9 2 4 \t 2 2 2 \t \t \t
0000100 \r \n \t 1 \t 4 4 4 \t c h r 1 \t 2 4
0000120 9 5 5 4 8 \t 2 4 9 5 9 9 2 \t p e
0000140 a k _ 1 1 9 7 0 \t 5 4 2 \t . \t 3
0000160 6 . 9 5 4 4 3 \t - 1 \t 2 . 5 8 3
0000200 7 2 \t 2 2 2 \r \n \t 1 \t 4 4 4 \t \t
0000220 \t \t \t \t \t \t \t \t \r \n c h r 1 \t 3
0000240 5 7 2 0 0 2 \t 3 5 7 2 2 6 4 \t p
0000260 e a k _ 9 0 1 \t 1 0 0 0 \t . \t 1
0000300 4 8 . 6 2 2 9 2 \t - 1 \t 3 . 9 4
0000320 0 9 6 \t 1 4 5 \t \t \t \r \n
0000334
Edit 5: The problem is mostly solved after fixing the Windows/DOS line endings in my Table1.txt.
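The od output in Edit 4 shows \r \n pairs, which is the signature of Windows/DOS (CRLF) line endings. A quick check for them, plus a fallback conversion for systems where dos2unix is not installed, might look like this. count_crlf and to_unix are hypothetical helper names, and the check only counts lines whose last character before the newline is \r.

```shell
#!/bin/sh
# count_crlf (hypothetical name): print how many lines end in a carriage
# return, i.e. how many have Windows/DOS (CRLF) line endings.
count_crlf() {
  awk '/\r$/ { n++ } END { print n + 0 }' "$@"
}

# to_unix (hypothetical name): portable fallback for dos2unix that
# deletes every carriage return read from stdin.
to_unix() {
  tr -d '\r'
}

printf 'a\tb\r\n\t1\t2\r\n' | count_crlf
printf 'a\tb\r\n\t1\t2\r\n' | to_unix | count_crlf
```

For example, count_crlf Table1.txt should print 0 once the file is converted, and to_unix < Table1.txt > Table1_unix.txt writes a converted copy without touching the original.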
Here is what I did:
dos2unix Table1.txt
awk 'BEGIN { FS=OFS="\t" } FNR%2==1 { a=$0 } FNR%2==0 { print a,$0 }' Table1.txt > Table2_b.txt
head -4 Table2_b.txt | od -c
0000000 c h r 1 \t 4 3 9 9 8 0 1 \t 4 4 0
0000020 0 2 4 5 \t p e a k _ 1 2 6 5 9 \t
0000040 7 1 9 \t . \t 3 2 . 3 7 6 7 5 \t -
0000060 1 \t 1 . 9 2 9 2 4 \t 2 2 2 \t \t 1
0000100 \t 4 4 4 \n c h r 1 \t 2 4 9 5 5 4
0000120 8 \t 2 4 9 5 9 9 2 \t p e a k _ 1
0000140 1 9 7 0 \t 5 4 2 \t . \t 3 6 . 9 5
0000160 4 4 3 \t - 1 \t 2 . 5 8 3 7 2 \t 2
0000200 2 2 \t \t 1 \t 4 4 4 \n c h r 1 \t 3
0000220 5 7 2 0 0 2 \t 3 5 7 2 2 6 4 \t p
0000240 e a k _ 9 0 1 \t 1 0 0 0 \t . \t 1
0000260 4 8 . 6 2 2 9 2 \t - 1 \t 3 . 9 4
0000300 0 9 6 \t 1 4 5 \t \t 1 \t 2 6 2 \n c
0000320 h r 1 \t 9 5 8 4 0 0 3 \t 9 5 8 4
0000340 4 4 7 \t p e a k _ 1 0 9 0 8 \t 6
0000360 2 6 \t . \t 4 1 . 3 7 5 2 9 \t - 1
0000400 \t 2 . 8 7 7 6 \t 2 2 2 \t \t 1 \t 4
0000420 4 4 \n
0000423
The only remaining issue is an extra \t, which creates an empty 11th column.
head -4 Table2_b.txt
chr1 4399801 4400245 peak_12659 719 . 32.37675 -1 1.92924 222 1 444
chr1 2495548 2495992 peak_11970 542 . 36.95443 -1 2.58372 222 1 444
chr1 3572002 3572264 peak_901 1000 . 148.62292 -1 3.94096 145 1 262
chr1 9584003 9584447 peak_10908 626 . 41.37529 -1 2.8776 222 1 444
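The extra \t comes from print a,$0: the comma inserts OFS (a tab) between a and $0, and $0 already starts with the tab that marks its empty first cell, so two tabs end up in a row. A minimal fix, assuming every continuation line begins with exactly one tab and the file already has Unix line endings, is to concatenate the two strings without the comma so that the existing tab serves as the separator. join_pairs is a hypothetical wrapper around the same awk logic as the command above.

```shell
#!/bin/sh
# join_pairs is a hypothetical name. "print a $0" (no comma) concatenates
# the held line and the current line, so no OFS is added; the tab that
# already starts the continuation line becomes the only separator.
join_pairs() {
  awk 'FNR % 2 == 1 { a = $0 }          # hold the first line of each pair
       FNR % 2 == 0 { print a $0 }' "$@" # join it with the continuation line
}

# Demo on the fourth record pair shown above (Unix line endings):
printf 'chr1\t9584003\t9584447\tpeak_10908\t626\t.\t41.37529\t-1\t2.8776\t222\n\t1\t444\n' |
  join_pairs
```

On the real file: join_pairs Table1.txt > Table2_b.txt. If some continuation line did not start with a tab, concatenation would glue two cells together; in that case, stripping the leading tab with sub(/^\t/, "") and keeping print a, $0 is the safer variant.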