How to remove empty cells in a row and combine every other row in a tab-delimited file

Question

I have Table 1, which has thousands of rows that look like this:

chr1    4399801 4400245 peak_12659  719  .   32.37675    -1  1.92924 222
        1       444                         
chr1    2495548 2495992 peak_11970  542  .   36.95443    -1  2.58372 222
        1       444                         
chr1    3572002 3572264 peak_901    1000 .   148.62292   -1  3.94096 145
        1       262

I want to remove the empty cells at the start of every other row (under chr1 in each row above), then append each of those rows to the preceding row so the final table looks like this:

Table 2:

chr1    4399801 4400245 peak_12659  719     .   32.37675    -1  1.92924 222  1  444
chr1    2495548 2495992 peak_11970  542     .   36.95443    -1  2.58372 222  1  444
chr1    3572002 3572264 peak_901    1000    .   148.62292   -1  3.94096 145  1  262

How can I accomplish this?

Edit: In response to @Cyrus, I found it very difficult to find answers to this question, but I stumbled upon this thread (not exactly what I'm trying to accomplish) and tried the following:

awk '{printf "%s%s",$0,(NR%2?FS:RS)}' Table1.txt > Table2.txt
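For reference, that command prints each odd line followed by `FS` (a single space by default) and each even line followed by `RS` (a newline). Because every even line already begins with a tab, the join leaves a space plus a tab between the two halves, and any `\r` from DOS line endings ends up in the middle of the merged row. A minimal sketch of the same printf idea, assuming Unix line endings and no trailing tabs on the odd lines (as in the `od -c` dump of Edit 3), joins each pair with the empty string instead. `join_pairs_printf` is a hypothetical wrapper name:

```shell
# Hypothetical wrapper around the printf-based join.
# Assumes Unix line endings and that every second line starts with a tab.
join_pairs_printf() {
  # Odd line: print it with no separator, so the even line's own leading tab
  # becomes the only separator. Even line: print it and end the row with "\n".
  awk '{ printf "%s%s", $0, (NR % 2 ? "" : "\n") }' "$@"
}

# usage: join_pairs_printf Table1.txt > Table2.txt
```

If the file has an odd number of lines, the last row is printed without a trailing newline.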

This command did not merge the alternating rows correctly and, in some instances, combined cells instead (screenshot here).

I also tried:

xargs -n2 < Table1.txt > Table2.txt
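`xargs` splits its input on blanks and newlines, so `-n2` groups every two *words* into one `echo` call rather than every two lines. `paste` is the line-oriented tool for pairing lines. In the sketch below (the function name is hypothetical), `-d '\0'` is the POSIX way to request an empty delimiter, so the tab that already opens every second line becomes the only separator. It assumes Unix line endings and no trailing tabs, as in Edit 3.

```shell
# Hypothetical helper: join each pair of lines read from stdin.
# "paste - -" consumes two input lines per output line; -d '\0' means
# "use the empty string as the delimiter" (not a NUL byte).
join_pairs_paste() {
  paste -d '\0' - -
}

# usage: join_pairs_paste < Table1.txt > Table2.txt
```

With a plain `paste - -`, the default tab delimiter would be added on top of the leading tab, which would produce an empty column.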

Each row in the output contains two merged cells (screenshot 2).

Edit 2: In response to @markp-fuso, I tried the command you posted, but my output looks like this (screenshot 3).

Edit 3: Sorry for the screenshots. Here is the output from `head -4 Table1.txt | od -c`:

    0000000   c   h   r   1  \t   4   3   9   9   8   0   1  \t   4   4   0
    0000020   0   2   4   5  \t   p   e   a   k   _   1   2   6   5   9  \t
    0000040   7   1   9  \t   .  \t   3   2   .   3   7   6   7   5  \t   -
    0000060   1  \t   1   .   9   2   9   2   4  \t   2   2   2  \n  \t   1
    0000100  \t   4   4   4  \n   c   h   r   1  \t   2   4   9   5   5   4
    0000120   8  \t   2   4   9   5   9   9   2  \t   p   e   a   k   _   1
    0000140   1   9   7   0  \t   5   4   2  \t   .  \t   3   6   .   9   5
    0000160   4   4   3  \t   -   1  \t   2   .   5   8   3   7   2  \t   2
    0000200   2   2  \n  \t   1  \t   4   4   4  \n
    0000212

Edit 4: @markp-fuso, here is the output from `head -4 Table2.txt | od -c`:

0000000   c   h   r   1  \t   4   3   9   9   8   0   1  \t   4   4   0
0000020   0   2   4   5  \t   p   e   a   k   _   1   2   6   5   9  \t
0000040   7   1   9  \t   .  \t   3   2   .   3   7   6   7   5  \t   -
0000060   1  \t   1   .   9   2   9   2   4  \t   2   2   2  \t  \t  \t
0000100  \r  \n  \t   1  \t   4   4   4  \t   c   h   r   1  \t   2   4
0000120   9   5   5   4   8  \t   2   4   9   5   9   9   2  \t   p   e
0000140   a   k   _   1   1   9   7   0  \t   5   4   2  \t   .  \t   3
0000160   6   .   9   5   4   4   3  \t   -   1  \t   2   .   5   8   3
0000200   7   2  \t   2   2   2  \r  \n  \t   1  \t   4   4   4  \t  \t
0000220  \t  \t  \t  \t  \t  \t  \t  \t  \r  \n   c   h   r   1  \t   3
0000240   5   7   2   0   0   2  \t   3   5   7   2   2   6   4  \t   p
0000260   e   a   k   _   9   0   1  \t   1   0   0   0  \t   .  \t   1
0000300   4   8   .   6   2   2   9   2  \t   -   1  \t   3   .   9   4
0000320   0   9   6  \t   1   4   5  \t  \t  \t  \r  \n
0000334
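The `\r  \n` pairs in this dump are Windows/DOS line endings, and the runs of `\t` just before them are empty trailing cells. A quick way to check a file for both without reading `od -c` output is a short awk pass. This is a sketch, and `check_line_endings` is a hypothetical name:

```shell
# Hypothetical helper: count lines that end in a carriage return (DOS/Windows
# "\r\n" endings) and lines that end in one or more tabs (empty trailing cells).
check_line_endings() {
  awk '
    /\r$/ { crlf++ }                    # raw line still ends in \r
    {
      line = $0
      sub(/\r$/, "", line)              # ignore the \r when looking for tabs
      if (line ~ /\t$/) trailing_tabs++
    }
    END { printf "CRLF lines: %d\ntrailing-tab lines: %d\n", crlf, trailing_tabs }
  ' "$@"
}

# usage: check_line_endings Table1.txt
```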

Edit 5: The problem is mostly solved after fixing the Windows/DOS line endings in my Table1.txt.

Here is what I did:

dos2unix Table1.txt
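If `dos2unix` is not installed, a common substitute is `tr -d '\r'`, which deletes every carriage return. Unlike `dos2unix Table1.txt`, it does not edit the file in place, so the sketch below (hypothetical function name) writes to a copy. It removes `\r` anywhere in a line, not only before the newline, which is harmless for this data.

```shell
# Hypothetical helper: delete every carriage return from stdin, turning
# "\r\n" line endings into "\n".
strip_cr() {
  tr -d '\r'
}

# usage (writes a copy, then replaces the original):
# strip_cr < Table1.txt > Table1.unix.txt && mv Table1.unix.txt Table1.txt
```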

awk 'BEGIN { FS=OFS="\t" } FNR%2==1 { a=$0 } FNR%2==0 { print a,$0 }' Table1.txt > Table2_b.txt

head -4 Table2_b.txt | od -c

0000000   c   h   r   1  \t   4   3   9   9   8   0   1  \t   4   4   0
0000020   0   2   4   5  \t   p   e   a   k   _   1   2   6   5   9  \t
0000040   7   1   9  \t   .  \t   3   2   .   3   7   6   7   5  \t   -
0000060   1  \t   1   .   9   2   9   2   4  \t   2   2   2  \t  \t   1
0000100  \t   4   4   4  \n   c   h   r   1  \t   2   4   9   5   5   4
0000120   8  \t   2   4   9   5   9   9   2  \t   p   e   a   k   _   1
0000140   1   9   7   0  \t   5   4   2  \t   .  \t   3   6   .   9   5
0000160   4   4   3  \t   -   1  \t   2   .   5   8   3   7   2  \t   2
0000200   2   2  \t  \t   1  \t   4   4   4  \n   c   h   r   1  \t   3
0000220   5   7   2   0   0   2  \t   3   5   7   2   2   6   4  \t   p
0000240   e   a   k   _   9   0   1  \t   1   0   0   0  \t   .  \t   1
0000260   4   8   .   6   2   2   9   2  \t   -   1  \t   3   .   9   4
0000300   0   9   6  \t   1   4   5  \t  \t   1  \t   2   6   2  \n   c
0000320   h   r   1  \t   9   5   8   4   0   0   3  \t   9   5   8   4
0000340   4   4   7  \t   p   e   a   k   _   1   0   9   0   8  \t   6
0000360   2   6  \t   .  \t   4   1   .   3   7   5   2   9  \t   -   1
0000400  \t   2   .   8   7   7   6  \t   2   2   2  \t  \t   1  \t   4
0000420   4   4  \n
0000423

The only remaining issue is an extra \t in each row, which shows up as an empty 11th column.

head -4 Table2_b.txt
chr1    4399801 4400245 peak_12659  719 .   32.37675    -1  1.92924 222     1   444
chr1    2495548 2495992 peak_11970  542 .   36.95443    -1  2.58372 222     1   444
chr1    3572002 3572264 peak_901    1000    .   148.62292   -1  3.94096 145     1   262
chr1    9584003 9584447 peak_10908  626 .   41.37529    -1  2.8776  222     1   444
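The extra column comes from the even lines: each one starts with a tab (its first cell is empty), so `print a,$0` with `OFS="\t"` emits the output separator and then that leading tab, leaving an empty field between `222` and `1`. One way to avoid it, sketched here under the same assumptions as Edit 5 (tab-separated input already converted with `dos2unix`), is to trim the even line before printing. The wrapper name `join_pairs_tab` is hypothetical:

```shell
# Hypothetical wrapper around the Edit 5 command with two trims added.
# Assumes tab-separated input that already has Unix line endings.
join_pairs_tab() {
  awk '
    BEGIN    { FS = OFS = "\t" }
    FNR%2==1 { a = $0; sub(/\t+$/, "", a) }   # save the odd line, minus trailing empty cells
    FNR%2==0 {
      sub(/^\t+/, "")                          # drop the leading empty cell(s) of the even line
      sub(/\t+$/, "")                          # and any trailing empty cells
      print a, $0                              # join with exactly one OFS tab
    }
  ' "$@"
}

# usage: join_pairs_tab Table1.txt > Table2_b.txt
```

Afterwards, `awk -F'\t' '{ print NF }' Table2_b.txt | sort -u` should report a single field count (12 for the sample rows) instead of 13.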

for troubleshooting your issues we need to see the actual contents of the files ... not what the data looks like when loaded into a spreadsheet; please update the question with the output from `head -4 Table1.txt | od -c` (cut-n-paste as text into a code formatted block - do not post as an image) — markp-fuso, Sep 29 '22 at 17:41
your **Edit 3** shows `Table1.txt` has unix line endings (`\n`) while your **Edit 4** shows windows/dos line endings (`\r\n`) have been introduced into the mix; the only way (so far) I've been able to reproduce your output in **Edit 4** is if I modify `Table1.txt` to have windows/dos line endings (`\r\n`) and then run the code from my answer; net result ... your actual `Table1.txt` file does not match what you've posted in **Edit 3**; at this point I have to assume you're dealing with a couple different versions of `Table1.txt` (perhaps copying between unix/linux and windows/dos) ... — markp-fuso, Sep 30 '22 at 13:13
you'll need to either a) remove the windows/dos line endings from your actual data file (eg, `dos2unix Table1.txt`) or b) modify your code to handle windows/dos line endings — markp-fuso, Sep 30 '22 at 13:15
You are right that I was moving between Linux and Windows. Using `dos2unix` before running the code worked. However, there is an extra `\t` that appears in each row (at the 11th column). Thanks for all the help! I made an edit to make it clear to anyone who stumbles upon this question. — Geneious, Oct 03 '22 at 14:06
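For option b) in markp-fuso's comment, the `\r` can be dropped inside awk itself so `dos2unix` is not needed. The sketch below is a variation on markp-fuso's answer, not the exact code from either post: it removes a trailing `\r` from every line and then rebuilds every line (not only the even ones) with `$1=$1`, which also discards leading and trailing empty cells. The function name is hypothetical.

```shell
# Hypothetical variation on the answer's awk code that tolerates "\r\n" endings.
join_pairs_crlf() {
  awk '
    BEGIN    { OFS = "\t" }
    { sub(/\r$/, ""); $1 = $1 }   # drop a trailing \r, then rebuild the line with single tabs
    FNR%2==1 { a = $0 }           # save the odd-numbered line
    FNR%2==0 { print a, $0 }      # print it followed by the current (even) line
  ' "$@"
}

# usage: join_pairs_crlf Table1.txt > Table2.txt
```

Because the default `FS` treats any run of blanks as one separator, this assumes there are no genuinely empty cells in the middle of a row; such a cell would vanish and shift the columns after it.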

markp-fuso · Answer 1 · 2022-09-30T13:06:01.350


It's not clear (to me) what the input and output field delimiters are supposed to be, so I'm just going to collapse the excess white space and use tab (\t) as the output field delimiter:

awk '
BEGIN    { OFS="\t" }
FNR%2==1 { a=$0 }                     # save odd numbered line
FNR%2==0 { $1=$1; print a,$0 }        # strip excessive white space; print odd numbered line plus current (even numbered) line
' Table1.txt > Table2.txt

This generates:

$ cat Table2.txt
chr1    4399801 4400245 peak_12659      719     .       32.37675        -1      1.92924 222     1       444
chr1    2495548 2495992 peak_11970      542     .       36.95443        -1      2.58372 222     1       444
chr1    3572002 3572264 peak_901        1000    .       148.62292       -1      3.94096 145     1       262
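The `$1=$1` step is what cleans up the even line: assigning to any field forces awk to rebuild `$0` from its fields joined with `OFS`. With the default `FS`, fields are split on runs of blanks, so leading and trailing tabs disappear and interior runs collapse to a single tab. A tiny demonstration (the function name is hypothetical):

```shell
# Hypothetical demo of the $1=$1 idiom: rebuild each stdin line from its
# whitespace-separated fields, joined by a single tab (OFS).
rebuild_with_tabs() {
  awk 'BEGIN { OFS = "\t" } { $1 = $1; print }'
}

# usage: printf '\t1   444\t\t\n' | rebuild_with_tabs
```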

edited Sep 30 '22 at 13:06

answered Sep 28 '22 at 20:51

markp-fuso


Thanks for the response. I tried it but the output still contains a lot of blanks and did not merge the rows appropriately. I'll attach a screenshot to the post above. – Geneious Sep 29 '22 at 16:33
@Geneious I've made a couple small changes (eg, updated my input file to have all columns separated by tabs) and my code still generates the same output; next question is what is in your `Table2.txt` file ... again, `head -4 Table2.txt | od -c` – markp-fuso Sep 29 '22 at 21:21
Thanks, I tried your updated code and edited my question with the output. – Geneious Sep 30 '22 at 12:40
