How to replace "00" in data with "N/A" skipping first row in sed?

Question

I'm working with GWAS data. My data looks like this:

IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,00,AG,GT,AK,00
32,AG,GG,AA,00,AT
300,TT,AA,00,AG,AA
400,GG,AG,00,GT,GG

Desired Output:

IID,kgp11004425,rs11274005,kgp183005,rs746410036,kgp7979600
1,N/A,AG,GT,AK,N/A
32,AG,GG,AA,N/A,AT
300,TT,AA,N/A,AG,AA
400,GG,AG,N/A,GT,GG

I'm trying to replace "00" with "N/A", but "00" also occurs in the header row and in the first column (IID), so those get changed too: the header names become kgp11N/A4425, rs11274N/A5, kgp183N/A5, ..., and IID values such as 300, 400, 500 become 3N/A, 4N/A, 5N/A. The command I used:

sed 's~00~N/A~g' allSNIPsFinaldata.csv

Can anyone please help me skip the header row and the first column when applying this replacement?

Add output of `file allSNIPsFinaldata.csv` to your question (no comment). — Cyrus, Apr 21 '22 at 05:50

score 2 · Answer 1 · answered Apr 21 '22 at 05:52

With 2 capture groups you can use this sed:

sed -E 's~(^|[[:blank:]])00([[:blank:]]|$)~\1N/A\2~g' file

IID, kgp11004425, rs11274005, kgp183005, rs746410036, kgp7979600
1       N/A           AG        GT            AK          N/A
32      AG           GG        AA            N/A          AT
98      TT           AA        N/A            AG          AA
3       GG           AG        N/A            GT          GG

Details:

(^|[[:blank:]]): Match the start of the line or a blank (space or tab) in capture group #1
00: Match 00
([[:blank:]]|$): Match a blank (space or tab) or the end of the line in capture group #2
\1N/A\2: Replacement that puts back the value of capture group #1, followed by N/A, followed by the value of capture group #2
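The pattern above expects blanks between fields. For data that is comma-separated throughout, as in the question's sample, a variant along the following lines may work. This is only a sketch, not the answerer's command: the sample file and its contents are hypothetical, and the loop is an addition to cope with two 00 fields that sit next to each other and therefore share a delimiter.

```shell
# Hypothetical sample file mirroring the question's comma-separated layout.
sample=$(mktemp)
cat > "$sample" <<'EOF'
IID,kgp11004425,rs11274005
300,00,00
400,AG,00
EOF

# 2,$ restricts the substitution to the data rows (line 2 to the last line).
# The pattern requires a comma before 00, so the IID column, which has no
# comma in front of it, is never matched.
# The loop (:a ... ta) repeats the substitution until nothing changes, so
# adjacent 00 fields that share a comma are all replaced.
result=$(sed -E -e ':a' -e '2,$s~,00(,|$)~,N/A\1~' -e 'ta' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

With this input the header and the IID values 300 and 400 stay as they are, while every field that is exactly 00 becomes N/A.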

I have only given sample data; my real data has 522 rows and 2583 columns. Will it still work? — , Apr 21 '22 at 06:01

score 1 · Accepted Answer · answered Apr 21 '22 at 06:04

You may specify an address to select the line(s) to apply the command to. Thus you might choose to exclude the first line like this:

sed '1!s~00~N/A~g' allSNIPsFinaldata.csv

As a side note, your example isn't actually CSV despite the file name: the header is comma-delimited, but the rest of the file uses spaces.
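A small illustration of what the address does and does not protect, using a hypothetical two-line input: the header is left alone, but the substitution still matches 00 anywhere on the remaining lines, including inside the IID value.

```shell
# Hypothetical input: a header and one data row.
input='IID,kgp11004425
300,00'

# 1! negates the address, so the substitution runs on every line except line 1.
addressed=$(printf '%s\n' "$input" | sed '1!s~00~N/A~g')
printf '%s\n' "$addressed"
# prints:
# IID,kgp11004425
# 3N/A,N/A
```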

Klaus Klein

Just a note that it will replace `100` with `1N/A` – anubhava Apr 21 '22 at 11:03
@anubhava, the above command replaced the values 100, 200, 300, 400 in the IID column with 2NA, 3NA, ..., which is the problem here. How can I skip the first row and the first column as well when performing this operation? – Apr 29 '22 at 07:07
@Klaus Klein, the above command replaced 100, 200, 300, 400 in the IID column with 1NA, 2NA, ..., which is the problem here. How can I skip the first row and the first column as well when performing this operation? – Apr 29 '22 at 07:10
@RajuNatha: [My answer](https://stackoverflow.com/a/71949532/548225) already takes care of that but you overlooked that. – anubhava Apr 29 '22 at 07:52

The fourth bird · Answer 3 · 2022-04-21T08:08:55.190

You can also skip the first row by applying the substitution only from the second line to the end of the file:

sed '2,$s~00~N/A~g' allSNIPsFinaldata.csv

If you don't want partial word matches, you can implement word boundaries around the 00 in different ways.
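One way to rule out partial matches without regex word boundaries is to compare whole fields instead. The sketch below uses awk rather than sed and assumes the file is comma-separated throughout; the input is hypothetical. It leaves the header row and the first column untouched and replaces only fields that are exactly 00.

```shell
# Hypothetical comma-separated input.
input='IID,kgp11004425,rs11274005
300,00,AG
400,GG,00'

cleaned=$(printf '%s\n' "$input" | awk 'BEGIN { FS = OFS = "," }
NR > 1 {                       # leave the header row untouched
    for (i = 2; i <= NF; i++)  # start at field 2 to skip the IID column
        if ($i == "00") $i = "N/A"
}
{ print }')
printf '%s\n' "$cleaned"
```

Because the comparison is against the string "00", values such as 300 or 100 never match, whichever column they appear in.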

edited Apr 21 '22 at 08:08

answered Apr 21 '22 at 06:03
