1

I have the following csv file:

ID                Name          Gender        NID
Y12               Jim              M          C12
Y23               David            M          C23
234               Bob              M          C22
2b1               Lucy             F          C24
2bb               Lily             F          C25

What I want to do is to replace ID colunm's value with NID value if the first letter in ID is not a alphabet letter. So the output is expected to be

ID                Name          Gender        NID
Y12               Jim              M          C12
Y23               David            M          C23
C22               Bob              M          C22
C24               Lucy             F          C24
C25               Lily             F          C25

This is just a sample. My original data size is much bigger so I have to use awk. The key issue is how to specify the condition. I am not clear how to deal with. The code I was thinking of is like awk -F, 'if $1 != ..., $1=$4'. I also googled but failed to find an appropriate solution. Here is a link which I think may be helpful. Filter lines that have only alphabets in first column. Thank you.

Samson
  • 75
  • 7
  • Your input isn't csv, did you copy the wrong text? – Graeme Sep 27 '19 at 22:05
  • It's just an example. My real data is very large so I cannot post it here. I guess if it's csv file, separated by ",", then we only need to write ``` awk -F,``` , which can help ```awk``` to recognize the csv file. – Samson Sep 27 '19 at 22:24
  • 1
    You can, in this case. But you are better to post a more representative sample of your data and output as it may change the answers people give. – Graeme Sep 27 '19 at 22:32
  • Okay. I will first apply those posted solutions into my original data to see whether they do work. If not, then I will post a sample from my data. – Samson Sep 27 '19 at 22:38

2 Answers2

1

Use a regular expression.

awk '$1 ~ /^[^A-Z]/ { $1 = $4 }1' filename

DEMO

Barmar
  • 741,623
  • 53
  • 500
  • 612
  • @Samson Did you leave out the `1` at the end? – Shawn Sep 27 '19 at 22:13
  • @Samson, I found the Grymoire tutorials very useful when I was learing - http://www.grymoire.com/Unix/Awk.html – Graeme Sep 27 '19 at 22:41
  • @Barmar Hi, there is another sub-issue. I found that such code could not help me to distinguish whether the first letter is uppercase or not. As long as the first letter is alphabet, then ID will be replaced by NID. What if I only want to change ID with uppercase? – Samson Sep 27 '19 at 22:45
  • My code is case-sensitive, it only keeps the ID if it begins with an uppercase letter. – Barmar Sep 27 '19 at 22:51
  • Anyway, change `A-Z` to `A-Za-z` if you want to treat uppercase and lowercase letters similarly. – Barmar Sep 27 '19 at 22:53
  • @Samson to learn AWK get the book "Effective AWK Programming", 4th Edition, by Arnold Robbins. Don't get any other book as the rest are outdated and/or worse in other ways. To test for upper case use the [Character Class inside a Bracket Expression](https://www.gnu.org/software/grep/manual/html_node/Character-Classes-and-Bracket-Expressions.html) `[[:upper:]]` rather than `[A-Z]` (the latter won't work in every locale, the former will), for lower `[[:lower:]]` and for either `[[:alpha:]]` – Ed Morton Sep 28 '19 at 00:44
  • For case sensitive letters you can use awk '$1 ~ /^[^[Aa]-[Zz]]/ { $1 = $4 }1' filename – Ravi Saroch Oct 03 '19 at 06:25
0

If you can count on the ID column always being 3 characters, this will preserve the table formatting:

awk '{ sub("^[^[:alpha:]]..", $4) } 1' file
Graeme
  • 2,971
  • 21
  • 26