I want to check whether the second column of a CSV file starts with a given pattern and, if it does, replace something else on the same line.

I wrote the following sed command to change I to N when 676 appears in the second column of the CSV below. However, it also matches 676 in the 7th and 9th columns, since ,676 occurs there too. Ideally, only the second column should be checked, and only for 676 as a prefix: a 676 in the middle or at the end of the second value (e.g. 46769777) should not count. When the prefix matches, ,I, should be changed to ,N,.

sed -i  '/,676/ {; s/,I,/,N,/;}' temp.csc

6768880,55999777,S,I,TTTT,I,67677,yy
6768880,676999777,S,I,TTTT,I,67677,yy 
6768880,46769777,S,I,TTTT,I,67677,yy  

Expected result:

6768880,55999777,S,I,TTTT,I,67677,yy
6768880,676999777,S,N,TTTT,N,67677,yy
6768880,40999777,S,I,TTTT,I,67677,yy  
shellter
virajatt
  • It is difficult to write supportable code in `sed` that can do this. Do you really care if it is `sed`? `awk` is designed with this sort of problem in mind and will be very easy to implement. Good luck. – shellter Apr 09 '15 at 02:29
  • You have an error in your output. How does `46769777` become `40999777`? – Jotne Apr 09 '15 at 05:22

2 Answers

This requires that 676 appear at the beginning of the second column before any changes are made:

$ sed '/^[^,]*,676/ s/,I,/,N,/g' file
6768880,55999777,S,I,TTTT,I,67677,yy
6768880,676999777,S,N,TTTT,N,67677,yy 
6768880,46769777,S,I,TTTT,I,67677,yy  

Notes:

  • The regex /^[^,]*,676/ requires that 676 appear after the first appearance of a comma on the line. In more detail:

    • ^ matches the beginning of the line

    • [^,]* matches the first column

    • ,676 matches the first comma followed by 676

  • In your desired output, ,I, was replaced with ,N, every time it appeared on the line. To accomplish this, g (meaning global) was added to the substitute command.
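To make the difference between the two addresses concrete, here is a small sketch that counts how many of the sample lines each one selects and then applies the anchored command. The scratch file created with `mktemp` and the variable names `loose`, `strict`, and `fixed` are illustrative assumptions, not part of the original question.

```shell
# Sample rows from the question, written to a scratch file (hypothetical path).
tmp=$(mktemp)
printf '%s\n' \
  '6768880,55999777,S,I,TTTT,I,67677,yy' \
  '6768880,676999777,S,I,TTTT,I,67677,yy' \
  '6768880,46769777,S,I,TTTT,I,67677,yy' > "$tmp"

# Unanchored pattern: ",676" also occurs in the 7th column (,67677),
# so every row is selected.
loose=$(grep -c ',676' "$tmp")

# Anchored pattern: first column, first comma, then 676,
# so only rows whose second column starts with 676 are selected.
strict=$(grep -c '^[^,]*,676' "$tmp")

echo "loose=$loose strict=$strict"   # prints: loose=3 strict=1

# Apply the substitution only on the rows the anchored address selects.
fixed=$(sed '/^[^,]*,676/ s/,I,/,N,/g' "$tmp")
printf '%s\n' "$fixed"

rm -f "$tmp"
```

Only the second row comes out changed. To edit the file in place instead of printing, `sed -i.bak` (with a backup suffix) should work with both GNU and BSD sed.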

John1024
If you are not bound to sed, awk might be a better option for you. Give this a try:

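awk -F"," 'match($2,/^676/){gsub(/,I,/,",N,")}{print}' temp.csc

match tests whether the second column starts with (^) 676. When it does, gsub replaces every ,I, on that line with ,N,; matching the trailing comma as well keeps values that merely begin with I from being altered.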

Result:

6768880,55999777,S,I,TTTT,I,67677,yy
6768880,676999777,S,N,TTTT,N,67677,yy
6768880,46769777,S,I,TTTT,I,67677,yy
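A field-by-field variant of the awk approach, offered as a sketch rather than as the answer's own method: instead of substituting text in the whole line, it compares each field to the exact value I. It assumes the columns never contain quoted commas, and the variable name `out` is illustrative. Note one behavioural difference: unlike a `,I,` pattern, it would also convert an I in the last column.

```shell
# Feed the sample rows from the question to awk and capture the result.
out=$(printf '%s\n' \
  '6768880,55999777,S,I,TTTT,I,67677,yy' \
  '6768880,676999777,S,I,TTTT,I,67677,yy' \
  '6768880,46769777,S,I,TTTT,I,67677,yy' |
awk 'BEGIN { FS = OFS = "," }      # split and rejoin on commas
     $2 ~ /^676/ {                 # second column starts with 676
         for (i = 3; i <= NF; i++) # skip the first two columns
             if ($i == "I") $i = "N"
     }
     { print }')

printf '%s\n' "$out"
```

Rows that do not match are printed untouched, because awk only rebuilds a line when one of its fields is assigned.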

iamauser