1

I have a csv file with following content:

G11123,IT,AMIT,INDIA
G11124,IT,HEMANT,INDIA
.......
.......

I am having around 7000 rows.

I need the another csv file with content:

G11,AMIT,INDIA
G11,HEMANT,INDIA
.........
.......

When I am trying the below command

awk 'BEGIN{FS=OFS=","} {print substr($1,1,3),$3,$4}' file.csv > temp.csv

When I am checking the temp.csv, First row I am not getting as per expectation while In all other rows proper substring is visible:

,AMIT,INDIA
G11,HEMANT,INDIA
...............
...............

Is there something wrong with the above command ?

Regards,

Manish
  • 1,274
  • 3
  • 22
  • 59
  • 2
    What is output of `head file.csv | cat -A` and why do you have `;` instead of `,` as `FS=OFS` ? – anubhava Jun 24 '21 at 15:02
  • @anubhava that's a typo...corrected that – Manish Jun 24 '21 at 15:42
  • 3
    As @anubhava requested - What is output of `head file.csv | cat -A`? We ask because if you have DOS line endings then that might explain the problem you're having (see [why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it](https://stackoverflow.com/questions/45772525/why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it)) and `cat -A` would show those. – Ed Morton Jun 24 '21 at 16:06
  • @anubhava while running the requested command it says cat: illegal option -- A – Manish Jun 25 '21 at 04:43
  • ok try: `head file.csv | cat -vte` – anubhava Jun 25 '21 at 06:14

2 Answers2

1

You can use a comma as field separator and use OFS to join the values:

awk 'BEGIN{FS=OFS=","} {print substr($1,1,3) OFS $3 OFS $4}' file.csv > temp.csv

See the online demo:

s='G11123,IT,AMIT,INDIA
G11124,IT,HEMANT,INDIA'
awk 'BEGIN{FS=OFS=","} {print substr($1,1,3) OFS $3 OFS $4}' <<< "$s"

Output:

G11,AMIT,INDIA
G11,HEMANT,INDIA
Wiktor Stribiżew
  • 607,720
  • 39
  • 448
  • 563
1

With your shown samples please try following awk code. This code will even work if you have more than 4 fields in your Input_file(since its not hard coding how many fields should be printed).

awk 'BEGIN{FS=OFS=","} {len=length($1);sub(".{"len-3"}$","",$1);$2="";sub(/,/,"")} 1'  Input_file

Explanation: Adding detailed explanation for above.

awk '                        ##Starting awk program from here.
BEGIN{                       ##Starting BEGIN section of this program from here.
  FS=OFS=","                 ##Setting FS and OFS to comma here.
}
{
  len=length($1)             ##Take length of 1st field here.
  sub(".{"len-3"}$","",$1)   ##Substituting everything apart from 1st 3 characters in 1st field with NULL.
  $2=""                      ##Nullifying 2nd field here.
  sub(/,/,"")                ##Substituting comma with NULL here.
}
1                            ##Printing current line here.
'  Input_file                ##Mentioning Input_file name here.

EDIT: To delete control M characters and print lines one could try following.

awk 'BEGIN{FS=OFS=","} {sub(/\r$/,"");len=length($1);sub(".{"len-3"}$","",$1);$2="";sub(/,/,"")} 1'  Input_file
RavinderSingh13
  • 130,504
  • 14
  • 57
  • 93
  • it's only giving one column,first one and the problem is that first row is still blank.Same that I am facing with command that I have been using. – Manish Jun 25 '21 at 04:56
  • 1
    @Manish, ok could you please run a command `cat -v file` to your Input_file and see if you are seeing control M characters in it? Looks like you have control M chars in your file, kindly do check once and let me know how it goes? – RavinderSingh13 Jun 25 '21 at 04:57
  • 1
    Yes it's there. – Manish Jun 25 '21 at 05:03
  • @Manish, ok then please try following command `awk 'BEGIN{FS=OFS=","} {sub(/\r$/,"");len=length($1);sub(".{"len-3"}$","",$1);$2="";sub(/,/,"")} 1' Input_file` once and let me know how it goes, this should clear them off at last of line and then should give you correct output. Let me know how it goes, cheers. – RavinderSingh13 Jun 25 '21 at 05:05
  • Ravinder I think awk command I have been using has nothing wrong in it. That one is quite easy to understand rather than this. I have to look some way to remove control M characters. – Manish Jun 25 '21 at 05:08
  • @Manish, if you want to remove control M chars totally from file. use this command then `tr -d '\r' < Input_file > temp && mv temp Input_file` it will remove all control M chars. Then run my answer's code. – RavinderSingh13 Jun 25 '21 at 05:09
  • something is wrong with the command you shared. Anyways i have removed those characters using dos2unix command and now the command I had been using, is working. Thanks for your help. – Manish Jun 25 '21 at 05:15
  • @Manish, Please do let me know what is not working? If you give complete details then only I could help more. – RavinderSingh13 Jun 25 '21 at 05:16
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/234183/discussion-between-manish-and-ravindersingh13). – Manish Jun 25 '21 at 05:21
  • @Manish, Hey Manish, sorry I can't join chat room, please comment here, later we can delete comments once its fixed. – RavinderSingh13 Jun 25 '21 at 06:04
  • 1
    just add control M character in your answer and I'll mark your answer accepted. It would be also helpful for others that command mentioned in the answer has nothing wrong in it. – Manish Jun 25 '21 at 09:00
  • @Manish, Sure, I have added EDIT solution now. Cheers. – RavinderSingh13 Jun 25 '21 at 09:03