0

I have a shell script to take data from a table and put it in a .csv file. The data I get is tab delimited and I used sed 's/\t/,/g'. This works but the data I have has occasional commas in the value as well, say "abc,def".

Now when I open the .csv file this value is split and moved to next column considering the comma, now "abc" is in one column and "def" in another.

How can I handle this and have values with commas displayed as is? Or can I convert a comma to a pipe symbol (|) and make the .csv identify the pipe symbol as delimiter?

Alfe
  • 56,346
  • 20
  • 107
  • 159
shru
  • 1
  • 1
  • 1
  • That's up to the program you are using to open the csv with. – James Brown May 08 '18 at 10:45
  • Sorry i didn't understand your answer ,I am new to this , did you mean in which editor csv is opening? currently its opening in Excel. – shru May 08 '18 at 10:47
  • 2
    How about keeping the tabs you have originally? Most CSV reading programs (like spreadsheet programs, etc.) can be configured to accept tab symbols as delimiters (as well as commas, semicolons, maybe even pipe symbols). In other words: *Why* do you want to replace the tabs by something else? – Alfe May 08 '18 at 11:00
  • Thanks for replying , it will be helpful if you can help me with how to make csv reading program identify tab as delimiter , for now it was not identifying tab as delimiter and hence all data was put in single column. – shru May 08 '18 at 11:05

3 Answers3

0

Better use sed 's/\t/|/g' and make the .csv identify the pipe symbol as delimiter. That will fix your problem.

Second option:-

Make , to ; first and replace all the tab \t to coma ,. Check with your client whether they are happy with this proposal.

Abhijit Pritam Dutta
  • 5,521
  • 2
  • 11
  • 17
  • Isn't that what OP proposed? – Alfe May 08 '18 at 11:02
  • 1
    @Alfe Not exactly I am advising him to replace tab with pipe. He is asking replacement of `,` coma with pipe and that is create more issue as some of his values contain text with coma. – Abhijit Pritam Dutta May 08 '18 at 11:05
  • Right, not exactly. But he already proposed to use pipe symbols, and he already proposed the `sed` script. Well … – Alfe May 08 '18 at 11:07
  • Can i make the .csv file identify pipe symbol as delimiter in shell script itself. because once the .csv file is created it is uploaded to a server and there is no chance of changing anything manually. – shru May 08 '18 at 11:07
  • The .csv file will only *contain* data using a certain delimiter. It is up to the user (reading program) to interpret that delimiter properly. The .csv file itself does not *say* (identify?) what its delimiter is. – Alfe May 08 '18 at 11:09
  • Yes Alfe is true. You have to inform the client to make the necessary changes in excel while reading that file. Else check with client whether they are happy if you make `,` to `;` first and replace all the tab to coma. – Abhijit Pritam Dutta May 08 '18 at 11:10
0

If your intended target program is only Excel, then putting double quotes around the value will work "abc, def". Same has been discussed here.

"abc,def",ghi,ijk,lmn # this works with spreadsheet

I have tried the sed command with dummy data and it works. Hope this works with your real data too.

jmaster:~/hackerrank$ cat -T test # below ^I are nothing but tab space
abc,def^Ighi^Ijkl^Imnm
abc,def^Ighi^Ijkl^Imnm
abc,def^Ighi^Ijkl^Imnm

jmaster@jenkins-master:~sed -e 's/\([^\t]*,[^\t]*\)/"&"/g' -e 's/\t/,/g' test
"abc,def",ghi,jkl,mnm
"abc,def",ghi,jkl,mnm
"abc,def",ghi,jkl,mnm
Fidel
  • 977
  • 1
  • 6
  • 13
0

Thanks all for replying. I found that using "sep=|' in the first line of csv file will tell excel to consider |(pipeline)as separator. I included code in my shell script to insert first line as "sep= "(tab in my case). Now when i open .csv in excel it is taking tab as separator and distributing values correctly in respective columns.

shru
  • 1
  • 1
  • 1