
I have to store data containing tabulations in a file. I would like to use .TSV files (Tab-Separated Values).

Here is an example of the data (I manually escaped tabs and newlines for the example):

                       Computation                   Display
0  for (int i=0;i<10;i++)\n\tx*=3;  printf ("<b>éàè'"</b>");
1                 float pi=3.1415;     printf("%d %f",x,xf);

Is there a proper way to escape tabs? Should I use \t, or should I use single or double quotes?

TylerH
Mr Robot
  • 1
    Your question is very unclear, please try to improve it. – Mark Setchell Dec 24 '17 at 10:24
  • 2
    @Mark What is unclear about my question? Tabs are delimiters, so what if there are tabs in the data? – Mr Robot Dec 24 '17 at 10:41
  • TSVs and CSVs can have whatever you want in them. There are no laws. The more unusual/extreme the contents, the less compatible the file will be. They are your files, use them how you wish - just ensure that whatever you create is readable by whatever tools you wish to use. – Mark Setchell Dec 24 '17 at 11:19
  • I think the phrasing of the question is a little ambiguous, in particular "data containing tabulations". This could refer either to TAB characters in the data, or to "tabular data" generally, such as a table of numbers. However, the title seems relatively clear about intending TAB characters in the data. Perhaps a data snippet containing tab characters would clear things up. – JonDeg Dec 25 '17 at 04:52
  • FWIW: Seems clear to me with the new edits. Question is if TAB characters (and newlines) can be included in properly formatted TSV files, and if so, what the syntax is. (Note: I don't have enough reputation to vote on hold status.) – JonDeg Dec 27 '17 at 03:04
  • No, tabs are not allowed in TSV. "Note that fields that contain tabs are not allowable in this encoding." -https://www.iana.org/assignments/media-types/text/tab-separated-values – flow2k May 10 '19 at 00:32
  • Duplicate of https://stackoverflow.com/questions/769621/dealing-with-commas-in-a-csv-file – TylerH Sep 01 '21 at 14:05

1 Answer


The abbreviation CSV means "Comma-Separated Values", but in practice it is used for any file whose values are separated by some separator character. That is why spreadsheet applications like OpenOffice Calc or Microsoft Excel open a dialog window that lets you configure the separator and quoting character when you open a file with the extension .csv.

If your question is how the separator character can be part of a value in a CSV file, the most common way is to quote the values. Here is an example of quoting applied to the values

a,b
c"d
     e    

with , as the separator character and " as the quoting character:

"a,b","c""d",   e   

The second way of quoting is the way Excel does it; you can also see variants where the quoting is done in the same way as in the first example.
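As a rough illustration of the quoting described above, here is a minimal sketch using Python's standard csv module. Python is only a stand-in language here, and the helper names to_csv_line and from_csv_line are made up for this example. With minimal quoting, only values that contain the separator or the quoting character get quoted, and an embedded quote is doubled.

```python
import csv
import io


def to_csv_line(values, delimiter=","):
    """Serialize one record, quoting only the values that need it."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quotechar='"',
        quoting=csv.QUOTE_MINIMAL,  # quote only when a value contains a special character
    )
    writer.writerow(values)
    return buffer.getvalue()


def from_csv_line(line, delimiter=","):
    """Parse one serialized record back into a list of values."""
    reader = csv.reader(io.StringIO(line, newline=""), delimiter=delimiter, quotechar='"')
    return next(reader)


values = ["a,b", 'c"d', "  e  "]
line = to_csv_line(values)
print(repr(line))           # '"a,b","c""d",  e  \r\n'
print(from_csv_line(line))  # ['a,b', 'c"d', '  e  ']
```

The value with surrounding spaces is written unquoted because it contains neither the separator nor the quoting character.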

There are libraries out there that do the parsing and creation of CSV files for you. We "here" use the Ostermiller CSV library (there might be better ones nowadays, but it does its job, so there was no need to change the library after we introduced it "here" 10 years ago).
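To sketch what letting a library do the work can look like with a tab as the separator, the example below uses Python's standard csv module (not the Ostermiller library, which is a Java library). The function names write_rows and read_rows are hypothetical, and the sample rows are loosely modeled on the data in the question. Note the assumption behind it: the output is tab-delimited text with CSV-style quoting, which works when the reader is configured with the same separator and quoting character, but a reader that expects strict TSV (where fields may not contain tabs at all) may not accept it.

```python
import csv
import io


def write_rows(rows, delimiter="\t"):
    """Serialize rows as tab-delimited text with CSV-style quoting."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, quotechar='"', quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)
    return buffer.getvalue()


def read_rows(text, delimiter="\t"):
    """Parse text produced by write_rows back into a list of rows."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter, quotechar='"')
    return list(reader)


rows = [
    ["Computation", "Display"],
    # This field contains a real newline and a real tab character.
    ["for (int i=0;i<10;i++)\n\tx*=3;", 'printf("%d %f",x,xf);'],
]

text = write_rows(rows)
assert read_rows(text) == rows  # embedded tab, newline and quotes survive the round trip
```

When writing to an actual file instead of a string buffer, the csv module documentation recommends opening the file with newline="" so that embedded line breaks are not altered.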

TylerH
Lothar
  • @Fifi So your question is how values of a CSV using some separator can contain said separator as part of the value? – Lothar Dec 24 '17 at 09:47
  • As you ask me, I realized the answer is [here](https://stackoverflow.com/questions/769621/dealing-with-commas-in-a-csv-file). My question is more about TSV files and, more generally, best practice. – Mr Robot Dec 24 '17 at 09:52
  • Thank you for the detailed answer. I got it for CSV files and I finally edited my question to focus on TSV files. Maybe you should update or remove your answer to avoid down-voting. – Mr Robot Dec 24 '17 at 10:01
  • @Fifi You don't need to worry about votes to my answer ;-) As already mentioned in a comment on your question, it's completely irrelevant whether you call a file CSV, TSV, or SSV. You can see it as a hint for what the separator is, but in general "out there" you always talk about CSV files independent of the actual separator being used. – Lothar Dec 24 '17 at 20:36
  • 2
    Note that TSV and CSV are distinct formats. CSV format uses an escape syntax to represent field and record delimiters in the data (typically comma and newline). TSV format does not use escapes, and instead disallows field and record delimiters in the data. Typical delimiters are TAB and newline, but alternate delimiter characters can be used. – JonDeg Dec 25 '17 at 05:05
  • @JonDeg, that's really interesting, do you have a source? – Mr Robot Dec 26 '17 at 06:16
  • 1
    @Fifi See the [Wikipedia tab-separated-values article](https://en.wikipedia.org/wiki/Tab-separated_values). Third paragraph in particular, where it describes this property as the IANA standard. But it's not just a standards body recommendation. It is observed by many software packages and data file producers. And it makes sense. Many data sets are largely numeric, and don't require TAB characters in the data. Avoiding escape sequences is a very useful simplification. – JonDeg Dec 26 '17 at 08:29
  • Interesting, it is not mentioned on the French Wikipedia page: https://fr.wikipedia.org/wiki/Tabulation-separated_values. Unfortunately, it will be one more hurdle to overcome... See my example in the updated question. – Mr Robot Dec 26 '17 at 13:51
  • The updated question is about TSV, not CSV, so I think this Answer needs to be updated to be relevant. – flow2k May 10 '19 at 00:29