46

I want to convert a tab-separated file into a CSV file. Can anyone help me?

Rodrigo de Azevedo
user400480

10 Answers

51

The answer for OSX is different.

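The BSD sed that ships with macOS does not understand \t in the sed expression.

You have to insert a literal tab character into the sed search pattern by pressing Ctrl+V and then Tab (see How can I insert a tab character with sed on OS X?). In the command below, the whitespace between the first two slashes must be that literal tab, not a space:

sed 's/	/,/g' input_file > output_file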
Community
ud3sh
  • 6
    On OSX you can use GNU sed that understands `\t`. [This question](http://stackoverflow.com/questions/30003570/how-to-use-gnu-sed-on-mac-os-x) is about how to install it. – janosdivenyi Dec 01 '16 at 11:05
  • What do you mean by "different"? You haven't said what you're comparing with. – Toby Speight Feb 07 '19 at 09:55
49

You can use sed as:

sed 's/\t/,/g' input_file > output_file

This will keep the input file unchanged and will create a new file output_file with the changes.

If you want to change the input file itself without creating a new file, you can use sed's -i option to make the changes in place:

sed -i 's/\t/,/g' input_file
codaddict
  • 7
    This doesn't work in OS X, it seems to match the letter "t" and not a tab. – Mike Oct 21 '15 at 17:18
  • Instead of typing \t on OSX, press ctrl + v and then press tab – TheAppchemist Nov 13 '16 at 12:19
  • @codaddict, If I may ask, what is that `g` for in `'s/\t/,g'`? – Lesego M Feb 01 '18 at 09:41
  • 1
    @Les the `g` is to enable global substitution.... ref: https://www.tutorialspoint.com/unix/unix-regular-expressions.htm "It is very important to note that sed substitutes only the first occurrence on a line. If the string root occurs more than once on a line only the first match will be replaced. For the sed to perform a global substitution, add the letter g to the end of the command as follows" – greenhouse Mar 14 '18 at 14:58
  • OSX uses bsd style sed. If you use a blank file extension mod it will work. `sed -i '' 's/foo/goo/g' somefile-name` The blank is in single quotes after the -i. – netskink Apr 13 '19 at 16:59
  • How can we mark this as the correct answer? – Ouss Nov 18 '21 at 17:22
  • @Ouss - we shouldn't, since it fails as soon as any value contains a comma or a (quoted) tab. – Toby Speight Dec 16 '21 at 08:12
  • Aha… I sense an opportunity here – Ouss Dec 16 '21 at 10:34
  • Now I see you provided the PERL solution. That's definitely the way to go. No need to reinvent the wheel. I would vouch for your answer. At the same time I believe that there is a place for the regex answer as well... Maybe you should edit your answer to include this "basic" solution for the case the `input_file` was "clean". – Ouss Dec 16 '21 at 11:12
13

Bear in mind that there are many flavours of comma-separated-value file. Since you didn't specify one, I'll assume RFC 4180 format in UTF-8 encoding, and a TSV input that follows the same rules but uses tabs instead of commas.

The naive approach would be to simply replace every tab with a comma:

tr '\t' ,

This falls down if any of the values already contain a comma, or if any contain a quoted tab. You'll need to minimally parse the file, to maintain quoting. Instead of hand-rolling such a parser, it's simpler, clearer and more flexible to use one already written, such as Text::CSV for Perl:

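#!/usr/bin/perl -w

use Text::CSV;

# Parser for the tab-separated input; auto_diag => 2 dies on parse errors.
my $tsv = Text::CSV->new({ sep_char => "\t", auto_diag => 2 });
# Writer using the default comma separator and quoting rules.
my $csv = Text::CSV->new();

# *ARGV reads the files named on the command line, or standard input.
while (my $row = $tsv->getline(*ARGV)) {
    $csv->print(\*STDOUT, $row) or die $csv->error_diag();
    print $/;    # no eol is configured, so end each record ourselves
}
# If the loop stopped before end of input, report the parser's error.
$tsv->error_diag() unless $tsv->eof;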
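
To make the failure mode described above concrete, here is a minimal Python sketch (Python rather than Perl, purely for illustration) that contrasts the naive tab-to-comma replacement with a parser-based conversion. The sample data and the helper names `naive_convert` and `parsed_convert` are made up for this example and are not part of the answer.

```python
import csv
import io


def naive_convert(tsv_text: str) -> str:
    """Replace every tab with a comma, ignoring quoting entirely."""
    return tsv_text.replace("\t", ",")


def parsed_convert(tsv_text: str) -> str:
    """Parse the TSV with the csv module and re-emit it as CSV."""
    out = io.StringIO()
    reader = csv.reader(io.StringIO(tsv_text, newline=""), delimiter="\t")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)
    return out.getvalue()


# Hypothetical input: the first field of the second row contains a comma.
sample = "name\tnote\nSmith, John\thello\n"

print(naive_convert(sample))   # second row now looks like three fields
print(parsed_convert(sample))  # the comma-bearing field is quoted
```

The naive version silently turns the two-field row into something a CSV reader will see as three fields, while the parser-based version quotes the field that contains the comma.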
Toby Speight
3

This can also be achieved with Perl.

To redirect the results to a new output file, you can use the following:

perl -wlp -e 's/\t/,/g;' input_file.txt > output_file.csv

If you'd like to edit the file in place, you can add the -i option:

perl -wlpi -e 's/\t/,/g;' input_file.txt

If by some chance you find that what you are dealing with is not actually tabs but runs of spaces, you can use the following to replace each run of whitespace with a comma:

perl -wlpi -e 's/\s+/,/g;' input_file

Keep in mind that \s matches any whitespace character (spaces, tabs or newlines), so this pattern also replaces single spaces; to match only runs of two or more spaces, use / {2,}/ instead. \s cannot be used in the replacement string.
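
As a rough illustration of how much the choice of whitespace pattern matters, here is a small Python sketch using the standard `re` module. The sample line is invented for the example; the two patterns are `\s+` (any run of whitespace) and ` {2,}` (only runs of two or more literal spaces).

```python
import re

# Hypothetical line: two spaces, then a single space, then a tab.
line = "alpha  beta gamma\tdelta"

# \s+ matches any run of one or more whitespace characters,
# so single spaces and tabs are replaced as well.
any_whitespace = re.sub(r"\s+", ",", line)

# " {2,}" matches only runs of two or more literal spaces,
# leaving single spaces and tabs untouched.
two_or_more_spaces = re.sub(r" {2,}", ",", line)

print(any_whitespace)
print(two_or_more_spaces)
```

If values in the file can legitimately contain single spaces, the stricter pattern is the safer choice.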

Toby
2

If the input_file contains neither commas nor quoted tabs, the easiest method is to use sed with a substitution that replaces every tab with a comma:

sed 's/\t/,/g' input_file > output_file

There are, of course, input files containing quoted tabs, which will confuse the regex, and input files with values containing commas, which must be quoted in the output file. The regex solution above is not sufficient for those cases.

One could think of several bash commands to "clean" the input file before processing it...

However, in the spirit of not reinventing the wheel, I would use a simple python3 script stored in a file, let's call it tsv_to_csv_converter.py:

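import csv
import sys

tsv_file = sys.argv[1]
csv_file = sys.argv[2]

# newline="" lets the csv module handle line endings itself,
# and the with block closes both files when the conversion is done.
with open(tsv_file, "r", newline="") as src, open(csv_file, "w", newline="") as dst:
    in_tsv = csv.reader(src, delimiter="\t")
    out_csv = csv.writer(dst)
    # Fields containing commas or quotes are quoted automatically on output.
    out_csv.writerows(in_tsv)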

Of course, this code can be improved considerably with exception handling, input validation, a nice command-line interface and documentation.

The script can be used as follows:

python3 tsv_to_csv_converter.py input_file output_file
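
As one possible take on the improvements mentioned above, here is a hedged sketch with basic argument parsing and error handling. The function names `convert` and `main`, the UTF-8 encoding, and the messages are assumptions made for this example, not part of the original script.

```python
import argparse
import csv
import sys


def convert(src_path: str, dst_path: str) -> int:
    """Copy rows from a TSV file to a CSV file and return the row count."""
    rows = 0
    # Assumption: both files are UTF-8; newline="" is what the csv module expects.
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src, delimiter="\t"):
            writer.writerow(row)
            rows += 1
    return rows


def main(argv=None) -> int:
    """Parse command-line arguments, run the conversion, return an exit code."""
    parser = argparse.ArgumentParser(description="Convert a TSV file to CSV.")
    parser.add_argument("tsv_file", help="path of the tab-separated input")
    parser.add_argument("csv_file", help="path of the comma-separated output")
    args = parser.parse_args(argv)
    try:
        count = convert(args.tsv_file, args.csv_file)
    except (OSError, csv.Error) as exc:
        # Missing files, permission problems and malformed input end up here.
        print(f"conversion failed: {exc}", file=sys.stderr)
        return 1
    print(f"wrote {count} rows to {args.csv_file}", file=sys.stderr)
    return 0
```

To use it as a script, call `sys.exit(main())` under the usual `if __name__ == "__main__":` guard at the bottom of the file; the invocation then stays the same as above.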
Ouss
1

After testing on macOS, this works for converting a TSV file to a CSV file (assuming no tab or comma exists in the column values):

cat file_input.tsv | tr '\t' ',' > file_output.csv

I tried:

sed 's/ /,/g' input_file > output_file

and

sed 's/\t/,/g' input_file > output_file

However, neither of them worked.

Tree DR
0

Is this the usual easy-question-all-languages thing? Okay, here is my Haskell solution:

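main :: IO ()
main = interact (unlines . map convertLine . lines)
  where
    -- Wrap the whole line in quotes; each tab closes one quoted field and opens the next.
    convertLine line = "\"" ++ concatMap convertChar line ++ "\""
    convertChar '\t' = "\",\""   -- field boundary
    convertChar '"'  = "\"\""    -- escape embedded quotes by doubling them
    convertChar c    = [c]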

not tested, but should work.

PS: All the other solutions aren't aware of escaping commas.

fuz
-1

On UNIX, replace all tabs with commas in place:

sed -i -e 's/\t/,/g' filename
Lidong Guo
Peter
-1

sed -e 's/TAB_CHAR/,/g' data.tsv > data.csv

The tricky part is how to enter the TAB character on the command line. To do that, just hit Ctrl+V, then Tab.

I frequently use this with mysql queries with -e parameter as described in this blog post http://blog.modsaid.com/2013/12/exporting-data-from-remote-mysql.html

Mahmoud Said
-2

Try replacing all the tabs with commas.

Possibly with a regex like s/\t/,/g, if you don't have any quoted fields.

Or, you know, Excel could do that for ya. Or R. Or anything which can take in a TSV file.

Borealid