Question says it all: I've got a 500,000-line file that gets generated as part of an automated build process on a Windows box, and it's riddled with ^M's. When it goes out the door it needs to be *nix friendly. What's the best approach here? Is there a handy snippet of code that could do this for me, or do I need to write a little C# or Java app?
7 Answers
Here is a Perl one-liner, taken from http://www.technocage.com/~caskey/dos2unix/
#!/usr/bin/perl -pi
s/\r\n/\n/;
You can run it as follows:
perl dos2unix.pl < file.dos > file.unix
Or you can run it like this (the conversion is done in place):
perl -pi dos2unix.pl file.dos
And here is my (naive) C version:
#include <stdio.h>
int main(void)
{
    int c;
    while( (c = fgetc(stdin)) != EOF )
        if(c != '\r')
            fputc(c, stdout);
    return 0;
}
You should run it with input and output redirection:
dos2unix.exe < file.dos > file.unix
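One caveat, assuming the program is built against Microsoft's C runtime (the answer only shows an .exe name, so the toolchain is a guess): stdin and stdout are opened in text mode there, so CR-LF is already collapsed to LF on input and every LF is expanded back to CR-LF on output, which would leave the file unchanged. A minimal sketch of a helper that switches both streams to binary mode; use_binary_stdio is a hypothetical name, and on other platforms it does nothing.

```c
#include <stdio.h>
#ifdef _WIN32
#include <fcntl.h>
#include <io.h>
#endif

/* Switch stdin and stdout to binary mode where the C runtime distinguishes
 * text streams from binary ones; elsewhere this is a no-op.
 * Returns 0 on success, -1 on failure. */
int use_binary_stdio(void)
{
#ifdef _WIN32
    if (_setmode(_fileno(stdin), _O_BINARY) == -1)
        return -1;
    if (_setmode(_fileno(stdout), _O_BINARY) == -1)
        return -1;
#endif
    return 0;
}
```

Calling it once at the top of main, before the first fgetc, should be enough.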

Don't worry about performance until you must deal with terabytes :D The C version takes ~ 5 seconds to convert a 65 MB file with 500000 lines of text (on an old Pentium4 with a standard EIDE disk) – Federico A. Ramponi Nov 24 '08 at 01:26
@Federico, that (naive) C version will remove all CR characters, not just those in a CR-LF pair. But I guess that's why you called it naive. :-) – paxdiablo Nov 24 '08 at 03:48
If installing a base Cygwin is too heavy, there are a number of standalone console-based dos2unix and unix2dos programs for Windows on the net, many with C/C++ source available. If I'm understanding the requirement correctly, either of these solutions would fit nicely into an automated build script.

If you're on Windows and need something run in a batch script, you can compile a simple C program to do the trick.
#include <stdio.h>
int main() {
    while(1) {
        int c = fgetc(stdin);
        if(c == EOF)
            break;
        if(c == '\r')
            continue;
        fputc(c, stdout);
    }
    return 0;
}
Usage:
myprogram.exe < input > output
Editing in-place would be a bit more difficult. Besides, you may want to keep backups of the originals for some reason (in case you accidentally strip a binary file, for example).
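A minimal sketch of the backup idea, assuming it is acceptable to rename the original aside and write the converted copy under the old name. The function name strip_cr_keep_backup and the choice of backup path are hypothetical; like the program above, it drops every CR. Both files are opened in binary mode so the C runtime does not translate newlines on Windows.

```c
#include <stdio.h>

/* Rename `path` to `backup_path`, then write a CR-free copy back to `path`.
 * Returns 0 on success, -1 on failure. */
int strip_cr_keep_backup(const char *path, const char *backup_path)
{
    FILE *in, *out;
    int c, ok;

    remove(backup_path);              /* ignore failure: no old backup is fine */
    if (rename(path, backup_path) != 0)
        return -1;

    in = fopen(backup_path, "rb");    /* "b": no newline translation */
    if (!in)
        return -1;
    out = fopen(path, "wb");
    if (!out) {
        fclose(in);
        return -1;
    }

    while ((c = fgetc(in)) != EOF)
        if (c != '\r')
            fputc(c, out);

    ok = !ferror(in) && !ferror(out);
    fclose(in);
    if (fclose(out) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

If the conversion turns out to be wrong (a binary file, say), the untouched original is still available under the backup name.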
That version removes all CR characters; if you only want to remove the ones that are in a CR-LF pair, you can use (this is the classic one-character-back method :-):
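/* Guards against writing the initial EOF sentinel -- see comments */
#include <stdio.h>
int main() {
    int lastc = EOF;
    int c;
    while ((c = fgetc(stdin)) != EOF) {
        /* Emit the previous character unless it is the CR of a CR-LF pair. */
        if (lastc != EOF && ((lastc != '\r') || (c != '\n'))) {
            fputc (lastc, stdout);
        }
        lastc = c;
    }
    /* Flush the final character, unless the input was empty. */
    if (lastc != EOF) {
        fputc (lastc, stdout);
    }
    return 0;
}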
You can edit the file in place using mode "rb+". Below is a general myd2u program, which accepts file names as arguments. NOTE: This program uses ftruncate (a POSIX function, not standard C) to chop off extra characters at the end. If there's any better (standard) way to do this, please edit or comment. Thanks!
#include <stdio.h>
#include <unistd.h> /* ftruncate (POSIX) */
int main(int argc, char **argv) {
    FILE *file;
    if(argc < 2) {
        fprintf(stderr, "Usage: myd2u <files>\n");
        return 1;
    }
    file = fopen(argv[1], "rb+");
    if(!file) {
        perror(argv[1]);
        return 2;
    }
    long readPos = 0, writePos = 0;
    int lastC = EOF;
    while(1) {
        fseek(file, readPos, SEEK_SET);
        int c = fgetc(file);
        readPos = ftell(file); /* For good measure. */
        if(c == EOF)
            break;
        if(c == '\n' && lastC == '\r') {
            /* Move back so we overwrite the \r with the \n. */
            --writePos;
        }
        fseek(file, writePos, SEEK_SET);
        fputc(c, file);
        writePos = ftell(file);
        lastC = c;
    }
    ftruncate(fileno(file), writePos); /* Not in C89/C99/ANSI! */
    fclose(file);
    /* 'cus I'm too lazy to make a loop: shift the argument list forward
       by one so the next file name lands in argv[1]. */
    if(argc > 2)
        return main(argc - 1, argv + 1);
    return 0;
}
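As an alternative to seeking for every character, the same read-position/write-position idea can be applied to a buffer in memory. This is only a sketch, assuming the whole file has already been read into buf (a 65 MB file like the one mentioned in the comments would normally fit); crlf_to_lf is a hypothetical name, not part of the program above.

```c
#include <stddef.h>

/* Compact buf in place, dropping each CR that is immediately followed by LF.
 * Lone CRs are kept. Returns the new length; bytes past it are stale. */
size_t crlf_to_lf(char *buf, size_t len)
{
    size_t r, w = 0;
    for (r = 0; r < len; r++) {
        if (buf[r] == '\r' && r + 1 < len && buf[r + 1] == '\n')
            continue;            /* skip the CR; the LF is copied next turn */
        buf[w++] = buf[r];
    }
    return w;
}
```

A caller would fread the file opened with "rb", call crlf_to_lf, and then write the first returned-length bytes to a file opened with "wb", which also avoids the need for ftruncate.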

@strager, fixed to use ints (required for EOF) and added code to do CRs only in a CR-LF pair - hopefully this'll get you more rep. Oh yes, and upvoted. – paxdiablo Nov 24 '08 at 02:57
I noticed the correction using int; thanks! I'll leave the second one alone, even if it isn't my style. =] – strager Nov 24 '08 at 03:00
The second snippet fails on the empty file, although it's fairly trivial to fix that. – Adam Rosenfield Nov 24 '08 at 04:37
tr -d '^M' < infile > outfile
Type the ^M as Ctrl+V followed by Enter, so that a real carriage return is inserted rather than the two characters ^ and M.
Edit: You can use '\r' instead of manually entering a carriage return [thanks to @strager]:
tr -d '\r' < infile > outfile
Edit 2: 'tr' is a Unix utility; you can download a native Windows version from http://unxutils.sourceforge.net [thanks to @Rob Kennedy] or use Cygwin's Unix emulation.

FTP it from the DOS box to the Unix box as an ASCII file instead of a binary file. FTP will strip the CR-LF and insert an LF. Transfer it back to the DOS box as a binary file, and the LF will be retained.

I'm not such a fan of this one, seems like it would be a PITA as part of an automated build. Plus, if I don't have a local unix box on the network, I've either got to buy one, or transfer the file over the WAN, twice. Must be possible to do this locally, no? – ninesided Nov 24 '08 at 01:01
Neither am I. It requires at least one running FTP server, which is a little overkill for a file conversion. – Federico A. Ramponi Nov 24 '08 at 01:03
FTP in ascii mode can also translate between tabs and spaces, depending on the implementation, which would be undesirable. – paxdiablo Nov 24 '08 at 03:51
Some text editors, such as UltraEdit/UEStudio, have this functionality built in.
File > Conversions > DOS to UNIX

gVim can also do this: it loads the file automatically in DOS mode, then you type ":set fileformat=unix" without the quotes and save. – paxdiablo Nov 24 '08 at 03:52
ah, true. UEStudio does actually have a rather good scripting and macro system built in, which would actually let you do this via the command line, but you're right, it's not the best tool for an automated process. – nickf Nov 24 '08 at 04:31
`not useful for an automated process though` = incorrect. both ultraedit/uestudio can run macros from command line on files. It has a very powerful scripting engine that is basically javascript with a few more powerful methods available. http://www.ultraedit.com/support/tutorials_power_tips/ultraedit/run_macro_script_from_command_line.html – Anthony Hatzopoulos Jul 26 '12 at 19:46
If it is just one file, I use Notepad++. Nice because it is free. I have Cygwin installed and use a one-liner script I wrote for multiple files. If you're interested in the script, leave a comment. (I don't have it available to me at this moment.)
