1

I am trying to batch-convert a large number of CSV files from their present encoding to UTF-8 using .NET.

So far I have been opening the CSV files one by one in Notepad, choosing "All Files" in the "Save as type" dropdown, selecting "UTF-8" in the "Encoding" dropdown below it, and then saving (it doesn't ask whether to replace the original file, though).

As this procedure is quite tedious, I would like to write a tiny app for it in VB.NET.

All I came up with is this: System.Text.Encoding.Convert(System.Text.Encoding.ASCII,System.Text.Encoding.UTF-8)

But that's creating an error :(

Any suggestions? Thx

UPDATE: Just updated my question to use .NET's internal lib/funcs instead of using Notepad :D

gunther
  • Suggestion: Skip Notepad, use encoding conversion functions available in .NET. – deceze Apr 25 '12 at 05:32
  • @deceze but won't that be a little bit of experimentation? (I read somewhere on the Internet that .NET sometimes is not able to recognize the correct set of encoding/BOM unless a 3rd party lib is used like iconv) Just to be on safer side I want to stick with notepad :D – gunther Apr 25 '12 at 05:35
  • I wouldn't automate notepad with .net. Either do the encoding thing entirely in .NET (as per deceze) or maybe you could look at automating with AutoHotkey instead. It will let you record a macro of keyboard and mouse clicks, then replay it. – GregHNZ Apr 25 '12 at 06:03
  • @GregHNZ Thx for replying. Looks like .NET is the way to go :) But where to begin? :( I am kinda noob here, especially when it comes to file handling. Also, how can I loop through the available files in a directory and open them one at a time? – gunther Apr 25 '12 at 06:08

3 Answers

0

Take a look at DirectoryInfo for enumerating files in a directory.

Then look at File.ReadAllText() and File.WriteAllText() which are convenience methods you can easily use to convert encodings.
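
A minimal sketch of how those pieces could fit together, assuming the source encoding is known in advance (File.ReadAllText can only detect it on its own when the file starts with a byte order mark). CsvReencoder and ConvertDirectory are hypothetical names, and writing the results to a separate target directory is a choice made for this example:

```csharp
using System;
using System.IO;
using System.Text;

public static class CsvReencoder
{
    // Hypothetical helper: re-encodes every *.csv file in sourceDir as UTF-8
    // and writes the result under the same file name in targetDir.
    // sourceEncoding is an assumption the caller makes about the input files.
    public static int ConvertDirectory(string sourceDir, string targetDir, Encoding sourceEncoding)
    {
        Directory.CreateDirectory(targetDir); // does nothing if it already exists
        int converted = 0;

        foreach (FileInfo file in new DirectoryInfo(sourceDir).GetFiles("*.csv"))
        {
            // Decode with the assumed source encoding...
            string text = File.ReadAllText(file.FullName, sourceEncoding);
            // ...and encode again as UTF-8 (Encoding.UTF8 writes the U+FEFF signature).
            File.WriteAllText(Path.Combine(targetDir, file.Name), text, Encoding.UTF8);
            converted++;
        }

        return converted;
    }
}
```

For example, CsvReencoder.ConvertDirectory(@"C:\test", @"C:\test2", Encoding.Default) would treat the input as the system ANSI code page on .NET Framework, which is only right if that is what the files actually contain.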

Note that if you want UTF-8 without a signature at the start of the file (U+FEFF) you need to create your encoding with

var encoding = new UTF8Encoding(false);
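
To see what that constructor argument changes, the sketch below builds the bytes a newly written file would start with: the encoding's preamble followed by the encoded text. BomDemo and EncodeWithPreamble are hypothetical names used only for this illustration:

```csharp
using System;
using System.Linq;
using System.Text;

public static class BomDemo
{
    // Returns the encoding's preamble (empty if it has none) followed by the
    // encoded text, which is what a writer puts into a newly created file.
    public static byte[] EncodeWithPreamble(Encoding encoding, string text)
    {
        return encoding.GetPreamble().Concat(encoding.GetBytes(text)).ToArray();
    }
}
```

BitConverter.ToString(BomDemo.EncodeWithPreamble(new UTF8Encoding(true), "a,b")) gives EF-BB-BF-61-2C-62, while passing new UTF8Encoding(false) gives 61-2C-62. The three extra bytes are U+FEFF encoded as UTF-8.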
Joey
  • Thx for replying :) Well I certainly have no idea about the signature at the start of the CSV file but I did read about BOM a bit. Is that what you are trying to put up here? :| Also, Will there be any difference practically if compared to manually converted file using Notepad and the one using .NET's procedure using "System.Text.Encoding" ? Such as line/char spacing, new line etc? – gunther Apr 25 '12 at 06:25
  • The signature is indeed the BOM. – RvdK Apr 25 '12 at 06:42
  • PoweRoy, yes, but with UTF-8 being an endian-agnostic encoding I'm hesitant to call it BOM in that case ;-) – Joey Apr 25 '12 at 07:24
0

If this is a one shot, fire up PowerShell:

gci *.csv | %{ Get-Content $_ | Set-Content -Encoding UTF8 "$($_.BaseName)_Encoded.csv" }

gci *.csv gets all CSV files in the current directory and pipes the result to a "foreach" loop (%). For each file, Get-Content reads its contents and pipes them to Set-Content, which does the UTF-8 conversion and stores the result in a file with the same base name, suffixed with "_Encoded".

David Brabant
  • Thx for the shot :D but this is a Windows-only environment, no *nix (I wish it were *nix, as there are nearly zillions of articles/posts on how to do it in *nix :) but sadly not in Windows) – gunther Apr 25 '12 at 06:33
  • Awesome! but will this convert all the files in the present directory? I wanted to do a batch conversion. How can I make it work in a loop for all the files present in the current directory? – gunther Apr 25 '12 at 06:44
  • Edited the post above to give some explanations. Yes, this does conversion for all files in current dir. – David Brabant Apr 25 '12 at 06:48
  • Ok, I figured it out. gci == get child items :) But still, how can I do it through .NET? It would help, as I would also learn a bit of VB.NET programming. :D UPDATE: Nice explanation above (I need 15 reps at min. to mark as very very useful :( ) ^^ – gunther Apr 25 '12 at 06:51
  • Marking clickstefan's post as answer. Sry but his answer was more related to .NET. Thanks again for all the help :) Cheers, Gunther – gunther Apr 25 '12 at 07:47
0

Try this: Mozilla's charset detector or a .NET port of it.
OR
Here you can find other ways people have done it.

EDIT: OR adapt/use this

using System;
using System.IO;
using System.Text;

public partial class Converting : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        string sourceDir = "C:\\test";
        string newDir = "C:\\test2";

        foreach (string sourceFile in Directory.GetFiles(sourceDir))
        {
            // Path.GetFileName replaces the manual split on '\\'.
            string fname = Path.GetFileName(sourceFile);
            string targetFile = Path.Combine(newDir, "new_" + fname);

            // Encoding.ASCII turns every byte above 0x7F into '?'; pass the
            // files' real encoding here if they contain non-ASCII text.
            // The using blocks close both files even if an exception is thrown.
            using (StreamReader reader = new StreamReader(sourceFile, Encoding.ASCII))
            // append: false overwrites an existing target file completely
            // (FileMode.OpenOrCreate would leave stale bytes at the end).
            using (StreamWriter writer = new StreamWriter(targetFile, false, Encoding.UTF8))
            {
                string line;
                // ReadLine returns null at the end of the file.
                while ((line = reader.ReadLine()) != null)
                {
                    writer.WriteLine(line);
                }
            }
        }
    }
}
Community
Stefan Rogin
  • Thx but apart from being a noob I would go with simple ways that just gets the work done (no harm in that ..right? :D ). – gunther Apr 25 '12 at 06:47
  • btw thx for providing me the link of how other people have done the same in different ways; useful :) – gunther Apr 25 '12 at 06:58
  • You're welcome. I also found an example for ASP.NET that should be somewhat similar: http://forums.asp.net/t/1173381.aspx/1 – Stefan Rogin Apr 25 '12 at 07:28
  • you just saved me the trouble of typing out the code on my own. No words for now; feeling lazy :D heeee Thanks a toN :) :) I wanted to mark this and David's as the answer. Can I mark both of them? :) An example is what I needed for a start. – gunther Apr 25 '12 at 07:35
  • Don't know, but in order to vote I think you need some reputation, see http://stackoverflow.com/faq#reputation – Stefan Rogin Apr 25 '12 at 07:50