5

In C#, what is the most efficient method to split a text file into multiple text files (the splitting delimiter being a blank line), while preserving the character encoding?

GPX
  • Your title and your actual question are different. Do you want to know how to split a text file (title), or how to do it more efficiently (question)? – Arseni Mourzenko Nov 30 '10 at 05:40
  • I am looking for both. Splitting the text file in the most efficient way! – GPX Nov 30 '10 at 05:45

3 Answers

9

I would use the StreamReader and StreamWriter classes:

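 public void Split(string inputfile, string outputfilesformat) {
     int i = 0;                              // index of the next output file
     System.IO.StreamWriter outfile = null;  // current output file, if any
     string line;

     try {
         using(var infile = new System.IO.StreamReader(inputfile)) {
             while(!infile.EndOfStream) {
                 line = infile.ReadLine();
                 if(string.IsNullOrEmpty(line)) {
                     // Blank line: close the current output file, if there is one.
                     if(outfile != null) {
                         outfile.Dispose();
                         outfile = null;
                     }
                     continue;
                 }
                 if(outfile == null) {
                     // First non-blank line of a section: start a new file,
                     // reusing the encoding the reader detected for the input.
                     outfile = new System.IO.StreamWriter(
                         string.Format(outputfilesformat, i++),
                         false,
                         infile.CurrentEncoding);
                 }
                 outfile.WriteLine(line);
             }
         }
     } finally {
         // Make sure the last output file is closed even if an exception is thrown.
         if(outfile != null)
             outfile.Dispose();
     }
 }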

You would then call this method like this:

 Split("C:\\somefile.txt", "C:\\output-files-{0}.txt");
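
One caveat on preserving the encoding: `StreamReader` can only recognize an encoding from a byte order mark (BOM) at the start of the file. A file without a BOM is decoded as UTF-8 by the constructor used above, whatever its real encoding is. The following is a minimal sketch of a variant that lets the caller supply a fallback encoding for that case. The class name `SplitterSketch`, the `fallbackEncoding` parameter, and the returned file count are illustrative additions, not part of the method above.

```csharp
using System;
using System.IO;
using System.Text;

public static class SplitterSketch
{
    // Splits inputFile at blank lines and returns the number of files written.
    // fallbackEncoding is used only when the input has no byte order mark.
    public static int Split(string inputFile, string outputFilesFormat, Encoding fallbackEncoding)
    {
        int count = 0;
        StreamWriter outfile = null;
        try
        {
            // Third argument true: a BOM, if present, overrides fallbackEncoding.
            using (var infile = new StreamReader(inputFile, fallbackEncoding, true))
            {
                string line;
                while ((line = infile.ReadLine()) != null)
                {
                    if (line.Length == 0)
                    {
                        // Blank line: close the current output file, if any.
                        if (outfile != null) { outfile.Dispose(); outfile = null; }
                        continue;
                    }
                    if (outfile == null)
                    {
                        // CurrentEncoding is read after the first ReadLine(),
                        // so BOM detection has already taken place.
                        outfile = new StreamWriter(
                            string.Format(outputFilesFormat, count++),
                            false,
                            infile.CurrentEncoding);
                    }
                    outfile.WriteLine(line);
                }
            }
        }
        finally
        {
            if (outfile != null) outfile.Dispose();
        }
        return count;
    }

    public static void Main()
    {
        // Demo in a scratch directory: the input is UTF-16 with a BOM,
        // so the BOM wins over the UTF-8 fallback passed to Split.
        string dir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(dir);
        string input = Path.Combine(dir, "input.txt");
        File.WriteAllText(input, "first\nsecond\n\nthird\n", Encoding.Unicode);
        int n = Split(input, Path.Combine(dir, "part-{0}.txt"), Encoding.UTF8);
        Console.WriteLine("Wrote {0} file(s) to {1}", n, dir);
    }
}
```

If the input is known to be in a legacy code page, the matching `Encoding` instance would be passed as the third argument; which encoding is correct has to come from knowledge of the file, since it cannot be detected reliably.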
Gabe
Andy Edinborough
  • +1, but I wonder whether a blank line could have the value `System.Environment.NewLine` rather than a null or empty string. – Jeff Ogata Nov 30 '10 at 05:54
  • @adrift: Wouldn't `System.Environment.NewLine` be appended to the end (or beginning) of every line? – GPX Nov 30 '10 at 05:59
  • The "blank line" in text files is always just \r\n (or a variant, depending on the OS). How else would you detect it? A text file is just a stream of chars. – Pavel Urbančík Nov 30 '10 at 06:10
  • The `ReadLine()` method reads until the new line character(s), so it would never contain them. Also, I've never had a problem with non-normalized line-endings (i.e. a mixture of \r\n, \n, \r, and \n\r), but you could test that fairly easily. – Andy Edinborough Nov 30 '10 at 06:13
  • Andy's method does seem to work for me though. Will be running some tests on large files and post the results soon! – GPX Nov 30 '10 at 06:14
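
The line-ending question raised in the comments can be checked with a short test. The sketch below uses `StringReader`, whose `ReadLine()` recognizes the same line terminators as `StreamReader.ReadLine()` (`\n`, `\r`, and `\r\n`); the class and method names are illustrative only. Note that `\n\r` is read as two terminators, so it produces an empty line, which the splitting method would treat as a delimiter.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineEndingDemo
{
    // Returns every line that ReadLine() produces for the given text.
    public static List<string> ReadLines(string text)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
                lines.Add(line);
        }
        return lines;
    }

    public static void Main()
    {
        // "a\r\nb", "a\nb" and "a\rb" each yield 2 lines;
        // "a\n\rb" yields 3, with an empty line in the middle.
        foreach (var sample in new[] { "a\r\nb", "a\nb", "a\rb", "a\n\rb" })
        {
            var lines = ReadLines(sample);
            Console.WriteLine("{0} line(s): [{1}]", lines.Count, string.Join("|", lines));
        }
    }
}
```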
0

For anyone who wants a ready-made variant:

If you have a CSV (comma-separated values) file and want to split it whenever a field changes, name each output file after the new field value (without the surrounding quote marks), and strip out comments or other unwanted lines (here, lines that start with "#), the method above can be adapted as follows.

Modified method:

public void Split(string inputfile, string outputfilesformat)
{
    System.IO.StreamWriter outfile = null;
    string line;
    string[] splitArray;
    string nameFromFile = "";
    try
    {
        using (var infile = new System.IO.StreamReader(inputfile))
        {
            while (!infile.EndOfStream)
            {
                line = infile.ReadLine();
                splitArray = line.Split(new char[] { ',' });

                // Skip comment lines (first field starts with "#) and lines
                // that are too short to contain the fifth field.
                if (splitArray[0].StartsWith("\"#") || splitArray.Length < 5)
                    continue;

                // The fifth field, without quote marks, names the output file.
                string key = splitArray[4].Replace("\"", "");
                if (key != nameFromFile)
                {
                    // The field changed: close the current file so that a
                    // new one is opened below for this line.
                    if (outfile != null)
                    {
                        outfile.Dispose();
                        outfile = null;
                    }
                    nameFromFile = key;
                }
                if (outfile == null)
                {
                    outfile = new System.IO.StreamWriter(
                        string.Format(outputfilesformat, nameFromFile),
                        false,
                        infile.CurrentEncoding);
                }
                // Unlike a blank-line delimiter, the line on which the field
                // changes is data, so it is written to the new file.
                outfile.WriteLine(line);
            }
        }
    }
    finally
    {
        if (outfile != null)
            outfile.Dispose();
    }
}

Example call from ASP.NET, with the paths resolved by Server.MapPath:

    string strpath = Server.MapPath("~/Data/SPLIT/DATA.TXT");
    string newFile = Server.MapPath("~/Data/SPLIT");
    if (System.IO.File.Exists(strpath))
    {
        Split(strpath, newFile + "\\{0}.CSV");
    }
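
The method above opens each output file in overwrite mode, so it assumes that rows with the same field value are adjacent in the input; if a value reappears later, the earlier file is replaced. For unsorted input, one possible alternative is to keep one writer per field value, sketched below. The names `CsvGroupSplitter`, `SplitByField`, and `fieldIndex` are hypothetical, and the approach keeps one file handle open per distinct value, which may not suit inputs with a very large number of values.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CsvGroupSplitter
{
    // Writes each non-comment line to a file named after the field at
    // fieldIndex (0-based). Returns the number of files created.
    public static int SplitByField(string inputFile, string outputFilesFormat, int fieldIndex)
    {
        var writers = new Dictionary<string, StreamWriter>();
        try
        {
            using (var infile = new StreamReader(inputFile))
            {
                string line;
                while ((line = infile.ReadLine()) != null)
                {
                    // Naive split, as in the answer: commas inside quoted
                    // fields are not handled.
                    string[] fields = line.Split(',');
                    if (fields[0].StartsWith("\"#") || fields.Length <= fieldIndex)
                        continue;

                    string key = fields[fieldIndex].Replace("\"", "");
                    StreamWriter writer;
                    if (!writers.TryGetValue(key, out writer))
                    {
                        // First time this value is seen: open its file.
                        writer = new StreamWriter(
                            string.Format(outputFilesFormat, key),
                            false,
                            infile.CurrentEncoding);
                        writers.Add(key, writer);
                    }
                    writer.WriteLine(line);
                }
            }
        }
        finally
        {
            foreach (var writer in writers.Values)
                writer.Dispose();
        }
        return writers.Count;
    }

    public static void Main(string[] args)
    {
        // Expects: <input file> <output format, e.g. out-{0}.csv>
        if (args.Length == 2)
            Console.WriteLine("Created {0} file(s)", SplitByField(args[0], args[1], 4));
    }
}
```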
user1314350
0

In case anyone needs to split a text file into multiple files using a marker string as the delimiter:

public static void Main(string[] args)
{
    // Local function (requires C# 7.0 or later).
    void Split(string inputfile, string outputfilesformat)
    {
        int i = 0;
        System.IO.StreamWriter outfile = null;
        string line;

        try
        {
            using (var infile = new System.IO.StreamReader(inputfile))
            {
                while (!infile.EndOfStream)
                {
                    line = infile.ReadLine();
                    // A line containing the marker string acts as the
                    // delimiter and is not written to any output file.
                    if (line.Contains("String You Want File To Split From"))
                    {
                        if (outfile != null)
                        {
                            outfile.Dispose();
                            outfile = null;
                        }
                        continue;
                    }
                    if (outfile == null)
                    {
                        outfile = new System.IO.StreamWriter(
                            string.Format(outputfilesformat, i++),
                            false,
                            infile.CurrentEncoding);
                    }
                    outfile.WriteLine(line);
                }
            }
        }
        finally
        {
            if (outfile != null)
                outfile.Dispose();
        }
    }

    Split("C:\\test.txt", "C:\\output-files-{0}.txt");
}
Trevor