1

I have a txt file that the format is:

0.32423 1.3453 3.23423
0.12332 3.1231 9.23432432
9.234324234 -1.23432 12.23432
...

Each line has three double value. There are more than 10000 lines in this file. I can use the ReadStream.ReadLine and use the String.Split, then convert it. I want to know is there any faster method to do it.

Best Regards,

ET 0.618
  • 3,433
  • 4
  • 17
  • 5
  • 1
    C is fine, but I need the C#. – ET 0.618 Dec 24 '09 at 12:43
  • Mark, why would C be faster for an I/O bound problem? – H H Dec 24 '09 at 12:45
  • If you are parsing the file multiple times, you would see an improvement in C. But my question to the poster is are you parsing the file multiple times? And if so, why? And if not, why is this even a problem? It seems to me that if you even have to ask this question, there is something wrong with your program design. – Mark Byers Dec 24 '09 at 13:01

5 Answers5

5

StreamReader.ReadLine, String.Split and Double.TryParse sounds like a good solution here.
No need for improvement.

dtb
  • 213,145
  • 36
  • 401
  • 431
2

There may be some little micro-optimisations you can perform, but the way you've suggested sounds about as simple as you'll get.

10000 lines shouldn't take very long - have you tried it and found you've actually got a performance problem? For example, here are two short programs - one creates a 10,000 line file and the other reads it:

CreateFile.cs:

using System;
using System.IO;

public class Test
{
    static void Main()
    {
        Random rng = new Random();
        using (TextWriter writer = File.CreateText("test.txt"))
        {
            for (int i = 0; i < 10000; i++)
            {
                writer.WriteLine("{0} {1} {2}", rng.NextDouble(),
                                 rng.NextDouble(), rng.NextDouble());
            }
        }
    }
}

ReadFile.cs:

using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

public class Test
{
    static void Main()
    {   
        Stopwatch sw = Stopwatch.StartNew();
        using (TextReader reader = File.OpenText("test.txt"))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                string[] bits = line.Split(' ');
                foreach (string bit in bits)
                {
                    double value;
                    if (!double.TryParse(bit, out value))
                    {
                        Console.WriteLine("Bad value");
                    }
                }
            }
        }
        sw.Stop();
        Console.WriteLine("Total time: {0}ms",
                          sw.ElapsedMilliseconds);
    }
}

On my netbook (which admittedly has an SSD in) it only takes 82ms to read the file. I would suggest that's probably not a problem :)

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
2

I would suggest reading all your lines at once with

string[] lines = System.IO.File.ReadAllLines(fileName);

This wold ensure that the I/O is done with the maximum efficiency. You woul have to measure (profile) but I would expect the conversions to take far less time.

H H
  • 263,252
  • 30
  • 330
  • 514
  • Potentially a dangerous thing to do as you've no idea how many lines there are in the file. :-) – Kevin Shea Dec 24 '09 at 13:21
  • 2
    Rev, you are right but in the question: "more than 10,000 lines", I take that to be "less than 20,000" and I would use `ReadAllLines()` up to 100,000 without thinking. – H H Dec 24 '09 at 13:44
0

This solution is a little bit slower (see benchmarks at the end), but its nicer to read. It should also be more memory efficient because only the current character is buffered at the time (instead of the whole file or line).

Reading arrays is an additional feature in this reader which assumes that the size of the array always comes first as an int-value.

IParsable is another feature, that makes it easy to implement Parse methods for various types.

class StringSteamReader {
    private StreamReader sr;

    public StringSteamReader(StreamReader sr) {
        this.sr = sr;
        this.Separator = ' ';
    }

    private StringBuilder sb = new StringBuilder();
    public string ReadWord() {
        eol = false;
        sb.Clear();
        char c;
        while (!sr.EndOfStream) {
            c = (char)sr.Read();
            if (c == Separator) break;
            if (IsNewLine(c)) {
                eol = true;
                char nextch = (char)sr.Peek();
                while (IsNewLine(nextch)) {
                    sr.Read(); // consume all newlines
                    nextch = (char)sr.Peek();
                }
                break;
            }
            sb.Append(c);
        }
        return sb.ToString();
    }

    private bool IsNewLine(char c) {
        return c == '\r' || c == '\n';
    }

    public int ReadInt() {
        return int.Parse(ReadWord());
    }

    public double ReadDouble() {
        return double.Parse(ReadWord());
    }

    public bool EOF {
        get { return sr.EndOfStream; }
    }

    public char Separator { get; set; }

    bool eol;
    public bool EOL {
        get { return eol || sr.EndOfStream; }
    }

    public T ReadObject<T>() where T : IParsable, new() {
        var obj = new T();
        obj.Parse(this);
        return obj;
    }

    public int[] ReadIntArray() {
        int size = ReadInt();
        var a = new int[size];
        for (int i = 0; i < size; i++) {
            a[i] = ReadInt();
        }
        return a;
    }

    public double[] ReadDoubleArray() {
        int size = ReadInt();
        var a = new double[size];
        for (int i = 0; i < size; i++) {
            a[i] = ReadDouble();
        }
        return a;
    }

    public T[] ReadObjectArray<T>() where T : IParsable, new() {
        int size = ReadInt();
        var a = new T[size];
        for (int i = 0; i < size; i++) {
            a[i] = ReadObject<T>();
        }
        return a;
    }

    internal void NextLine() {
        eol = false;
    }
}

interface IParsable {
    void Parse(StringSteamReader r);
}

It can be used like this:

public void Parse(StringSteamReader r) {
    double x = r.ReadDouble();
    int y = r.ReadInt();
    string z = r.ReadWord();
    double[] arr = r.ReadDoubleArray();
    MyParsableObject o = r.ReadObject<MyParsableObject>();
    MyParsableObject [] oarr = r.ReadObjectArray<MyParsableObject>();
}

I did some benchmarking, comparing StringStreamReader with some other approaches, already proposed (StreamReader.ReadLine and File.ReadAllLines). Here are the methods I used for benchmarking:

private static void Test_StringStreamReader(string filename) {
    var sw = new Stopwatch();
    sw.Start();
    using (var sr = new StreamReader(new FileStream(filename, FileMode.Open, FileAccess.Read))) {
        var r = new StringSteamReader(sr);
        r.Separator = ' ';
        while (!r.EOF) {
            var dbls = new List<double>();
            while (!r.EOF) {
                dbls.Add(r.ReadDouble());
            }
        }
    }
    sw.Stop();
    Console.WriteLine("elapsed: {0}", sw.Elapsed);
}

private static void Test_ReadLine(string filename) {
    var sw = new Stopwatch();
    sw.Start();
    using (var sr = new StreamReader(new FileStream(filename, FileMode.Open, FileAccess.Read))) {
        var dbls = new List<double>();

        while (!sr.EndOfStream) {
            string line = sr.ReadLine();
            string[] bits = line.Split(' ');
            foreach(string bit in bits) {
                dbls.Add(double.Parse(bit));
            }
        }
    }
    sw.Stop();
    Console.WriteLine("elapsed: {0}", sw.Elapsed);
}

private static void Test_ReadAllLines(string filename) {
    var sw = new Stopwatch();
    sw.Start();
    string[] lines = System.IO.File.ReadAllLines(filename);
    var dbls = new List<double>();
    foreach(var line in lines) {
        string[] bits = line.Split(' ');
        foreach (string bit in bits) {
            dbls.Add(double.Parse(bit));
        }
    }        
    sw.Stop();
    Console.WriteLine("Test_ReadAllLines: {0}", sw.Elapsed);
}

I used a file with 1.000.000 lines of double values (3 values each line). File is located on a SSD disk and each test was repeated multiple times in release-mode. These are the results (on average):

Test_StringStreamReader: 00:00:01.1980975
Test_ReadLine:           00:00:00.9117553
Test_ReadAllLines:       00:00:01.1362452

So, as mentioned StringStreamReader is a bit slower than the other approaches. For 10.000 lines, the performance is around (120ms / 95ms / 100ms).

chrisn
  • 748
  • 6
  • 17
  • if you want to know how fast you can possibly go, check out that solution: http://stackoverflow.com/questions/7153315/how-to-parse-a-text-file-in-c-sharp-and-be-io-bound/7154560#7154560 – chrisn Oct 08 '13 at 20:47
0

your method is already good!

you can improve it by writing a readline function that returns an array of double and you reuse this function in other programs.

Yin Zhu
  • 16,980
  • 13
  • 75
  • 117