3

I have a large text file that looks like this: one row starting with A at the beginning, one row starting with C at the end, and some number x of B rows in between:

A

B

B

B

C

What's the best way to count how many times A, B, and C each appear? All of these rows contain more data, but this count is what I'm trying to achieve.

Do I have to read in the whole file, or is it better to read it one line at a time?

user3266638
  • Either will work, it depends on the file size, and how you want to accomplish the task. Reading by lines is easiest in either case. I would suggest considering LINQ and `File.ReadAllLines()`. – NetMage Jun 27 '19 at 20:38
  • If you always have 1 A, 1 C, and the rest B, you already know how many A's and C's there are; to count the number of B's, do `File.ReadLines(...).Count() - 2`. – Lasse V. Karlsen Jun 27 '19 at 20:44
  • How big is the file? The suggestions so far assume you can read the entire file into memory. Another way is to iterate over each line in the file. See this post https://stackoverflow.com/questions/8037070/whats-the-fastest-way-to-read-a-text-file-line-by-line – Ralph Willgoss Jun 27 '19 at 21:01
  • @RalphWillgoss Actually, the comment to use `File.ReadLines` does not. – NetMage Jun 27 '19 at 21:05
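
The distinction raised in these comments is that `File.ReadAllLines` loads every line into a `string[]` up front, while `File.ReadLines` returns a lazy `IEnumerable<string>` that yields lines as they are read. A minimal sketch of a streaming count, assuming the first character of each row identifies its type (the names `LineCounter` and `CountByFirstChar` are hypothetical, not from the thread):

```csharp
using System.Collections.Generic;
using System.IO;

public static class LineCounter
{
    // Streams the file one line at a time and tallies rows by their first character.
    public static Dictionary<char, int> CountByFirstChar(string path)
    {
        var counts = new Dictionary<char, int>();
        foreach (string line in File.ReadLines(path))
        {
            if (line.Length == 0) continue; // skip blank lines so line[0] is safe

            // TryGetValue leaves 'current' at 0 when the key is not present yet.
            counts.TryGetValue(line[0], out int current);
            counts[line[0]] = current + 1;
        }
        return counts;
    }
}
```

Only one line is held in memory at a time, so this should behave the same regardless of file size.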

3 Answers

2

I think something like this would work:

foreach (var grouping in File.ReadAllLines("<file-path-here>").GroupBy(x => x[0]))
{
    Console.WriteLine($"char: {grouping.Key}, count: {grouping.Count()}");
}
slig_3
  • This will throw an exception if the file contained an empty line. I suggest doing `GroupBy(x => x)`, and then inside the `foreach` check if `grouping.Key` is an empty line (you could use `string.IsNullOrEmpty()`), and print the result, or use it, only if not. – Sach Jun 27 '19 at 21:24
  • Or just add a `.Where(x => !String.IsNullOrEmpty(x))` before the .GroupBy – Grim Jun 27 '19 at 22:42
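
Putting the two comments together, a hedged sketch of the grouped count with empty lines filtered out before `x[0]` is evaluated. It swaps in `File.ReadLines` for lazy reading, and the names `GroupedCounter` and `Count` are hypothetical:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class GroupedCounter
{
    // Groups non-empty lines by their first character and counts each group.
    public static Dictionary<char, int> Count(string path) =>
        File.ReadLines(path)
            .Where(x => !string.IsNullOrEmpty(x)) // drop empty lines before indexing
            .GroupBy(x => x[0])
            .ToDictionary(g => g.Key, g => g.Count());
}
```

The result can be printed with the same kind of `foreach` and `Console.WriteLine` call as in the answer, iterating over the dictionary's key/value pairs.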
1

The snippet below is an easy implementation:

 int iBCount = File.ReadAllLines(filePath).Length - 2; // ReadAllLines returns string[], so use Length
 int iACount = 1; // We already knew this
 int iCCount = 1; // We already knew this

Also, if you know the size in bytes of each line (it must be the same for every line) and you are concerned about performance, you can simply calculate the number of "B" lines as follows:

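 // The division is exact only if every line, including its line terminator, has the same length in bytes
 // File size / line size gives the total line count; subtract 2 for the A and C lines
 int iBLines = (int)(new System.IO.FileInfo(pathToFile).Length / FIXED_LINE_SIZE_IN_BYTES) - 2;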
Cabbage Champion
0
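string[] lines = File.ReadAllLines(filePath);

int A_count = 0, B_count = 0, C_count = 0;
foreach (string line in lines)
{
    if (line.Length == 0) continue; // skip blank lines so line[0] cannot throw
    switch (line[0])
    {
        case 'A':
            A_count++;
            break;
        case 'B':
            B_count++;
            break;
        case 'C':
            C_count++;
            break; // C# requires every case section to end explicitly
    }
}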
trotunno