I'm new to C# so I apologize for any trashy code.

I am trying to use StreamWriter.Write/WriteLine to either write a new line or write two consecutive lines as one line to a file.

I have a text file with 12 million rows. In a few hundred of them, a stray line-break character appears in one particular field, splitting the row in two. Here's a simplified example:

012345, District 1, John Smith, Active\n
987624, District 2\n
, Jane Doe, Inactive\n
583940, District 3, Bobby Roberts, Active\n

I'm using StreamReader and a while loop to read through each line and replace the errant line breaks, then write each line to a new file. I thought I could use Write() to write the offending line ("987624, District 2") without a line break at the end and WriteLine() to add the next line to the offending line.

    static void Main(string[] args)
    {
        string line;

        using (StreamReader sr = new StreamReader("sourcefile.txt"))
        using (StreamWriter swp = new StreamWriter("processedfile.txt", append: true))
        {
            while ((line = sr.ReadLine()) != null)
            {
                if (line.Length < 25 && Char.IsDigit(line, 0))
                {
                    swp.Write(line.Replace(Environment.NewLine, ""));
                }
                else
                {
                    swp.WriteLine(line);
                }
            }
        }
    }

Expected result:

012345, District 1, John Smith, Active
987624, District 2, Jane Doe, Inactive
583940, District 3, Bobby Roberts, Active

Actual result:

012345, District 1, John Smith, Active
987624, District 2
, Jane Doe, Inactive
583940, District 3, Bobby Roberts, Active

I can't do anything about how the file comes, it's just my responsibility to fix it.

Brian Tompsett - 汤莱恩
Judson1101
  • That won't work. `StreamWriter.Write()` simply writes the content without appending a line terminator, but if the content itself contains a newline, the output will still break onto a second line. – Sach Aug 07 '19 at 16:59
  • Try replacing "\\n" – misticos Aug 07 '19 at 17:03
  • A line won't contain a line break in the middle, since that's the definition of the end of the line. It sounds like you have a malformed document and will have to parse it some other way, like `if (line.StartsWith(',')) // append this line to the previous one` – Rufus L Aug 07 '19 at 17:04
  • "Fix your dataset," should never be answered with "I can't do anything." Otherwise, you risk additional future issues as the dataset becomes more complex and contains other fields/data that trip up your attempt to clean it. – gravity Aug 07 '19 at 17:08
  • Try `swp.WriteLine(line.Replace("\\n", ""));` You still want WriteLine so a line break is added to the end of the whole line. As mentioned above, `Environment.NewLine` is `\r\n` on Windows, so try just `\n` – dvo Aug 07 '19 at 17:09
  • please try this `string removedBreaks = Line.Replace("\r\n", replaceWith).Replace("\n", replaceWith).Replace("\r", replaceWith);` ---found from here https://stackoverflow.com/a/238016/6923146 – Hardik Masalawala Aug 07 '19 at 17:13
  • Your code has an error : if (line.Length < 25 && Char.IsDigit(line, 0)) Should be : if (line.Length < 25 && Char.IsDigit(line, line.Length - 1)) – jdweng Aug 07 '19 at 17:20
  • @jdweng No, it's the digit at the beginning of a real line, not a digit-in-text at an incorrect line break, that OP wants to test, to find the initial part. The fact that "District 2" ends in a digit is coincidence. – madreflection Aug 07 '19 at 17:22
  • You need to test for the line not to end in a digit for it to work. – jdweng Aug 07 '19 at 18:05
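
For reference, `StreamReader.ReadLine()` returns each line without its terminator, so calling `Replace(Environment.NewLine, "")` on the result has nothing to remove. The idea from the comments (glue any line that starts with a comma onto the row before it) can be sketched as below. This is only a sketch: `ContinuationJoiner` and `Join` are hypothetical names, and it assumes that a genuine row never starts with a comma.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ContinuationJoiner
{
    // Hypothetical helper: yields repaired rows, appending any line that
    // starts with ',' to the row that precedes it.
    public static IEnumerable<string> Join(TextReader reader)
    {
        string pending = null; // row read so far but not yet emitted
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            bool isContinuation = line.Length > 0 && line[0] == ',';
            if (pending != null && isContinuation)
            {
                // Second (or later) half of a split row.
                pending += line;
            }
            else
            {
                // A new row starts here, so the previous one is complete.
                if (pending != null) yield return pending;
                pending = line;
            }
        }
        // Emit the last buffered row, if any.
        if (pending != null) yield return pending;
    }
}
```

Each value returned by `ContinuationJoiner.Join(sr)` can then be passed to `swp.WriteLine(...)`. Because a row is only emitted once the next row begins, this also copes with several consecutive continuation lines.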

1 Answer

I would use the same `if` condition you already have (assuming you're certain it has no exceptions), but when it is true, read one additional line and concatenate the two.

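    using System;
    using System.IO;

    var file = @"input.txt";
    var output = @"output.txt";
    var line = string.Empty;

    using (var sr = new StreamReader(file))
    {
        using (var sw = new StreamWriter(output))
        {
            while (!sr.EndOfStream)
            {
                line = sr.ReadLine();
                // A short line starting with a digit is the first half of a split row.
                // The Length > 0 check keeps Char.IsDigit from throwing on an empty line.
                if (line.Length > 0 && line.Length < 25 && Char.IsDigit(line, 0))
                {
                    // ReadLine() returns null at end of file; appending null adds nothing.
                    var line2 = sr.ReadLine();
                    line += line2;
                }
                sw.WriteLine(line);
            }
        }
    }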

Input file:

012345, District 1, John Smith, Active
987624, District 2
, Jane Doe, Inactive
583940, District 3, Bobby Roberts, Active

Output file:

012345, District 1, John Smith, Active
987624, District 2, Jane Doe, Inactive
583940, District 3, Bobby Roberts, Active
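
The code above assumes at most one stray break per row. If that assumption might not hold, one possible variant is to keep appending lines until the row has the expected number of fields. The following is a rough sketch, not a tested solution for the real file: `FieldCountJoiner`, `Repair`, and the value `ExpectedFields = 4` are assumptions taken from the simplified example, and it presumes that no field contains a comma.

```csharp
using System;
using System.IO;
using System.Text;

public static class FieldCountJoiner
{
    // Assumption: every complete row has exactly this many comma-separated fields.
    private const int ExpectedFields = 4;

    // Hypothetical helper: copies rows from reader to writer, joining
    // physical lines until a row reaches the expected field count.
    public static void Repair(TextReader reader, TextWriter writer)
    {
        var row = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            row.Append(line);
            if (CountFields(row) >= ExpectedFields)
            {
                writer.WriteLine(row.ToString());
                row.Clear();
            }
        }
        // Write out a trailing incomplete row rather than dropping it silently.
        if (row.Length > 0) writer.WriteLine(row.ToString());
    }

    // Number of fields = number of commas + 1.
    private static int CountFields(StringBuilder text)
    {
        int commas = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == ',') commas++;
        }
        return commas + 1;
    }
}
```

Note that this cannot detect a break inside the last field, because the row already looks complete at that point, and blank lines in the input are skipped rather than copied.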

Community
Sach
  • I think you have nailed it, unless there is more than one stray _\n_ breaking the line – Steve Aug 07 '19 at 17:23
  • Agreed. The OP doesn't specify whether it can have different types/number of line breaks, so this comes with the caveat that this will only work for the given example. – Sach Aug 07 '19 at 17:25
  • This works perfectly. Fortunately, there's only one occurrence in each line and it always occurs in the same field. Thanks. – Judson1101 Aug 07 '19 at 18:39