10

I need to delete a certain line from a text file. What is the most efficient way of doing this? The file can potentially be large (over a million records).

UPDATE: below is the code I'm currently using, but I'm not sure if it is good.

internal void DeleteMarkedEntries() {
    string tempPath=Path.GetTempFileName();
    using (var reader = new StreamReader(logPath)) {
        using (var writer = new StreamWriter(File.OpenWrite(tempPath))) {
            int counter = 0;
            while (!reader.EndOfStream) {
                if (!_deletedLines.Contains(counter)) {
                    writer.WriteLine(reader.ReadLine());
                }
                ++counter;
            }
        }
    }
    if (File.Exists(tempPath)) {
        File.Delete(logPath);
        File.Move(tempPath, logPath);
    }
}
John Saunders
Valentin V
  • If you have such a large data store, why are you not using a "real" database? Is it a limitation in what tools you have available, your current skills or the specifications of your project? – Tomas Aschan Feb 10 '09 at 13:17
  • It is a requirement from 'above'. Using real database would be easier for me, but unfortunately, I can't use it. – Valentin V Feb 10 '09 at 13:20
  • It's not good, there's a bug - sorry :( - See my answer below – Binary Worrier Feb 10 '09 at 13:35

8 Answers

10

The most straightforward way of doing this is probably the best: write the entire file out to a new file, writing all lines except the one(s) you don't want.

Alternatively, open the file for random access.

Read up to the line you want to "delete". Skip past it, then read the same number of bytes as the deleted line occupied (including its CR + LF, if present) and write them over the deleted line. Advance both the read and write positions by that count of bytes and repeat until the end of the file, then truncate the file to its new length.
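
The random-access idea can be sketched roughly as follows. This is only a minimal illustration, not a tuned implementation: the class name `InPlaceLineRemover`, the method `RemoveRange`, and the 64 KB buffer size are all made up for the example, and it assumes the caller already knows the byte offset and byte length of the line (including its line terminator) and that the range lies inside the file.

```csharp
using System.IO;

// Hypothetical helper, named for this example only.
public static class InPlaceLineRemover
{
    // Removes the bytes [start, start + length) by shifting everything
    // after them toward the front of the file, then truncating the file.
    // Assumes 0 <= start and start + length <= file length.
    public static void RemoveRange(string path, long start, long length)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            var buffer = new byte[64 * 1024]; // arbitrary block size
            long readPos = start + length;    // first byte to keep
            long writePos = start;            // where that byte must end up

            while (true)
            {
                fs.Position = readPos;
                int n = fs.Read(buffer, 0, buffer.Length);
                if (n == 0) break;            // reached end of file

                fs.Position = writePos;
                fs.Write(buffer, 0, n);

                readPos += n;
                writePos += n;
            }

            // Cut off the now-duplicated tail.
            fs.SetLength(writePos);
        }
    }
}
```

Because each block is fully read into the buffer before it is written, the overlap between the read and write regions is harmless. Unlike the copy-to-a-new-file approach, a crash halfway through leaves the file in an inconsistent state.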

Hope this helps.

EDIT - Now that I can see your code

if (!_deletedLines.Contains(counter)) 
{                            
    writer.WriteLine(reader.ReadLine());                        
}

This will not work. If it's a line you don't want, you still need to read it, just not write it. The above code neither reads it nor writes it, so the counter gets out of step with the reader and every line is eventually written. The new file will be exactly the same as the old.

You want something like

string line = reader.ReadLine();
if (!_deletedLines.Contains(counter)) 
{                            
    writer.WriteLine(line);                        
}
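
For reference, here is one way the whole method could look with that fix applied. Treat it as a sketch under assumptions: the class name `LogCleaner`, the parameters, and the `.tmp` suffix are invented for the example (the original uses the fields `logPath` and `_deletedLines` and `Path.GetTempFileName()`), and line numbers are assumed to be zero-based as in the question's code.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical wrapper class, named for this example only.
public static class LogCleaner
{
    // Copies every line whose zero-based index is NOT in deletedLines
    // to a temporary file, then swaps that file in for the original.
    public static void DeleteMarkedEntries(string logPath, ISet<int> deletedLines)
    {
        // Assumption: a temp file next to the log, so the move stays on one volume.
        string tempPath = logPath + ".tmp";

        using (var reader = new StreamReader(logPath))
        using (var writer = new StreamWriter(tempPath, false))
        {
            int counter = 0;
            string line;
            // Always read the line; only the write is conditional.
            while ((line = reader.ReadLine()) != null)
            {
                if (!deletedLines.Contains(counter))
                {
                    writer.WriteLine(line);
                }
                ++counter;
            }
        }

        File.Delete(logPath);
        File.Move(tempPath, logPath);
    }
}
```

Passing the deleted line numbers as a `HashSet<int>` keeps each `Contains` check cheap even for a file with a million lines. Note that `WriteLine` writes the platform's line terminator, so line endings may be normalized in the output.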
Binary Worrier
3

Text files are sequential, so when deleting a line, you'll have to move all the following lines up. You can use file mapping (a Win32 API that you can call through P/Invoke) to make this operation a bit less painful, but you should surely consider using a non-sequential structure for your file, so that you can mark a line as deleted without really removing it from the file, especially if deletions happen frequently.

If I remember correctly, a file mapping API should be added in .NET 4.

thinkbeforecoding
1
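     // Requires: import java.io.*; import java.util.Scanner;
     // Copies D:/seenu.txt to D:/user.txt, skipping one line.
     // Assumption: the "serial number" entered is the 1-based number of the line to delete.
     try (Scanner reader = new Scanner(new File("D:/seenu.txt"));
          BufferedReader bufRead = new BufferedReader(new InputStreamReader(System.in));
          PrintWriter writer = new PrintWriter(new FileWriter(new File("D:/user.txt")), true)) {
       System.out.println("Enter serial number:");
       int n = Integer.parseInt(bufRead.readLine().trim());

       // Copy the lines before the one to delete.
       for (int w = 1; w < n && reader.hasNextLine(); w++)
         writer.println(reader.nextLine());

       // Read the line to delete, but do not write it.
       if (reader.hasNextLine()) {
         reader.nextLine();
         System.out.println("Line Deleted.");
       }

       // Copy the remaining lines.
       while (reader.hasNextLine())
         writer.println(reader.nextLine());
     } catch (Exception e) {
       System.err.println("Enjoy the stack trace!");
       e.printStackTrace();
     }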
Rajkumar
0

In my blog, I have benchmarked various I/O methods from C# in order to determine the most efficient way of doing file I/O. In general, you are better off using the Windows ReadFile and WriteFile functions. The next fastest way to read files in is through FileStream. To get good performance, read the files in blocks at a time instead of a line at a time and then do your own parsing. The code that you can download from my blog gives you an example on how to do this. There is also a C# class that encapsulates the Windows ReadFile / WriteFile functionality and is quite easy to use. See my blog for details at:

http://designingefficientsoftware.wordpress.com/2011/03/03/efficient-file-io-from-csharp

Bob Bryan MCSD

Bob Bryan
0

If you absolutely have to use a text file and cannot switch to a database, maybe you want to designate a weird symbol at the beginning of a line to mean "line deleted". Just have your parser ignore those lines, like comment lines in config files etc.

Then have a periodic "compact" routine, like Outlook and most database systems do, which rewrites the entire file excluding the deleted lines.
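
A rough sketch of that mark-then-compact idea is below. Everything named here is an assumption made for illustration: the class `TombstoneLog`, the `#` marker, and the `.tmp` suffix are invented, the file is assumed to use a single-byte encoding such as ASCII, and the caller is assumed to know the byte offset at which the (non-empty) line starts. Overwriting the first byte destroys the first character of the record, so a real format might reserve a leading status column instead.

```csharp
using System.IO;

// Hypothetical helper, named for this example only.
public static class TombstoneLog
{
    // Assumed marker; pick a character that never starts a real line.
    public const char DeletedMarker = '#';

    // Flags a line as deleted by overwriting its first byte in place.
    // lineStart is the byte offset of the first character of the line.
    public static void MarkDeleted(string path, long lineStart)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            fs.Position = lineStart;
            fs.WriteByte((byte)DeletedMarker);
        }
    }

    // Periodic clean-up: rewrites the file without the flagged lines.
    public static void Compact(string path)
    {
        string tempPath = path + ".tmp";
        using (var writer = new StreamWriter(tempPath, false))
        {
            foreach (string line in File.ReadLines(path))
            {
                if (line.Length > 0 && line[0] == DeletedMarker) continue;
                writer.WriteLine(line);
            }
        }
        File.Delete(path);
        File.Move(tempPath, path);
    }
}
```

Marking touches a single byte no matter how large the file is, so the expensive full rewrite only happens when `Compact` runs.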

I would strongly go with Think Before Coding's answer recommending a database or other structured file.

Theodor Zoulias
Bork Blatt
  • yes, the requirement is to be able to have a human readable file (but I'm not sure how any human can possibly skim through a million lines!). I can't do anything about this requirement. – Valentin V Feb 10 '09 at 13:28
0

Map your file into memory using file mapping, as Think Before Coding suggested, make the deletions in memory, and then write the result to disk.
Read this File Read Benchmarks - C#
C# accessing memory map file
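
With the `System.IO.MemoryMappedFiles` namespace that ships with .NET 4 and later, the idea might be sketched as follows. This is an illustration under assumptions, not a benchmarked solution: `MappedLineRemover` and `RemoveRange` are invented names, the byte offset and byte length of the line (including its terminator) are assumed to be known, the range is assumed to lie inside the file, and the file is assumed to be non-empty.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Hypothetical helper, named for this example only.
public static class MappedLineRemover
{
    // Removes the bytes [start, start + length) from the file.
    // Assumes 0 <= start and start + length <= file length.
    public static void RemoveRange(string path, long start, long length)
    {
        long fileLength = new FileInfo(path).Length;
        long tail = fileLength - (start + length); // bytes after the removed range

        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var view = mmf.CreateViewAccessor(0, fileLength))
        {
            var buffer = new byte[64 * 1024]; // arbitrary block size
            long moved = 0;
            while (moved < tail)
            {
                int n = (int)Math.Min(buffer.Length, tail - moved);
                // Copy one block from behind the gap to the front of it.
                view.ReadArray(start + length + moved, buffer, 0, n);
                view.WriteArray(start + moved, buffer, 0, n);
                moved += n;
            }
            view.Flush();
        }

        // The mapping must be closed before the file can be shortened.
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            fs.SetLength(fileLength - length);
        }
    }
}
```

The mapping lets the operating system page the file in and out, but every byte after the deleted line still has to be moved, so the cost remains proportional to the size of the remaining tail.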

Community
lsalamon
0

Depending on what exactly counts as "deleting", your best solution may be to overwrite the offending line with spaces. For many purposes (including human consumption), this is equivalent to deleting the line outright. If the resulting blank line is a problem, and you are sure you'll never delete the first line, you can append the spaces to the previous line by also overwriting the CRLF with two spaces.
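
As a small sketch of the overwrite-with-spaces approach: the names `LineBlanker` and `BlankOut` are invented for the example, and it assumes a single-byte encoding such as ASCII and that the caller knows the byte offset and byte length of the text to blank.

```csharp
using System.IO;

// Hypothetical helper, named for this example only.
public static class LineBlanker
{
    // Overwrites `length` bytes starting at byte offset `lineStart` with spaces.
    // The file size and the position of every other line stay unchanged.
    public static void BlankOut(string path, long lineStart, int length)
    {
        var spaces = new byte[length];
        for (int i = 0; i < spaces.Length; i++)
        {
            spaces[i] = (byte)' ';
        }

        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Write))
        {
            fs.Position = lineStart;
            fs.Write(spaces, 0, spaces.Length);
        }
    }
}
```

To blank only the text of a line, pass the offset of its first character and the length without the CRLF. To fold the spaces into the previous line instead, start two bytes earlier, at the previous line's CRLF, and stop before the deleted line's own CRLF.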

(Based on the comment to Bork Blatt's answer)

MSalters
-1

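Read your file into a Dictionary. For lines you want to keep, set the int to 0; for lines you need to mark as deleted, set the int to 1. Then loop over the KeyValuePair entries to extract the lines that don't need to be deleted and write them to a new file. Note that this assumes every line is unique, because Dictionary keys cannot repeat, and that the Dictionary's enumeration order, which is not guaranteed, happens to match the file order.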

Dictionary<string, int> output = new Dictionary<string, int>();

// read line from file

...

// if need to delete line then set int value to 1

// otherwise set int value to 0
if (deleteLine)
{
    output[line] = 1;
}
else
{
    output[line] = 0;
}

// define the no delete List
List<string> nonDeleteList = new List<string>();

// use foreach to loop through each item in output and add each key
// whose value is equal to zero (0) to the nonDeleteList.
foreach (KeyValuePair<string, int> kvp in output)
{
    if (kvp.Value == 0)
    {
        nonDeleteList.Add(kvp.Key);
    }
}

// write the nondeletelist to the output file
File.WriteAllLines("OUTPUT_FILE_NAME", nonDeleteList.ToArray());

That's it.