
I am working on a section of an application that needs to parse CSV logs generated by a PostgreSQL server.

  • The logs are stored in C:\Program Files\PostgreSQL\9.0\data\pg_log

  • The server version is 9.0.4

  • The application is developed in C#

    • After parsing, the log contents are shown in a DataGridView.
    • There are also filter options, such as viewing the log entries for a particular time range within a day.
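The time-range filter could be sketched as follows once each record has been split into fields. This is only an illustration: `LogTimeFilter`, `ParseLogTime` and `InRange` are hypothetical names, and it assumes that the first column always starts with a `yyyy-MM-dd HH:mm:ss.fff` timestamp (as in the sample line further down) and that the trailing zone abbreviation such as `IST` can be ignored.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper for illustration; not part of any CSV library.
public static class LogTimeFilter
{
    // Parses the leading "yyyy-MM-dd HH:mm:ss.fff" part of the first column.
    // Assumption: the timestamp always has millisecond precision, so the
    // first 23 characters are the date and time; the zone abbreviation
    // that follows is dropped.
    public static DateTime ParseLogTime(string logTime)
    {
        string stamp = logTime.Substring(0, 23);
        return DateTime.ParseExact(stamp, "yyyy-MM-dd HH:mm:ss.fff",
                                   CultureInfo.InvariantCulture);
    }

    // Yields only the records whose time of day lies in [from, to].
    // Each record is an array of fields with the timestamp at index 0.
    public static IEnumerable<string[]> InRange(
        IEnumerable<string[]> records, TimeSpan from, TimeSpan to)
    {
        foreach (string[] record in records)
        {
            TimeSpan timeOfDay = ParseLogTime(record[0]).TimeOfDay;
            if (timeOfDay >= from && timeOfDay <= to)
            {
                yield return record;
            }
        }
    }
}
```

Because `InRange` is lazy, it can be chained directly after whatever reader produces the records, so a long log does not have to be held in memory just to be filtered.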

However, the main problem is that our parsers cannot read the log format correctly.

It was first tested with A Fast CSV Reader.

Then we wrote a custom utility that uses the String.Split method and a foreach loop over the resulting array.

A sample log line:

2012-03-21 11:59:20.640 IST,"postgres","stock_apals",3276,"localhost:1639",4f697540.ccc,10,"idle",2012-03-21 11:59:20 IST,2/163,0,LOG,00000,"statement: SELECT id,pdate,itemname,qty from stock_apals order by pdate,id",,,,,,,,"exec_simple_query, .\src\backend\tcop\postgres.c:900",""

As you can see, the columns in the log are comma-separated, but not every value is enclosed in quotes.

For instance, the 1st, 4th, 6th, ... columns are unquoted.

Is there a utility or a regex that can find the unquoted columns and wrap them in quotes?

Performance matters here, because these logs are very long and a new one is created almost every hour.

I just want to fix up the columns and then use Fast CSV Reader to parse the file.

Thanks for any advice and help

arvind
  • Doesn't FastCSVReader support mixed quoted/unquoted values? – Petr Abdulin Mar 22 '12 at 07:02
  • What goes wrong is that parsing breaks when it reaches the column containing the SQL statement, which itself has commas between the table columns. Each log line is a mix of quote-enclosed and non-quote-enclosed columns. Is there a regex or utility to convert the unquoted columns to quoted ones? – arvind Mar 22 '12 at 09:09
  • The CsvReader page highlights: "This reader supports fields spanning multiple lines. The only restriction is that they must be quoted, otherwise it would not be possible to distinguish between malformed data and multi-line values." – arvind Mar 22 '12 at 09:12
  • Could you please provide a reasonable amount of the log (including the header), for example on http://pastebin.com/, so I can test it in my own CSV parser? – Petr Abdulin Mar 22 '12 at 10:55
  • Here is a sample: http://pastebin.com/uwfmRdU7 – arvind Mar 22 '12 at 13:55
  • Thanks. Unfortunately, while my parser supports mixed quoted columns, it does not support multiline values yet. I will work on fixing that and will let you know when I am able to parse your data. – Petr Abdulin Mar 22 '12 at 14:56
  • Is your parser available for sharing? Maybe I can help update it to parse multiline values. – arvind Mar 22 '12 at 16:09

1 Answer


I've updated my CSV parser, so it is now able to parse your data (at least what was provided in the example). Below is an example console app that parses your data saved in a multiline_quotes.txt file. The project source can be found here (you can download a ZIP). You need either Gorgon.Parsing or Gorgon.Parsing.Net35 (in case you can't use .NET 4.0).

Actually, I was able to achieve the same result using Fast CSV Reader. You were just using it the wrong way in the first place.

namespace So9817628
{
    using System.Data;
    using System.Text;
    using Gorgon.Parsing.Csv;

    class Program
    {
        static void Main(string[] args)
        {
            // prepare
            CsvParserSettings s = new CsvParserSettings();
            s.CodePage = Encoding.Default;
            s.ContainsHeader = false;
            s.SplitString = ",";
            s.EscapeString = "\"\"";
            s.ContainsQuotes = true;
            s.ContainsMultilineValues = true;
            // uncomment below if you don't want escape quotes ("") to be replaced with single quote
            //s.ReplaceEscapeString = false;

            CsvParser parser = new CsvParser(s);

            DataTable dt = parser.ParseToDataTableSequential("multiline_quotes.txt");

            dt.WriteXml("parsed.xml");
        }
    }
}
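To show why no re-quoting pass should be needed, here is a minimal, library-independent sketch of the rule a quote-aware reader applies: unquoted fields are taken as they are, while quoted fields may contain commas, doubled quotes and line breaks. `MixedQuoteCsv` and `ReadRecord` are hypothetical names used only for this illustration; this is not how Gorgon.Parsing or Fast CSV Reader are implemented, and it has not been tuned for very large files.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper for illustration; not part of Gorgon.Parsing
// or Fast CSV Reader.
public static class MixedQuoteCsv
{
    // Reads one CSV record from the reader and returns its fields,
    // or null when the input is exhausted.
    public static List<string> ReadRecord(TextReader reader)
    {
        if (reader.Peek() < 0) return null;

        var fields = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;

        int c;
        while ((c = reader.Read()) >= 0)
        {
            char ch = (char)c;
            if (inQuotes)
            {
                if (ch == '"')
                {
                    if (reader.Peek() == '"')
                    {
                        reader.Read();      // "" inside quotes is one literal quote
                        field.Append('"');
                    }
                    else
                    {
                        inQuotes = false;   // closing quote
                    }
                }
                else
                {
                    field.Append(ch);       // commas and newlines are data here
                }
            }
            else if (ch == '"')
            {
                inQuotes = true;            // opening quote
            }
            else if (ch == ',')
            {
                fields.Add(field.ToString());
                field.Clear();
            }
            else if (ch == '\n')
            {
                break;                      // unquoted newline ends the record
            }
            else if (ch != '\r')
            {
                field.Append(ch);
            }
        }

        fields.Add(field.ToString());
        return fields;
    }
}
```

Calling `ReadRecord` in a loop on a `StreamReader` until it returns null walks the file one record at a time, which is the reason a plain `String.Split(',')` on each line cannot work here: it has no notion of being inside quotes.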
Petr Abdulin