1

Is It possible to create a Linq for retrieving longest string values of DataReader columns ? Column data should be converted to string and then return longest string.

What I know so far is how to get same thing with DataTable which is Enumerable (as I asked here), example:

string maxString = dt
  .AsEnumerable()
  .Select(row => row[mycolumn].ToString())
  .OrderByDescending(st => st.Length)
  .FirstOrDefault();

I've tried combining upper solution for DataReader like this:

var enumerable = reader
  .Cast<IDataRecord>();

string max_string = enumerable
  .Select(record => record.GetName(col).ToString())
  .Aggregate((s, a) => a.Length > s.Length ? a : s);

Unfortunally this doesn't work, I get

Sequence contains no elements in at System.Linq.Enumerable.Aggregate

error. Thanks for help in advance :)

EDIT: I'm looking for a solution without loading data into Datatable etc., just directly from DataReader object. I'm trying to avoid "Out of memory exception", because data is large.

Latest attempt, suggested by Power Mouse (It returns correct value, but from column 1 only):

     for (int col = 0; col < reader.FieldCount; col++)
     {
         string col_name = reader.GetName(col).ToString();
         var enumerable = reader.Cast<IDataRecord>();

         string max_string = enumerable.Where(x => enumerable.Max(y => y[col_name].ToString()
                             .Length) == x[col_name].ToString().Length)
                             .FirstOrDefault()?[col_name].ToString();

         Console.WriteLine("max string of column is : " + max_string);
      }
Lucy82
  • 654
  • 2
  • 12
  • 32
  • `Sequence contains no elements` What might that mean? See also https://stackoverflow.com/questions/8867867/sequence-contains-no-elements-exception-in-linq-without-even-using-single . – mjwills Jul 18 '19 at 12:45
  • 5
    this seems like something that should be done at source, i.e. `select top 1 [SomeColumn] from [SomeTable]order by len([SomeColumn]) desc` ? – Marc Gravell Jul 18 '19 at 12:48
  • @Marc Gravell, I'm leaving this option for scenario If I don't find solution with Linq, because query that DataReader reads is not so small, so calculation with "order by" would take a lot of time, imho. – Lucy82 Jul 18 '19 at 12:54
  • 4
    @Lucy82 if the query is not so small, that's even more reason to do it at the source and only bring back one row over the network, rather than bringing back *tons* of data over the network just to discard all but one row – Marc Gravell Jul 18 '19 at 12:56
  • I think if you get the results you want by using sql query it would be better – Shehab Jul 18 '19 at 13:36
  • @MarcGravell, you are probably right all the way. Though I'm not using DataReader just for this, actually I wrote an OpenXML method (SAX aproach) for writing Excel files directly from It, and answer is needed for autosizing columns. Regarding this I would like to avoid another call to DB, since Datareader allready reads that data. I will try with a loop over DataReader's .FieldCount for column indexes & string lenghts and add that in a List, that might help me. I was hoping for a Linq solution though, but guess Datareader can't work this way without getting data in memory, that makes sense:( – Lucy82 Jul 18 '19 at 14:16
  • DataReader reads row by row, so you can store length and row data to object, and if length is greater - overwrite values. in your approach: ds.Tables[0].AsEnumerable().OrderByDescending(o=> o.Field("Name").Length ).FirstOrDefault() BTW, if datatable is empty = result would be null – Power Mouse Jul 18 '19 at 14:59
  • @PowerMouse, yes I will try to make something like this. – Lucy82 Jul 18 '19 at 15:46
  • that would be another approach, if you need a fill row (you can do .FirstOrDefault()["Name"] at the end if you like ds.Tables[0].AsEnumerable().Where(x => ds.Tables[0].AsEnumerable().Max(y => y["Name"].ToString().Length) == x["Name"].ToString().Length).Dump(); P.S. i would suggest make a list (since sometimes you get more than 1 row to return) – Power Mouse Jul 18 '19 at 16:07
  • @Power Mouse, I don't need that, what I need is to get longest string from only DataReader without any datatable, I don't want store data in memory, query is huge. – Lucy82 Jul 18 '19 at 16:12
  • and why do you go over each column? are you changing requirements? as original request was to go over each ROW. – Power Mouse Jul 19 '19 at 12:18
  • @Power Mouse, I need to calculate longest string in each column, that was original question, my first sentence. Example from DataTable does exactly that. – Lucy82 Jul 19 '19 at 12:22

2 Answers2

1

so according to your original request: you need to find the longest text in specific column when you utilizing a DataReader on the fly. please review example

string storedString = String.Empty;
    SqlConnection connection = new SqlConnection(this.Connection.ConnectionString);
    using (connection)
    {
        string SQLcommand = "select * FROM (VALUES(1, 'xxx' ), (2, 'really long string xxxxxx'), (3, 'short string'), (4, 'another string')) t (id, fName)";
        SqlCommand command = new SqlCommand(SQLcommand, connection);
        connection.Open();
        SqlDataReader reader = command.ExecuteReader();
        while (reader.Read())
        {
            storedString = reader.Cast<IDataRecord>()
                            .Where(w=> w.GetOrdinal("fName").ToString().Length == reader.Cast<IDataRecord>()
                                                                                        .Max(m => m.GetOrdinal("fName")
                                                                                        .ToString().Length))
                            .Select(s=> s.GetString(1))
                            .FirstOrDefault();
        }


    }
    Console.WriteLine($"The longest string: {storedString}. charcount= {storedString.Length}");

the result would be : The longest string: really long string xxxxxx. charcount= 25

as you explained that you need to check multiple columns:

string storedNameString = String.Empty;
    string storedName2String = String.Empty;
    SqlConnection connection = new SqlConnection(this.Connection.ConnectionString);
    using (connection)
    {
        string SQLcommand = "select * FROM (VALUES(1, 'xxx', 'dddddd' ), (2, 'really long string xxxxxx','dfghdt'), (3, 'short string', 'anothercolumn long string'), (4, 'another string', 'test')) t (id, fName, fName2)";
        SqlCommand command = new SqlCommand(SQLcommand, connection);
        connection.Open();
        SqlDataReader reader = command.ExecuteReader();
        while (reader.Read())
        {
            string fName = reader.GetString(reader.GetOrdinal("fName")).ToString();
            if(fName.Length >= storedNameString.Length)
                storedNameString = fName;

            string fName2 = reader.GetString(reader.GetOrdinal("fName2")).ToString();
            if (fName2.Length >= storedName2String.Length)
                storedName2String = fName2;
        }
    }
    Console.WriteLine($"The longest string: {storedNameString}. charcount= {storedNameString.Length}");
    Console.WriteLine($"The longest string: {storedName2String}. charcount= {storedName2String.Length}");
Power Mouse
  • 727
  • 6
  • 16
  • I've configured code to work with multiple datareader's, but your suggestion is not working,It throws exception "**Unable to cast object of type 'System.DateTime' to type 'System.String**'". Other than that, It doesn't handle null values. So I'm back in where I started :( – Lucy82 Jul 22 '19 at 07:48
  • second option works, but I'm wondering if LINQ works faster, because It's quite slow. – Lucy82 Jul 22 '19 at 09:29
0

I solved my problem, unfortunally without LINQ. Problem is that with DataReader you cannot just simply loop through rows & columns as you can with DataTable once stored in memory, but you must perfom somekind of same logic while reader.Read() method is running.

So, best thing I could came up with is to store column indexes and their string values into Dictionary while .Read() method is running.

Doing that, you must be careful about string blank spaces & null values. Here is my solution, which runs good for me:

 Dictionary<int, string> col_values = new Dictionary<int, string>();

 using (OracleDataReader reader = cmd.ExecuteReader())
 {
      for (int i = 0; i < reader.FieldCount; i++)
      {
          //First add indexes to Dictionary 
          // I add column names here - didn't test for string.Empty !!
          col_values.Add(i, string.Empty);
      }

      //Then read row by row and modify Dictionary - If text is larger than string.Empty
      //Dictionary must be .ToArray(), or else you'll have an error for modifying collection
      while (reader.Read())
      {
          foreach (var item in col_values.ToArray())
          {
             string rdr_text = reader[item.Key].ToString().Trim()??string.Empty;

             if (item.Value.Length<rdr_text.Length)
             {
                 col_values[item.Key] = rdr_text;
             }
          }
      }

      foreach (var item in col_values)
      {
         //...And here we have all longest strings stored, for each column...Job done
      }
 }

For my purpuses this Iteration reads around 2.3 mio rows with 12 columns in 4 minutes. It's not fast, but at least It works. If anyone has better/faster idea please provide answer.

Lucy82
  • 654
  • 2
  • 12
  • 32