
I am processing very large amounts of data (millions of objects, each with 300+ properties). For an object to be added to the database, at least one of 100 specific double? properties must be non-null.

class RowObject {
    public double? var1 {get; set;}
    public double? var2 {get; set;}
    //Another 98 double? variables declared
    public double? var100 {get; set;}
}

I have come up with two ways to check. The first is adding all the variables together and checking whether the result is greater than 0 or not null:

RowObject rO = new RowObject();
rO.var1 = 7250.345;
rO.var2 = null;
rO.var3 = 64.742;
//etc...

var sum = rO.var1 + rO.var2 + rO.var3 /* + ... */ + rO.var100;
if (sum != null) {
   //do something;
}

Or, not surprisingly, using an if statement:

if (rO.var1 != null || rO.var2 != null /* || ... */ || rO.var100 != null) {
    //do something;
}
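One thing worth checking before comparing speed: in C#, + on double? is a lifted operator that yields null as soon as either operand is null. So sum != null is true only when every value is set, whereas the || chain is true when at least one value is set. A minimal sketch with three values (the class and method names are hypothetical):

```csharp
using System;

// Hypothetical helper comparing the two checks on three values.
public static class CheckComparison
{
    // Lifted +: the result is null if ANY operand is null,
    // so "sum != null" really means "every value is set".
    public static bool SumIsNotNull(double? a, double? b, double? c)
    {
        double? sum = a + b + c;
        return sum != null;
    }

    // ||: true as soon as one operand has a value,
    // so this means "at least one value is set".
    public static bool AnyIsNotNull(double? a, double? b, double? c)
    {
        return a != null || b != null || c != null;
    }
}
```

For the values in the question, CheckComparison.SumIsNotNull(7250.345, null, 64.742) returns false while CheckComparison.AnyIsNotNull(7250.345, null, 64.742) returns true.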

Speed aside, checking 100 variables this way hurts readability quite a bit, so a better approach that is only negligibly slower but far easier to read and understand would also be a valid answer.

vvp1

  • You need to use a collection type, like Enumerable or List, not separate variables. Where are you getting the variable values from? – Dour High Arch Oct 07 '19 at 22:01
  • Did you mean `if (var1 != null && (...)`? – tymtam Oct 07 '19 at 22:08
  • @tymtam, no, it is correct as is: if any of the 100 variables is not null, then the code should proceed. – vvp1 Oct 07 '19 at 22:10
  • I'm paraphrasing the names of these variables; in reality they are Revenue, Total_Net_Assets, etc. A list or enumerable will not work, as they are properties that each tie to a SQL field. – vvp1 Oct 07 '19 at 22:12
  • Use reflection; here's a simple example: https://stackoverflow.com/questions/7649324/c-sharp-reflection-get-field-values-from-a-simple-class – RandomUs1r Oct 07 '19 at 22:22
  • @RandomUs1r reflection would be considerably slower – TheGeneral Oct 07 '19 at 22:38
  • @TheGeneral there's always a trade-off; maintaining a sum of 100 fields, give or take a few, doesn't seem too desirable either from a code smell perspective. – RandomUs1r Oct 07 '19 at 22:52
  • If you were super serious about speed here, you wouldn't store 300 items as properties, or even fields; you'd put them in an array that could be split up in parallel. – TheGeneral Oct 07 '19 at 22:53
  • Move the logic inside the `RowObject` class. Sure, the `if` check is ugly, but if it is hidden behind an `AtLeastOneValueIsSet` read-only property, then you get the speed benefit and the method has a nice clear name. – mjwills Oct 07 '19 at 23:02
  • I asked where you are getting these values from; if from a database, you should be doing this logic in the database, not in your business layer. Also, if this is a relational database, tables with dozens of columns are a symptom of denormalized data; fixing that will probably improve performance more than anything you do in your business layer. – Dour High Arch Oct 07 '19 at 23:13
  • In the setter you can update a count of properties that are not null to speed up the check. But it rather seems your design is flawed and you are fixing the wrong place. – Antonín Lejsek Oct 08 '19 at 00:31
  • @Dour High Arch the variables are being populated from an Excel sheet which loads financial data points. There are specific issues with the sheet that prevent a direct Excel-to-SQL dump, so there is a C# bridge. The databases are normalized; we may split the larger table in the future, but currently any operation hitting this table requires all of the fields anyway, and there are no writes to the table beyond the original dump, which is a nightly batch process. – vvp1 Oct 09 '19 at 14:31
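The comments by mjwills (hide the check behind an AtLeastOneValueIsSet property) and Antonín Lejsek (keep a non-null count up to date in the setters) can be combined. A minimal sketch, assuming every write goes through the property setters; only three properties are shown (Revenue and TotalNetAssets follow the asker's real names, Var3 and the Assign helper are hypothetical), and the real class would repeat the pattern for each checked property:

```csharp
using System;

public class RowObject
{
    // Number of checked properties that currently hold a value.
    private int _nonNullCount;

    private double? _revenue;
    private double? _totalNetAssets;
    private double? _var3;

    public double? Revenue
    {
        get => _revenue;
        set => Assign(ref _revenue, value);
    }

    public double? TotalNetAssets
    {
        get => _totalNetAssets;
        set => Assign(ref _totalNetAssets, value);
    }

    public double? Var3
    {
        get => _var3;
        set => Assign(ref _var3, value);
    }

    // O(1): no need to inspect every property.
    public bool AtLeastOneValueIsSet => _nonNullCount > 0;

    // Stores the value and adjusts the count only when the field
    // switches between null and non-null.
    private void Assign(ref double? field, double? value)
    {
        if (field.HasValue && !value.HasValue) _nonNullCount--;
        else if (!field.HasValue && value.HasValue) _nonNullCount++;
        field = value;
    }
}
```

The trade-off is a little extra work on every write, which suits the asker's situation (one nightly load, no later writes) but would be bypassed by anything that sets the backing fields directly.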

3 Answers


Well, writing out each check in a single if statement is probably the most efficient, because || short-circuits: it reads 1 item in the best case (evaluation stops at the first true condition) and n items in the worst case.

Adding them all up always reads every item, so it is never more efficient.
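To make the read counts concrete, here is a sketch with a hypothetical CountingRow class whose getters increment a counter, so you can see how many properties each check actually touches (four properties stand in for the real hundred):

```csharp
using System;

// Illustrative only: each getter counts how often it is read.
public class CountingRow
{
    public int Reads;
    private readonly double?[] _values;

    public CountingRow(params double?[] values) => _values = values;

    public double? V1 { get { Reads++; return _values[0]; } }
    public double? V2 { get { Reads++; return _values[1]; } }
    public double? V3 { get { Reads++; return _values[2]; } }
    public double? V4 { get { Reads++; return _values[3]; } }
}

public static class ReadCountDemo
{
    // || stops evaluating at the first operand that is true.
    public static int ReadsForOrChain(CountingRow r)
    {
        r.Reads = 0;
        _ = r.V1 != null || r.V2 != null || r.V3 != null || r.V4 != null;
        return r.Reads;
    }

    // The lifted + evaluates every operand, whatever their values.
    public static int ReadsForSum(CountingRow r)
    {
        r.Reads = 0;
        _ = r.V1 + r.V2 + r.V3 + r.V4;
        return r.Reads;
    }
}
```

With new CountingRow(5.0, null, null, null), ReadsForOrChain returns 1 and ReadsForSum returns 4; with all four values null, both read every property.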

But as you say, these solutions are not terribly readable. A more readable solution is to write a function that puts each element into a collection (here a List), then uses LINQ's Any to test it:

using System;
using System.Collections.Generic;
using System.Linq;

public class SampleProgram
{

    public class RowObject {
        public double? var1 {get; set;}
        public double? var2 {get; set;}
        //Another 98 double? variables declared
        public double? var100 {get; set;}
    }

    private static void GetRowList(RowObject obj, List<double?> rowList)
    {
        rowList.Clear();
        rowList.Add(obj.var1);
        rowList.Add(obj.var2);
        //Add the remaining double? properties the same way
        rowList.Add(obj.var100);
    }

    // True if at least one value in the row is non-null.
    private static bool TestRow(List<double?> rowList)
    {
        return rowList.Any( n => n.HasValue );
    }

    public static void Main(string[] args)
    {
        RowObject o1 = new RowObject();
        o1.var1 = null;
        o1.var2 = 2;
        o1.var100 = 100;

        // One list is reused for every row to avoid a new allocation per row.
        List<double?> rowList = new List<double?>();

        GetRowList(o1, rowList);
        Console.WriteLine(TestRow(rowList)); // True: var2 and var100 are set

        RowObject o2 = new RowObject(); // every property left null

        GetRowList(o2, rowList);
        Console.WriteLine(TestRow(rowList)); // False
    }
}

This requires reading and writing each item once to put it in a list, then reading 1-n items to make the test. But it is more readable.

If you don't want to hardcode the properties in GetRowList and are willing to sacrifice some more speed, you could use reflection to add all of the properties to a list that way.
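A minimal sketch of that reflection approach, assuming every public double? property on the class is one of the values to check (if the real class has other double? properties, filter by name or by a custom attribute instead). The PropertyInfo array is looked up once and cached; RowChecker is a hypothetical name:

```csharp
using System;
using System.Linq;
using System.Reflection;

public class RowObject
{
    public double? var1 { get; set; }
    public double? var2 { get; set; }
    public double? var100 { get; set; }
}

public static class RowChecker
{
    // Looked up once per process, not once per row.
    private static readonly PropertyInfo[] NullableDoubleProps =
        typeof(RowObject)
            .GetProperties(BindingFlags.Public | BindingFlags.Instance)
            .Where(p => p.PropertyType == typeof(double?))
            .ToArray();

    // GetValue boxes the value; a null double? boxes to null.
    // Any stops at the first property that has a value.
    public static bool AtLeastOneValueIsSet(RowObject row) =>
        NullableDoubleProps.Any(p => p.GetValue(row) != null);
}
```

Each GetValue call is still a reflective invocation plus a boxing allocation, so this is expected to be noticeably slower per row than the hand-written || chain.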

Ruzihm
  • Would it be possible to use `IEnumerable` to avoid the overhead of populating the `List`? – mjwills Oct 07 '19 at 23:04
  • In real code you'd probably add lambdas that get the value to the list instead of the values themselves, like `rowList.Add((Func<RowObject, double?>)(r => r.var2));`, and change the test to take a row: `rowList.Any(n => !n(row).HasValue)`... – Alexei Levenkov Oct 08 '19 at 00:00
  • Yes I considered reflection, but was worried about the performance hit. Thanks for the Enumerable suggestion and explaining the read time difference between sum and logic operations – vvp1 Oct 08 '19 at 14:19
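Following Alexei Levenkov's suggestion, a sketch that builds one static array of getter delegates instead of copying values into a list for every row; the test checks for a non-null value, as the question requires. RowChecks and Getters are hypothetical names, and the real array would list all 100 properties:

```csharp
using System;
using System.Linq;

public class RowObject
{
    public double? var1 { get; set; }
    public double? var2 { get; set; }
    public double? var100 { get; set; }
}

public static class RowChecks
{
    // Built once: one getter per property to check.
    private static readonly Func<RowObject, double?>[] Getters =
    {
        r => r.var1,
        r => r.var2,
        r => r.var100,
    };

    // No per-row list is built, and Any stops at the first non-null value.
    public static bool AtLeastOneValueIsSet(RowObject row) =>
        Getters.Any(getter => getter(row).HasValue);
}
```

This keeps the property list in one readable place, while each check costs a delegate call per property it inspects.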

(millions of 300 variable plus objects)

Do you mean millions of objects with 300+ properties each?

class A {
  public int? a1 {get;}
  public int? a2 {get;}
  // (...)
  public int? a301 {get;}
}

I've run a quick test.

I ran each option 4 times.

Results (1,000,000 objects with 10 double? properties each, all non-null; elapsed time of 4 runs per option)

Option 1, an || chain:
if (a.A1 == null || a.A2 == null || a.A3 == null || a.A4 == null || a.A5 == null || a.A6 == null || a.A7 == null || a.A8 == null || a.A9 == null || a.A10 == null)
00:00:00.4341559, 00:00:00.4751146, 00:00:00.4799181, 00:00:00.4522816

Option 2, summing and checking the sum for null:
var sum = a.A1 + a.A2 + a.A3 + a.A4 + a.A5 + a.A6 + a.A7 + a.A8 + a.A9 + a.A10;
if (sum == null)
00:00:00.6336356, 00:00:00.5714210, 00:00:00.6071693, 00:00:00.6795270

Code

using System;
using System.Diagnostics;
using System.Linq;

class A
{
    public A(double? a1, double? a2, double? a3, double? a4, double? a5, double? a6, double? a7, double? a8, double? a9, double? a10)
    {
        this.A1 = a1;
        this.A2 = a2;
        this.A3 = a3;
        this.A4 = a4;
        this.A5 = a5;
        this.A6 = a6;
        this.A7 = a7;
        this.A8 = a8;
        this.A9 = a9;
        this.A10 = a10;
    }

    public double? A1 { get; }
    public double? A2 { get; }
    public double? A3 { get; }
    public double? A4 { get; }
    public double? A5 { get; }
    public double? A6 { get; }
    public double? A7 { get; }
    public double? A8 { get; }
    public double? A9 { get; }
    public double? A10 { get; }

}

class Program
{
    static void Main(string[] args)
    {
        var r = new Random(1);

        // Select is lazy: each A is constructed inside the timed foreach below,
        // so the measured time includes object construction as well as the check.
        var As = Enumerable.Range(0, 1000000)
            .Select(i => new A(
                r.NextDouble(),
                r.NextDouble(),
                r.NextDouble(),
                r.NextDouble(),
                r.NextDouble(),
                r.NextDouble(),
                r.NextDouble(),
                r.NextDouble(),
                r.NextDouble(),
                r.NextDouble()
            ));

        var index = 0;
        var sw = Stopwatch.StartNew();
        foreach (var a in As)
        {
            // Option 1: || chain
            if (a.A1 == null || a.A2 == null || a.A3 == null || a.A4 == null || a.A5 == null || a.A6 == null || a.A7 == null || a.A8 == null || a.A9 == null || a.A10 == null)
            {
                index++;
            }

            // Option 2: sum, then null check (comment out option 1 and uncomment this to time it)
            //var sum = a.A1 + a.A2 + a.A3 + a.A4 + a.A5 + a.A6 + a.A7 + a.A8 + a.A9 + a.A10;
            //if (sum == null)
            //{
            //    index++;
            //}
        }
        Console.WriteLine(sw.Elapsed);
    }
}
tymtam

Why not use an embedded array, like this?

using System;
using System.Collections;
using System.Collections.Generic;

public class RowObject : IEnumerable<double?>
{
  private double?[] vars { get; set; }

  IEnumerator<double?> IEnumerable<double?>.GetEnumerator()
  {
    foreach ( var value in vars )
      yield return value;
  }

  IEnumerator IEnumerable.GetEnumerator()
  {
    foreach ( var value in vars )
      yield return value;
  }

  // Throws if index is outside the inclusive range [min, max].
  private void CheckIndex(int index, int min, int max)
  {
    if ( index < min || index > max )
      throw new ArgumentOutOfRangeException(nameof(index), $"Must be between {min} and {max}");
  }

  public double? this[int index]
  {
    get
    {
      CheckIndex(index, 0, vars.Length - 1);
      return vars[index];
    }
    set
    {
      CheckIndex(index, 0, vars.Length - 1);
      vars[index] = value;
    }
  }

  public RowObject(int capacity)
  {
    vars = new double?[capacity];
  }
}

If you want an indexer starting from 1, use this one instead:

  public double? this[int index]
  {
    get
    {
      CheckIndex(index, 1, vars.Length);
      return vars[index - 1];
    }
    set
    {
      CheckIndex(index, 1, vars.Length);
      vars[index - 1] = value;
    }
  }

Test:

static void Test()
{
  RowObject r0 = new RowObject(3);
  r0[0] = 7250.345;
  r0[1] = null;
  r0[2] = 64.742;
  RowObject r1 = new RowObject(3);
  r1[0] = null;
  r1[1] = null;
  r1[2] = null;
  RowObject r2 = new RowObject(3);
  r2[0] = 7250.345;
  r2[1] = 1000.0;
  r2[2] = 64.742;

  Action<RowObject, string> test = (rowobject, name) =>
  {
    var sum = rowobject.Sum(); // null values are skipped, so they effectively count as 0
    Console.WriteLine(name + ".Sum() = " + sum);
    if ( rowobject.Any(v => v != null) )
      Console.WriteLine(name + " contains at least a not null value");
    if ( rowobject.Any(v => v == null) )
      Console.WriteLine(name + " contains at least one null value");
    if ( rowobject.All(v => v != null) )
      Console.WriteLine(name + " contains no null value");
    if ( rowobject.All(v => v == null) )
      Console.WriteLine(name + " contains only null values");
  };

  test(r0, "r0");
  Console.WriteLine();
  test(r1, "r1");
  Console.WriteLine();
  test(r2, "r2");
}

Output:

r0.Sum() = 7315,087
r0 contains at least a not null value
r0 contains at least one null value

r1.Sum() = 0
r1 contains at least one null value
r1 contains only null values

r2.Sum() = 8315,087
r2 contains at least a not null value
r2 contains no null value
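The array idea above can also keep the named properties the asker needs for the SQL mapping: each property reads and writes a fixed slot in a private array, and the "any value set?" check becomes a loop over that array. A sketch, assuming the mapper goes through the public properties; the slot layout and Var3 are hypothetical, and CheckedCount would be 100 in the real class:

```csharp
using System;

public class RowObject
{
    private const int CheckedCount = 3; // would be 100 in the real class
    private readonly double?[] _checked = new double?[CheckedCount];

    public double? Revenue
    {
        get => _checked[0];
        set => _checked[0] = value;
    }

    public double? TotalNetAssets
    {
        get => _checked[1];
        set => _checked[1] = value;
    }

    public double? Var3
    {
        get => _checked[2];
        set => _checked[2] = value;
    }

    // Returns at the first slot that holds a value.
    public bool AtLeastOneValueIsSet
    {
        get
        {
            foreach (var v in _checked)
                if (v.HasValue) return true;
            return false;
        }
    }
}
```

The other 200 or so properties that are not part of the check can stay ordinary auto-properties.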