1

I have need to check for the data returned from a Sybase query to check the first column which corresponds to which spreadsheet a given set of data will be displayed. I can get the data correctly in my dataset but the trouble comes when I try and remove the grouped column.

My sample data is as follows:

public static DataTable JustDataTableWithSheetColumn()
{
    var sample = new DataTable();

    sample.Columns.Add("SomethingSheet", typeof(string));
    sample.Columns.Add("Column1", typeof(string));
    sample.Columns.Add("Column2", typeof(string));

    sample.Rows.Add("1", "Somestuff", "this other stuff");
    sample.Rows.Add("1", "Fluffy", "this stuff");
    sample.Rows.Add("2", "ToolShed", "this other stf");
    sample.Rows.Add("2", "FudgeCycles", "this oer stuff");
    sample.Rows.Add("2", "Crap", "thr stuff");
    sample.Rows.Add("2", "stuff and stuff", "thiuff");
    sample.Rows.Add("2", "test crap", "this  stuff");
    sample.Rows.Add("2", "dungheap", "this othuff");
    sample.Rows.Add("3", "people eater", "this other stf");
    sample.Rows.Add("3", "no purple people", "ths oth stff");

    return sample;
}

And the following properly groups and applies them to datatables:

var thing = DataTableExamples.JustDataTableWithSheetColumn();

if (thing.Columns[0].ColumnName.Contains("Sheet"))
{
    var test = from t in thing.AsEnumerable()
        group t by t.ItemArray[0]
        into g
        select g.ToList().CopyToDataTable();
}

I have tried without the .CopyToDataTable() to just have a collection of DataRows and tried to query those using Skip(1) but I think my order of operations is incorrect or I am within that nebula I sometimes get lost in with IEnumerable vs IQueryable and all those variations.

Like so:

datarow.ItemArray.Skip(1).ToArray();

I have also tried to loop through the datatable collection and use dt.RemoveAt(0) but I am unable to reference the manipulated datatable (or incorrectly doing so).

foreach (var dt in test)
{ 
    dt.Columns.RemoveAt(0);
}

Bottom line is that the first column on the data returned from the database doesn't need to be displayed or used once I translate them into datatables for later addition to the Dataset I return from this API.

So I am looking for an elegant way to get what I require with as little boxing and unboxing as possible. This part has to be done prior to adding it to the end Dataset because each of these weird stored procedures could potentially return a different datatable structure so I can't simply remove all of the first columns from the end dataset.

Thanks for your help.

Slagmoth
  • 173
  • 1
  • 10
  • do you want the same datatable without the first column? – Mohammed Sajid Jun 23 '20 at 21:49
  • @Sajid That would be ideal. It came to me in the shower that I might have to drop the column before I fill the datatable but have not tried that just yet. I need to also keep the same basic structure, i.e. datatypes for each column as well as the names of the columns. – Slagmoth Jun 23 '20 at 23:22
  • @Sajid To be clear, I need the same table in this case 3 times just not with the first column. – Slagmoth Jun 23 '20 at 23:23
  • I thought I understood you, right up until I read the last paragraph. Your question title says *"How to delete first column in all datatables within a collection of datatables C#"* and your ultimate paragraph says *"I can't simply remove all of the first columns from the end dataset"* - is "collection of datatables" a subset of "the end dataset"? Where is this variable that holds the "collection of datatables" ? – Caius Jard Jun 24 '20 at 06:32
  • @CaiusJard Apologies... this collection of datatables will be added to a dataset for transfer to the calling API, that dataset could have other datatables that do not have that first annoying column. – Slagmoth Jun 24 '20 at 11:48
  • You could remove the column at source, but I think it's going to be a lot easier to take it out later.. – Caius Jard Jun 24 '20 at 12:06
  • @CaiusJard Ideally I would have the sprocs return datasets already broken and grouped appropriately to avoid this but I have to go with the assumption that that will not be an option. – Slagmoth Jun 24 '20 at 12:22
  • Oh, i didn't mean that far back towards the source - I meant "in the select, afte rthe LINQ group by" but you're correct; no point dragging a million transactions over a network just so a low powered client can add them up to produce the current balance; the multi-core/gb/disk/everything DB server should do it and send one number over the network... – Caius Jard Jun 24 '20 at 13:23

2 Answers2

1

dr.ItemArray.Skip(1).ToArray();

I suspect this (whatever dr is - it's not mentioned in your question) was used in the wrong place, and you ended up removing the column you wanted to group on, before the grouping was done

query those using Skip(1)

Skipping a datarow won't help you skip a column though - it just drops the first entire row of data


You can just drop the column after you prepare rather than getting into anything convoluted:

if (thing.Columns[0].ColumnName.Contains("Sheet"))
{
    var test = from t in thing.AsEnumerable()
        group t by t.ItemArray[0]
        into g
        select g.ToList().CopyToDataTable();

    foreach(var dt in test){
        dt.Columns.RemoveAt(0);
        endDataSet.Tables.Add(dt);
    }
}

Or you can manipulate the result dataset:

    //referencing known table names
    var dtToStrip = new[] { "ATableName", "BTableName" }
    foreach(DataTable dt in endDataSet.Tables)
      if(dtToStrip.Contains(dt.TableName))
        dt.Columns.RemoveAt(0);

    //or removing any column named sheet from any table in the set
    foreach(DataTable dt in endDataSet.Tables)
      if(dt.Columns.Count > 0 && dt.Columns[0].ColumName.Contains("Sheet"))
        dt.Columns.RemoveAt(0);

Caius Jard
  • 72,509
  • 5
  • 49
  • 80
  • I did try looping through the datatables but for some reason when I reference "test" to place it into the end dataset I still have those columns. – Slagmoth Jun 24 '20 at 12:06
  • I made an edit - the explanation is on a comment on your answer – Caius Jard Jun 24 '20 at 13:34
0

I believe I am a victim of the deferred execution in this case.

Originally, I was doing the following:

if (thing.Columns[0].ColumnName.Contains("Sheet"))
{
    var test = from t in thing.AsEnumerable()
        group t by t.ItemArray[0]
        into g
        select g.ToList().CopyToDataTable();

    foreach (var dt in test)
    { 
        dt.Columns.RemoveAt(0);
    }

    ds.Tables.AddRange(test.ToArray());
}

Which I mistakenly assumed was executing the query upon doing the first .ToList(). But it didn't get executed until I did the .ToArray() when adding it to the DataSet at the end. Which, if I understand this correctly, means that the executions within my foreach were all for naught because they were independent executions.

If I instead move that .ToArray() to the LINQ query first it seems to work:

if (thing.Columns[0].ColumnName.Contains("Sheet"))
{
    var test = (from t in thing.AsEnumerable()
        group t by t.ItemArray[0]
        into g
        select g.ToList().CopyToDataTable()).ToArray();

    foreach (var dt in test)
    { 
        dt.Columns.RemoveAt(0);
    }

    ds.Tables.AddRange(test);
}

Thanks for the help.

Please let me know if my conclusions are incorrect or if I missed some logic as some point that would make this more helpful for other devs having a similar issue.

Slagmoth
  • 173
  • 1
  • 10
  • It's not quite a deferred execution thing; the LINQ query is executed when you enumerate it in the for loop, it's just that it is/was *executed again* when you did ds.Tables.AddRange. If you put the `ds.Tables.Add` call *into* the for loop, it "works as you would expect" without the ToArray(). test is a *query*; you can either materialize it with ToArray (which is a bit of a wasted step) or you can use the results as you enumerate. Without ToArray and with a for+addrange, you make a query, run it, enumerate it in a for, throw the result away, and then AddRange does its own enumeration/run.. – Caius Jard Jun 24 '20 at 13:35
  • @CaiusJard Can you elaborate on why it "is a bit of a wasted step" please? I have tried to ensure I leverage deferred execution where necessary but if I only require the collection result once I was under the impression that it would be "cheaper" to get what I need then manipulate the object collection at that point. I realize in both cases that sort of happens but I guess I don't understand what happens under the hood with the differences to which you allude. – Slagmoth Jun 24 '20 at 14:18
  • 1
    This isn't deferred execution per se; LINQ is deferred execution, and whether the first thing you do after you formulate a query is ToArray() it or foreach it, you're executing it and enumerating so nothing was *deferred* (any longer than it would otherwise have been). I claim it's a wasted step to have LINQ prepare an array from an enumerable and then enumerate the array, simply because it's possible to enumerate an enumerable, so the process of creating an array *must* be a wasted step.See also https://stackoverflow.com/questions/1105990/is-it-better-to-call-tolist-or-toarray-in-linq-queries – Caius Jard Jun 24 '20 at 14:50
  • (I should have clarified "is a wasted step *in this case*" - there will be countless other cases where calling ToXxx on a linq query to run the query and materialize the results into a collection and keeping the collection is what you want to do.. You could also say that foreaching and putting the tables into a dataset is the same thing, and you'd be right; we do it in this case because we want a datatset and there isn't a ToDataSet() that turns an `IEnumerable` into a DataSet (a "datatablecollection" per se) like there is a ToArray that turns `IEnumerable` into `string[]`) – Caius Jard Jun 24 '20 at 15:04