0

I tried to find any solution on stackoverflow, but i didn't found it.

I have a DataTable, obtained from excel sheet. This is not clear table, it contain many sub-tables (starts with headers) and other necessary information above of them (which may contain empty rows). For example:

Line1 : Other data...
[empty line]
Line2: Other data...
[empty line]
...................   
ColA   |   ColB   |   Type   |
------------------------------
AAA    |   BBB    |   IN     |
AAA    |   BBB    |   OUT    |
AAA    |   BBB    |   IN     |

Line1 : Other data...
[empty line]
Line2: Other data...
[empty line]
...................    
ColA   |   ColB   |   Type   |
------------------------------
AAA    |   BBB    |   IN     |
AAA    |   BBB    |   OUT    |
AAA    |   BBB    |   OUT    |

I want to split the data table into multiple data tables that begin with many rows of necessary information, then the table itself and ends with empty row. As a result, I have to get DataTable[] data type.

I tried to obtain indexes of the last row of the sections of the data table (if DataRow contain "in" or "out" and next index contain empty row), but i don't know if is a good code and a good solution for further split a data table:

var indexes = dataTable.AsEnumerable()
    .Where(x => x.ItemArray.All(rowCell => rowCell.ToString() == string.Empty))
    .Where(
       x => dataTable.Rows[dataTable.Rows.IndexOf(x) - 1].ItemArray.Any(
             item => Regex.Match(
                   item.ToString(),
                   "^in|out$",
                   RegexOptions.IgnoreCase).Success))
    .Select(x => dataTable.Rows.IndexOf(x)).ToArray();

I have two Where linq conditions for check whether exist empty row after row that contain "in" or "out" words.

How can I split the DataTable by these indexes? I want to find a similar Linq Expression for this purpose. As a result, I have to get the following tables:

Table #1

Line1 : Other data...
[empty line]
Line2: Other data...
[empty line]
...................   
ColA   |   ColB   |   Type   |
------------------------------
AAA    |   BBB    |   IN     |
AAA    |   BBB    |   OUT    |
AAA    |   BBB    |   IN     |

Table #2

Line1 : Other data...
[empty line]
Line2: Other data...
[empty line]
...................    
ColA   |   ColB   |   Type   |
------------------------------
AAA    |   BBB    |   IN     |
AAA    |   BBB    |   OUT    |
AAA    |   BBB    |   OUT    |

I know how to process this data further, but I don't know how to split the data table.

2 Answers2

0
var selectedIndexes = from row in dataTable.AsEnumerable()
                          where (row.Text.Equals("IN") || row.Text.Equals("OUT"))
                                 && dataTable.Rows[dataTable.Rows.IndexOf(row) - 1]
                          select dataTable.Rows.IndexOf(row);

Might do the trick, although this is a non-tested code.

EDIT:

Since your updated question tells me you are dealing with csv files I am strongly suggesting CSVHelper Nuget for you.

The documentation can be found here:

https://joshclose.github.io/CsvHelper/

MwBakker
  • 498
  • 5
  • 18
  • I want to split DataTable with these indexes into array that contain multiple DataTable objects. The linq expression already was implemented for get these indexes. – Victor Pogor Sep 19 '18 at 13:51
  • I am sorry, then I am not sure what you mean to do. Maybe somebody else gets the implementation. – MwBakker Sep 19 '18 at 14:00
  • For example, indexes = new [] {243}. First DataTable starts from 0 to 243 row index, and last DataTable - starts from 244 to end. I think i should use _Skip_ and _Take_ but i don't know how to do it. – Victor Pogor Sep 19 '18 at 14:05
  • btw I still suggest using the above syntax to save some performance when dealing with bigger ammount of records. – MwBakker Sep 19 '18 at 14:25
  • So if I get it right you want all the rows except those who are "multiple lined" or contain no value? – MwBakker Sep 19 '18 at 14:32
  • I updated the question. Maybe it will be a bit clearer. – Victor Pogor Sep 19 '18 at 16:03
  • I see. There are easier and more efficient roads for that like CSVHelper libraries. These are especially designed to restract data coming from csv (excel) files. Knowledge on working with this library will make things easier and quicker on later projects in your life. You can read about that on the following link https://joshclose.github.io/CsvHelper/ – MwBakker Sep 19 '18 at 16:22
0

Finally, I found the solution. I was inspired by the following answer of the question: Split datatable into multiple fixed sized tables

I have changed a bit the indexes identification code, according to the MwBakker user's suggestion.

var indexes = dataTable.AsEnumerable()
      .Where(x => x.ItemArray.All(rowCell => rowCell.ToString() == string.Empty))
      .Where(
         x => dataTable.Rows[dataTable.Rows.IndexOf(x) - 1].ItemArray.Any(
           item => string.Equals(item.ToString(), "In")
                   || string.Equals(item.ToString(), "Out")))
      .Select(x => dataTable.Rows.IndexOf(x)).ToArray();

I created the extension method which split the data table rows by indexes:

public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> enumerable, int[] indexes)
    {
        // Sort array.
        indexes = indexes.OrderBy(x => x).ToArray();
        var itemsReturned = 0;
        var list = enumerable.ToList();
        var count = list.Count;
        short i = 0;

        while (itemsReturned < count)
        {
            int currentChunkSize = i < indexes.Length
                                       ? Math.Min(indexes[i] - itemsReturned, count - itemsReturned)
                                       : count - itemsReturned;
            yield return list.GetRange(itemsReturned, currentChunkSize);
            itemsReturned += currentChunkSize;
            i++;
        }
    }

Next, i made a foreach loop:

var dataTables = dataTable.AsEnumerable().Split(indexes)
    .Select(rows => rows.CopyToDataTable());    

foreach (var table in dataTables)
{
    // ...
}