9

Let's say I have a DataTable with ~50 rows (GetDataTable() on a list in SharePoint). I want to keep 10 random rows and forget about the rest. How can I achieve this?

Thanks in advance.

user997685
  • 543
  • 2
  • 7
  • 14

3 Answers3

8

You can use a Fisher/Yates shuffle (implementation by Skeet) on the collection of rows on the DataTable, then select the first 10.

var random10 = dataTable.Rows.OfType<DataRow>().Shuffle(new Random()).Take(10);
Community
  • 1
  • 1
tvanfosson
  • 524,688
  • 99
  • 697
  • 795
  • I'll also note that you could adapt the shuffle to do only the first *n* swaps and return those as the random collection, in effect doing the random selection with one call. You'd have to change the algorithm to order from beginning to end (or select the last *n* elements). – tvanfosson Oct 18 '11 at 03:35
  • hi what version of .net does this require? i can't seem to extend this Shuffle method onto DataRowCollection. – Funky Dude Nov 19 '12 at 21:26
  • @FunkyDude - it should work with C# 3.0 starting with .NET 3.5. – tvanfosson Nov 19 '12 at 21:38
  • I am using 3.5. I am new to extension method, so i might not be doing it right. can you tell me if there is anything wrong with it? http://pastebin.com/FAntG1Yu – Funky Dude Nov 19 '12 at 21:43
  • @FunkyDude - looks ok at first glance. what's the problem? can you just not see it or does it not compile? if you can't see it, it's probably because you haven't included a `using CustomExtensions;` in the class where you are attempting to use it. – tvanfosson Nov 19 '12 at 22:05
  • I don't see the shuffle method. i have using CustomExtension above my class. The implementation looks like this http://pastebin.com/kZrDbzb3 – Funky Dude Nov 19 '12 at 22:14
  • alright figured it out. DataRowCollection isn't an IEnumerable. have to convert it first. – Funky Dude Nov 20 '12 at 00:31
  • @FunkyDude - it is `IEnumerable` but it isn't `IEnumerable`. Using `OfType` ought to fix that. – tvanfosson Nov 20 '12 at 02:11
  • I think he means so that he can put the rows back into the DataTable – haknick Jun 11 '13 at 15:53
1

The DataTable contain a property Rows that is a DataRowCollection. You can access each of those rows by using index.

So you can get random number with Random and get the data from the myTable.Rows[myRandomIndex].

Random random = new Random();
int randomNumber = random.Next(0, 50);
Patrick Desjardins
  • 136,852
  • 88
  • 292
  • 341
  • Note that this method requires that you keep track of which elements you've selected so that you don't reselect them. In general, this isn't feasible, especially when you are selecting a large fraction of element from the set randomly. The shuffle algorithm works by providing a random ordering in O(N), selecting swaps from a reducing set to avoid repeats, then taking the first K items. You could adapt it to perform only the first K swaps and return those elements as well to improve performance. – tvanfosson Oct 18 '11 at 03:46
  • You could just get a random number and take the rows, do an other random and get the other row. If you do not want to have them repeated, you keep track of the index taken. This is feasible... – Patrick Desjardins Oct 18 '11 at 12:49
  • For a small number, yes. But as the number of selected rows grows you have more and more collisions, resulting in more selections being required. In fact, because the selections are random you're not even guaranteed that you won't eventually get to the point where you continually select a row that's already been selected. n fact, you have probabilistic termination, not guaranteed termination, but the probability is much higher with fewer elements. cf. http://dilbert.com/strips/comic/2001-10-25/ – tvanfosson Oct 18 '11 at 13:29
  • The guy has about 50 elements... and just require 10 rows. Randomizing 10 unique number between 0 to 49 with a loop will create collision but not enough to have any performance issue. Anyways, no algorithm of randomness will provide 0 collision. But, thx for your information. Have a nice day. – Patrick Desjardins Oct 18 '11 at 14:33
  • Fisher-Yates avoids collisions by reducing the set from which the random element is chosen so it it guaranteed to be O(n) and deterministic, not probabilisitic. That is you choose a random element from the whole array, swap it into the last position, then reduce by one the number of elements of the array that you consider. Eventually, you end up with the array randomly sorted then choose the first N elements. No collisions, no reselections. FWIW, my comment was just that it wouldn't scale and isn't feasible in the general case. For this particular case it may be ok. – tvanfosson Oct 18 '11 at 15:46
  • I read your post and the one of @John Skeet and I prefer your way :) Thx for this algo, didn't know about it. – Patrick Desjardins Oct 18 '11 at 17:17
1

try this:

    Random r = new Random();
    while (dt.Rows.Count > 10)
    {
        int j = r.Next(0, dt.Rows.Count);
        dt.Rows.RemoveAt(j);
    }
ojlovecd
  • 4,812
  • 1
  • 20
  • 22
  • Note this is O(N^2-NK) where K is the number of random elements selected and N is the size of the initial array because you have to collapse the array when the element is removed. Not a big deal when N=50, but if you were selecting 100 from 1,000,000 it would make a huge difference. Fisher-Yates (Durstenfeldt's version), which Jon Skeet's implementation follows, is O(N) and could be adapted to be O(K) for this particular problem. – tvanfosson Oct 18 '11 at 03:41