0

I am part of a study where subjects are given input from 4 categories (1, 2, 3, 4) and each category has 4 words (a, b, c, d). The order of the words needs to be randomized for each subject, but consecutive words should be from different categories.

I am fairly new to coding and have been working on an algorithm that can do this, but it is getting pretty long and complicated, and I feel there should be a simpler way to do it that is less prone to errors. Any help would be much appreciated!

user513951
  • 12,445
  • 7
  • 65
  • 82
Steven
  • 23
  • 1
  • 1
    Please show your work (relevant code, inputs, expected vs actual outputs, errors, etc). As written, it's a set of requirements, with the expectation that the Community solve/implement it for you. You mentioned you're "working on an algorithm" - great! This is what you should show (the code implementation of it) – David Makogon Mar 15 '23 at 16:57
  • 2
    FYI - helpful reading: [How do I ask and answer homework questions?](https://meta.stackoverflow.com/q/334822) (even if this isn't a homework question) – David Makogon Mar 15 '23 at 16:59
  • 1
    Thanks for the info. I've looked to stackoverflow for help several times, but this is my first time posting. I didn't realize that it was required to include my work. I just figured that what I had was so far outside of a good solution that it wouldn't be beneficial to include, and I was hoping for a different direction to try. Next time I'll be sure to include it. – Steven Mar 16 '23 at 05:17

3 Answers3

2

Here is a "brute force" (computationally inefficient, but simple) method.

  1. Produce a randomized list.
  2. Check whether the list meets the criteria.
  3. Repeat until you have collected as many valid lists as you need.
    source = ["1a","1b","1c","1d","2a","2b","2c","2d","3a","3b","3c","3d","4a","4b","4c","4d"]
    valid = []
    num_valid = 100
    
    until valid.length >= num_valid do
      candidate = source.shuffle
    
      if candidate.each_cons(2).all? {|a,b| a[0] != b[0] }
        valid |= [candidate]
      end
    end

When the example code above terminates, valid will contain 100 different valid lists.

user513951
  • 12,445
  • 7
  • 65
  • 82
  • The OP hasn't shown any work. Best to help the OP by suggesting ways to improve the question, not to write code for them. – David Makogon Mar 15 '23 at 16:57
  • 1
    @DavidMakogon, I disagree. The question is firstly (i.e., before laying down code) to devise an algorithm that produces a random ordering that has a probability of being selected equal to 1/N, where N is the total number of orderings having the desired property. This answer produces a random ordering. Any other algorithm that does so will be, to say the least, non-trivial. How can a person show work towards discovering a complex algorithm? – Cary Swoveland Mar 15 '23 at 23:58
  • If there are n elements to randomize there are n! possible randomizations. Suppose m of these randomizations have the desired property. It can then be shown that, on average, approximately n!/m randomizations would be performed between each pair of randomizations having the desired property. That could be quite a large number. – Cary Swoveland Mar 16 '23 at 02:44
  • 1
    Woah, thanks! I feel so stupid I never even considered there might be a shuffle function for arrays! I've never seen each_cons, and the documentation was a little confusing to me at first, but now I think I understand what it is doing. Finally, I couldn't really find documentation for |=, but playing around with it in irb, does it only add unique candidate arrays to the valid array? Kind of like adding it and then calling .unique? This is all super helpful! This is beyond where I am in my learning journey, and I don't think I could have come to it on my own, but studying it I've learned a lot! – Steven Mar 16 '23 at 05:47
  • @Steven You're pretty much right about `|=`; [here is the explanation](https://stackoverflow.com/a/19797217/513951). In `each_cons(2)`, "cons" is short for "consecutive," so it means "loop over every two consecutive elements, e.g. a&b,b&c,c&d..." and is obviously useful for making sure each element is different from the last. I'm glad you followed up to investigate some of these tricks. That's one of the magical things about Ruby—you see something that makes you say, "Whoa! I can't believe you can do that!" just about every day, and then you gradually fill up your tool belt. – user513951 Mar 16 '23 at 12:51
1

Decomposing the problem (in one way; there are others):

  • Need a list of non-repeating categories
    • TBD: What about [a, b, a, ...]
      • E.g., does each group of words.length combos need a word from each category
    • TBD: Do cats need to be the same order for each group of words
      • E.g., which:
        • [c, b, d, a, c, b, d, a, ...] (same cats each words group) or
        • [c, b, d, a, b, a, c, d, ...] (different cats)
    • Minor tweaks either way
  • That list is length cats.length * words.length (num of combos)

Verbosely-but-clear-ishly: it's a flat-map of category shuffles, one per words.length, with re-shuffles (if needed) until the shuffle's first entry is different from the previous shuffle's last entry.

Take that in chunks of words.length and associate each category with a word from a shuffled word list.

Naive implementation, probably, but reasonably clear, and a dozen or two lines of code without trying to be clever (which I definitely didn't :rofl:). In other words: it doesn't need to be super-complicated to meet the requirements--might be worth trying to think about it a different way.

Dave Newton
  • 158,873
  • 26
  • 254
  • 302
  • I like this because it 1) has a nice solution to the core "puzzle" and 2) follows the "how to answer homework questions" guidelines, by pointing in the right direction without handing over the code (which I'm guilty of in my answer). – user513951 Mar 15 '23 at 18:24
  • There don't need to groups containing one word from each of the categories. I need all 16 words dumped into a hat and drawn out one at a time. But if its integer component is the same as the previous one, it should be put back and another one drawn out. I guess the biggest worry is that at some point there are only words with the same integer left in the hat. It might work to do groups of 4, but I'll have to check with my colleague to see if that is acceptable, becuase in these sorts of studies it is really important to show that outcome isn't affected by any anticipation by the subject. – Steven Mar 16 '23 at 01:29
1

Here is a method that is only slightly different than @user513951's, but may possibly be more efficient with the array is large. Like the earlier method it loops until a random selection has the required property. Both methods produce a statistically random sample (subject to limitations in the production of truly random values in software).

The earlier method extracts the category of each of two string in the block { |a,b| a[0] != b[0] }. This assumes, however, that there are no more than 9 categories. If the category may be comprised of two or more digits the digits would have to be stripped off the front of the string. Probably it would be more efficient to have a pre-processing step that makes a and b in the aforementioned block two-element arrays (which would require some tidying up at the end). That would be a simple change to the earlier method.

If we are given

a = ["1a", "1b", "1c", "1d", "20a", "20b", "20c", "20d",
     "3a", "3b", "3c", "3d", "41a", "41b", "41c", "41d"]

we may start by computing1

arr = a.map { |s| s.split(/(?<=\d)(?=\D)/) }
  #=> [["1", "a"], ["1", "b"], ["1", "c"], ["1", "d"],
  #    ["20", "a"], ["20", "b"], ["20", "c"], ["20", "d"],
  #    ["3", "a"], ["3", "b"], ["3", "c"], ["3", "d"],
  #    ["41", "a"], ["41", "b"], ["41", "c"], ["41", "d"]]

When compared to the earlier method the only change I am proposing is to first obtain a random sequence of categories that satisfies the desired property. Once found I will convert that to a return value that remains an unbiased sample selection.

In effect I am simplifying the block { |a,b| a[0] != b[0] } to { |x,y| x != y }, where x and y are categories. The improvement in efficiency could be significant for very large arrays.


First create a hash that maps categories to an array of their words.

h = arr.each_with_object(Hash.new { |h,k| h[k] = [] }) do |(cat,word),h|
  h[cat] << word
end
  #=> {"1"=>["a", "b", "c", "d"], "20"=>["a", "b", "c", "d"],
  #    "3"=>["a", "b", "c", "d"], "41"=>["a", "b", "c", "d"]}

Next, shuffle the elements of each value to preserve randomness later.

h.transform_values!(&:shuffle)
  #=> {"1"=>["b", "d", "a", "c"], "20"=>["b", "c", "a", "d"],
  #    "3"=>["d", "b", "c", "a"], "41"=>["a", "d", "c", "b"]}

Then select the keys replicated by the sizes of their values (which need not be all the same for all keys).

keys = h.each_with_object([]) { |(k,v),a| a.concat([k]*v.size) }
  #=> ["1", "1", "1", "1", "20", "20", "20", "20",
  #    "3", "3", "3", "3", "41", "41", "41", "41"]

Now find a sequence of categories that satisfies the desired property.

loop do
  n += 1
  keys.shuffle!
  break keys if keys.each_cons(2).all? { |x,y| x != y }
end
  #=> ["1", "41", "3", "20", "41", "20", "1", "20",
       "3", "41", "1", "3", "1", "20", "41", "3"]

(Upon executing this several times it always took under 70 iterations to find a valid sequence.)

Lastly, append the randomized words for each category to produce a random array that satisfies the required property.

keys.map { |k| "#{k}#{h[k].shift}" }
  #=> ["1b", "41a", "3d", "20b", "41d", "20c", "1d", "20a",
  #    "3b", "41c", "1a", "3c", "1c", "20d", "41b", "3a"]

1. (?<=\d) is a positive lookbehind that asserts that the match is preceded by a digit. (?=\D) is a positive lookahead that asserts that the match is followed by a character other than a digit. This regular expression therefore matches the zero-width location between the last digit in the string and the first non-digit.

Cary Swoveland
  • 106,649
  • 6
  • 63
  • 100