
I need to write an m-file that generates large lists of random "pseudo-words" that follow certain rules.

The script defines the permissible letters as sets, each with the probabilities of its members appearing in a word. For this particular application, a word can have 2 to 4 "syllables", each composed of either one member of set C and one of V, or one of C, one of V, and one of C2.

The following code produces one word at a time, but I'd like to be able to produce, say, 50 or 100 at a time.

What I've worked out so far is below:

clc
word = [];
wlist = {};
C = ['KGBNSLMTVx_']; prob_C = [0.13, 0.12, 0.11, 0.10, 0.107, 0.066, 0.09, 0.066, 0.066, 0.065, 0.06];
C2 = ['KLNT']; prob_C2 = [0.2575,0.2525,0.2475,0.2425];
V = ['AIUE']; prob_V = [0.275,0.265,0.245,0.24];
for m = 1:randint(1,1,[2 4])
add_C2 = mod(randint(1,1,[1,100]),6);
if add_C2 == 5
    syl = [randsample(C,1,true,prob_C) randsample(V,1,true,prob_V) randsample(C2,1,true,prob_C2)];
else
    syl = [randsample(C,1,true,prob_C) randsample(V,1,true,prob_V)];
end
word = [word syl];
end
new = char(word);
wlist = {wlist{:}, new};
disp(wlist')

Assistance would be appreciated.

arne.b
  • Can you give some example 'acceptable' words to make it clear? – petrichor Nov 14 '11 at 10:00
  • Words such as 'KITLANSU', 'SABEVABA', 'SILU', 'TULKATESI', 'GESA', 'MUNAxAT', 'KI_ATVA', and 'TIGAGE' are acceptable. – Inspector Bumstead Nov 14 '11 at 10:27
  • @InspectorBumstead: you might be interested in using probabilistic models like Markov chains. Here is an example I previously wrote that generates random words: http://stackoverflow.com/questions/2650982/anagram-solver-based-on-statistics-rather-than-a-dictionary-table/3341080#3341080 – Amro Nov 14 '11 at 19:44

1 Answer


The following code generates 100 acceptable words for your problem.

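clc, clear
nWords = 100;                 % number of words to generate

wList = cell(1, nWords);      % preallocate the list of words

% Leading consonants and their weights
C = 'KGBNSLMTVx_';
probC = [0.13, 0.12, 0.11, 0.10, 0.107, 0.066, 0.09, 0.066, 0.066, 0.065, 0.06];

% Optional closing consonants and their weights
C2 = 'KLNT';
probC2 = [0.2575, 0.2525, 0.2475, 0.2425];

% Vowels and their weights
V = 'AIUE';
probV = [0.275, 0.265, 0.245, 0.24];

% Probability that a syllable gets a closing consonant from C2
probAddC2 = 0.16;

for i = 1:nWords
    word = '';
    nSyl = randi([2 4]);      % each word has 2 to 4 syllables
    for j = 1:nSyl
        % Every syllable starts as consonant + vowel
        syl = strcat(randsample(C,1,true,probC), randsample(V,1,true,probV));
        if rand < probAddC2
            % Occasionally append a closing consonant
            syl = strcat(syl, randsample(C2,1,true,probC2));
        end
        word = strcat(word, syl);
    end
    wList{i} = word;
end
wList'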
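
randsample ships with the Statistics Toolbox. Where that toolbox is not available, a weighted draw can be sketched with rand and cumsum instead. This is only an illustrative alternative, not part of the code above; the variable names cdf, idx and letter are introduced here for the example. The uniform draw is scaled by cdf(end) because the weights for C add up to 0.98 rather than 1.

```matlab
% Sketch: draw one letter from C with the given weights, without randsample.
C = 'KGBNSLMTVx_';
probC = [0.13, 0.12, 0.11, 0.10, 0.107, 0.066, 0.09, 0.066, 0.066, 0.065, 0.06];

cdf = cumsum(probC);                             % running total of the weights
idx = find(cdf >= rand * cdf(end), 1, 'first');  % first bin that covers the scaled draw
letter = C(idx);                                 % the sampled consonant
```

The same three lines work for V and C2 by swapping in their letters and weights.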

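Note: I didn't understand why you generate a random integer in [1,100], reduce it modulo 6, and compare the result with 5. Exactly 16 integers in [1,100] leave a remainder of 5 when divided by 6, so that test succeeds with probability 0.16, which is the value used for probAddC2.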
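
The count behind that 0.16 can be checked directly. A minimal sketch, where nHits and pAddC2 are names chosen for this example only:

```matlab
% Count the integers in 1..100 that leave remainder 5 when divided by 6.
nHits = sum(mod(1:100, 6) == 5);   % matches 5, 11, 17, ..., 95
pAddC2 = nHits / 100;              % fraction of draws that add a C2 letter
fprintf('%d of 100 values match, probability %.2f\n', nHits, pAddC2);
```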

petrichor