
Let's say that the 4th column of a 7-column matrix contains the following values:

[61 52 67 58 62 69 51 57 66 68 67 55 69 54 57 64 53 ...]'

MAIN QUESTION: Is there a way to insert a specified value (e.g., 68) at random positions while controlling how many times it is inserted, so that it makes up a specified percentage of the whole column? Newly inserted values should be deleted before any values that were in the original column. For example, if I wanted 30% of the values in column 4 to be 68, the code would add as many additional 68s as are necessary to reach that proportion (or, if more than 30% of the values are already 68, it would randomly remove as many rows as it needs to).

SECOND QUESTION:

When I do insert new rows via the added values in column 4 (here, 68), I'll need values for the other 6 columns of each added row. How do I ensure that some values are added to these columns? I'll replace them with relevant values later, but MATLAB won't let me add a row to the matrix with empty values.

Sam Leak
  • Quite a lot of these details are not really relevant to the question. Could you simplify it as much as possible and give an example of input and output? It might also be easier to explain it using letters rather than numbers, e.g. `[casfsxxxdxgdsgxx]`, where `x` is the thing of interest. – dan-man May 03 '16 at 19:53
  • @dan-man - edited the question - hopefully simplified enough now? Thanks :) – Sam Leak May 04 '16 at 03:52
  • @excaza - I think that I could do with having that explained at 'programming for dummies level' if that's cool? Thanks :) – Sam Leak May 04 '16 at 03:52

1 Answer

Finding the number of values you need to add/remove is pretty trivial.

For example: given a vector A that already contains n_val occurrences of a value val, you want to add n_new copies of val to A so that val makes up a desired proportion, DP, of the result (e.g. 30%). So you start with this equation:

$$DP = \frac{n_{val} + n_{new}}{\mathrm{length}(A) + n_{new}}$$

And solve for the number of values to add:

$$n_{new} = \frac{n_{val} - DP \cdot \mathrm{length}(A)}{DP - 1}$$
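As a quick arithmetic check, plug in the sample vector and criteria used in the code below: A has 17 elements, val = 57 appears twice in it, and DP = 0.4.

$$n_{new} = \frac{2 - 0.4 \cdot 17}{0.4 - 1} = \frac{-4.8}{-0.6} = 8, \qquad \frac{2 + 8}{17 + 8} = \frac{10}{25} = 0.4$$

So appending eight more 57s gives exactly the requested 40%.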

Once you have your n_new value, you know how many occurrences of val you need to add to your array. You can append them to either end of A (or both) and then randomly shuffle the resulting array: randperm generates a random permutation of the indices, which you can use to reorder the elements. See also: MATLAB's Matrix Indexing documentation, specifically accessing multiple elements.
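A minimal sketch of the append-then-shuffle step on its own, assuming n_new has already been computed (8 for the sample vector with val = 57 and DP = 0.4). Indexing B with a random permutation of 1:numel(B) reorders the elements without changing which values are present:

```matlab
% Sample vector and a precomputed number of copies to add (assumed here)
A = [61 52 67 58 62 69 51 57 66 68 67 55 69 54 57 64 53];
val = 57;
n_new = 8;

% Append n_new copies of val to the end of A
B = [A, repmat(val, 1, n_new)];

% Shuffle: index B with a random permutation of its own indices
B = B(randperm(numel(B)));
```

Only the order of B changes between runs; its length (25) and the number of 57s (10) are always the same.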

Removing values uses pretty much the same logic. If your n_new value is negative, it means you need to remove abs(n_new) occurrences of val to get to your DP.
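To see why the same formula covers removal, write k for the number of occurrences of val to delete and solve for it:

$$\frac{n_{val} - k}{\mathrm{length}(A) - k} = DP \;\Rightarrow\; k = \frac{n_{val} - DP \cdot \mathrm{length}(A)}{1 - DP} = -n_{new}$$

For the removal test further down (25 elements, ten 57s, DP = 0.1) this gives k = (10 - 2.5)/0.9 ≈ 8.33. Rounding toward zero removes 8, leaving 2/17 ≈ 0.1176 rather than exactly 0.1.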

In MATLAB this gives us something like the following:

% Sample Vector
A = [61 52 67 58 62 69 51 57 66 68 67 55 69 54 57 64 53];

% Criteria
DP = 0.4;
val = 57;

% Find count of val in A
n_val = nnz(A == val);  % Count occurrences of val (ignores floating point issues for brevity)

% Find number of new values to add/remove to get to DP
n_new = (n_val - DP*length(A))/(DP - 1);
n_new = fix(n_new);  % Round toward zero so the result never overshoots DP

B = A;  % Default: if n_new == 0, A already has the desired proportion
if n_new > 0
    % Need to add values
    % Create new vector, append appropriate number of values
    B = horzcat(A, repmat(val, 1, n_new));
    % Randomly sort
    newidx = randperm(length(B));  % Generate a random permutation of our indices
    B = B(newidx);
elseif n_new < 0
    B = A;  % Copy vector
    % Need to remove values
    val_idx = find(B == val);  % Ignore floating point issues for brevity
    remidx = val_idx(randperm(length(val_idx), abs(n_new)));  % Pick abs(n_new) of val's positions at random
    B(remidx) = [];  % Delete values
end

% Test
p = nnz(B == val)/numel(B);

Which gives us something like the following (the order of B is random, so your output will differ):

B =

    57    51    52    57    57    69    57    57    55    67    53    57    64    69    57    57    54    57    61    58    57    66    67    68    62

p =

    0.4000

And to test removal:

% Sample Vector
A = [57 51 52 57 57 69 57 57 55 67 53 57 64 69 57 57 54 57 61 58 57 66 67 68 62];

% Criteria
DP = 0.10;
val = 57;

And we get:

B =

    57    51    52    69    57    55    67    53    64    69    54    61    58    66    67    68    62

p =

    0.1176
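For the second part of the question (adding whole rows to the 7-column matrix), the same logic can be applied to one column while operating on full rows. A minimal sketch, assuming the other six columns can temporarily hold NaN placeholders (a numeric matrix cannot contain empty entries, so NaN is used here as a stand-in to be replaced later). The random filler columns and the names col, newRows, and M are illustrative only:

```matlab
% Hypothetical 7-column matrix; only column 4 matters here
col4 = [61 52 67 58 62 69 51 57 66 68 67 55 69 54 57 64 53]';
M = [rand(numel(col4), 3), col4, rand(numel(col4), 3)];

col = 4;     % target column
val = 68;    % value whose proportion we control
DP  = 0.3;   % desired proportion

n_val = nnz(M(:, col) == val);
n_new = fix((n_val - DP*size(M, 1)) / (DP - 1));

if n_new > 0
    % New rows: NaN placeholders everywhere except the target column
    newRows = NaN(n_new, size(M, 2));
    newRows(:, col) = val;
    M = [M; newRows];
    % Shuffle whole rows so the new ones land at random positions
    M = M(randperm(size(M, 1)), :);
elseif n_new < 0
    % Delete abs(n_new) randomly chosen rows whose target column equals val
    val_idx = find(M(:, col) == val);
    M(val_idx(randperm(numel(val_idx), -n_new)), :) = [];
end

p = nnz(M(:, col) == val) / size(M, 1);
```

Here 68 appears once in 17 rows, so n_new = fix(4.1/0.7) = 5 and M grows to 22 rows. Because fix rounds toward zero, p ends up as 6/22 ≈ 0.273 rather than exactly 0.3. The NaN rows can later be located with isnan and filled in.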

I'll also add the obligatory caveat for comparing two floats for equality if you are not working with MATLAB's integer data types. In the find calls you will want to incorporate a tolerance to account for floating point issues. For more information see: What Every Computer Scientist Should Know About Floating-Point Arithmetic and the more MATLAB-specific Why is 24.0000 not equal to 24.0000 in MATLAB?
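A minimal sketch of the tolerance-based comparison, assuming an absolute tolerance of 1e-10 (an arbitrary choice; pick one that suits the scale of your data). The same abs(... - val) < tol expression can replace A == val in the counting and find calls above:

```matlab
% 0.1 + 0.2 is not exactly 0.3 in double precision
A = [0.1 + 0.2, 0.5, 0.3];
val = 0.3;
tol = 1e-10;  % assumed tolerance

exact    = nnz(A == val);             % misses the first element
tolerant = nnz(abs(A - val) < tol);   % counts both values that are "0.3"

% e.g. the indices to consider for removal:
val_idx = find(abs(A - val) < tol);
```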

sco1
  • Amazing - thank you! I'm out and about right now, but I'll give this a look later. Much appreciated! – Sam Leak May 05 '16 at 03:26