
I have a table as below:

  1 2 3 4 5 6 . . . .
1 1 0 0 0 1 0 . . . .
2 0 0 1 1 1 1 . . . .
3 0 1 0 0 0 1 . . . .
4 1 0 0 0 0 0 . . . .
5 0 0 1 0 1 0 . . . .
. . . . . . . . . . .
. . .
. .
.

1, 2, ... are the labels of the rows and columns. I need to index into the table, meaning: one array (vector) for row 1 that contains 1 (column 1, because cell (1,1) is true), another array for row 2 that contains 3, 4, 5, 6 (because cells (2,3), (2,4), (2,5), (2,6) are true), and so on.

I have carefully read "Compact MATLAB matrix indexing notation" and "Use a vector as an index to a matrix", but I cannot write code for this that works.

EBH
Eli

2 Answers


Since each of the resulting arrays has a different size, you could use a cell array.
First, your sample data is not really a table, so let's make an arbitrary one:

T = table({'a' 'a' 'a' 'b' 'b'}.',{'X' 'Y' 'Z' 'X' 'Z'}.',(1:5).',...
    'VariableNames',{'UserId','productId','Rating'});

Next, we will convert all the 'key' columns to categorical arrays:

T.UserId = categorical(T.UserId);
T.productId = categorical(T.productId);

Then we use these categorical arrays to cross-tabulate the table:

cross_T = crosstab(T.UserId,T.productId)

And now we look for all the 1s (the nonzero entries) in the new matrix:

[r,c] = find(cross_T);

And use an arrayfun to collect them by row:

% a function that returns the column indices of all 1s in row x
row = @(x) c(r==x).';
% apply it to every row
r = arrayfun(row,1:size(cross_T,1),'UniformOutput',false).';

So we get as output the cell array r:

r = 
    [1x3 double]
    [1x2 double]

And to look up the data for a specific user, we write:

>> r{2}
ans =
     1     3

If you want this to be more readable, you can convert it into a structure array:

s = cell2struct(r,categories(T.UserId))

So then the output for s will be:

s = 
    a: [1 2 3]
    b: [1 3]
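
As a small illustration of reading from such a structure, here is a sketch that uses a hand-built `s` with the same values as the output above. The variable `name` is a hypothetical lookup key, not part of the original answer. Fields can be read by literal name or dynamically:

```matlab
% Hand-built stand-in for the structure shown above (assumption: same values).
s = struct('a', [1 2 3], 'b', [1 3]);

cols_a = s.a;        % all column indices stored for user 'a'
second = s.a(2);     % second index in that list

% Dynamic field name: handy when the user ID is held in a variable.
name   = 'b';        % hypothetical lookup key
cols_b = s.(name);   % same result as s.b
```

Dynamic field names only work when the user IDs are valid MATLAB identifiers; otherwise the cell array `r` is the safer container.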
EBH
  • Thank you. Another question: I have a table (column 1: UserId, column 2: productId, column 3: Rating, etc.) in which each row means that a person gave a rating (1, 2, 3, 4, 5) to a product. For example, << A X 2, A Y 5, B X 1, B X 2 >> are four rows in this table. I want an index into the table so that A includes X and Y, B includes X, and so on. Can you help me? – Eli Aug 02 '16 at 10:10
  • @Eli please, if you have another question, post it as another question :) This way it will be much easier to understand and answer it in its context – EBH Aug 02 '16 at 10:44
  • Specifically, I understand how your data looks, but not what you want in the result. – EBH Aug 02 '16 at 10:55
  • Excuse me. Other people gave me negative scores, and now unfortunately I cannot create new posts. Sorry... – Eli Aug 02 '16 at 10:57
  • Which part don't you understand, so that I can explain it again? – Eli Aug 02 '16 at 10:58
  • _"I want index into table for A include X and Y; for B include X"_ How do you choose which rows to include for each user? – EBH Aug 02 '16 at 11:01
  • For every existing user, I would like to determine which products they have rated. – Eli Aug 02 '16 at 11:05
  • For this line, MATLAB gives an error: `categories(T.productId) tbl = array2table(crosstab(T.UserId,T.productId));` – Eli Aug 02 '16 at 11:51
  • Thank you very much, but I've done this with the code below: `[uIDs, ~, UserIds] = unique(T.UserId, 'stable'); [upIDs, ~, productIds] = unique(T.productId, 'stable'); matrix = false(numel(uIDs), numel(upIDs)); matrix(sub2ind(size(matrix), UserIds, productIds)) = true;` ...... My problem is that the number of zeros and ones in this table is so huge that MATLAB cannot build it. I want to keep only the information we need, without the zeros, like the information in `r` from my first question, except that its rows are labeled by UserId. – Eli Aug 02 '16 at 12:07
  • I will create a new post later, but do you have any information on this problem? – Eli Aug 02 '16 at 12:40
  • That's OK, but in the end the 0/1 table is far too large for my program. In summary, I have a table (UserId, ProductId, Rating, ...) and I ultimately need something like `r` from your answer to my first question, with its rows labeled by UserId. – Eli Aug 02 '16 at 12:46
  • I have read about `sparse`, but its input is a matrix, and as I said, that matrix is too large for MATLAB to build. – Eli Aug 02 '16 at 12:50
  • Yes, you are right. I will, whenever the site lets me do it. – Eli Aug 02 '16 at 13:08
  • @Eli First, instead of all your calls to `unique`, simply write: `crosstab(T.UserId,T.productId)`, and you will get the same result as a matrix. Converting `T.UserId` and `T.productId` to categorical arrays with `categorical` only makes it faster. – EBH Aug 02 '16 at 20:08
  • @Eli see my last edit, I think it solves the problem – EBH Aug 02 '16 at 20:41
  • The middle of this solution uses cross_T. As I've already said, that is the problem: MATLAB cannot build it for me because the matrix is very large. MATLAB gives me this error: Error using categorical This operation would create a categorical array with more than 65534 categories. Is there another solution? I apologize for the trouble. – Eli Aug 03 '16 at 10:24
  • @Eli first, read about [structures](http://www.mathworks.com/help/matlab/structures.html). In general you call `s.a` to get the values of field `a` in `s` (`XYZ`), and `s.a(2)` to get the second element on that field (`Y`). – EBH Aug 03 '16 at 10:27
  • The following lines need cross_T for their calculation. How do I remove it? – Eli Aug 03 '16 at 10:32
  • No, I'm so sorry. Let me explain again. Your solution creates cross_T. In my program, table T is huge (5.8 million rows), so cross_T is huge too (about 2 million rows and 1 million columns), and MATLAB cannot create it because it runs out of memory. Your solution is otherwise correct, but it is not applicable to my program, so I'm looking for another way .... – Eli Aug 03 '16 at 10:46
  • One question: how do I vote for you? I am not very familiar with the site. – Eli Aug 03 '16 at 10:47
  • @Eli I'll get back to you about this problem. I think the solution is slicing the data, but you'll have to test it because I don't have such a large data set. You might want to look [here](http://www.mathworks.com/help/matlab/import_export/getting-started-with-mapreduce.html?searchHighlight=map%20reduced) – EBH Aug 03 '16 at 11:02
  • Can I split the data into, for example, 5 files of 1 million rows each and run this code? – Eli Aug 03 '16 at 11:06
  • @Eli Another edit to the answer, now it works on one userID each time. Hopefully, there aren't so many ratings per user... and if there are, you can change it to work by productID, but the result will need another manipulation. – EBH Aug 03 '16 at 11:30
  • Yes, this solution is my answer. But if productId is 'FG', i.e. composed of several characters, it is treated as several products. What can be done about that? – Eli Aug 03 '16 at 12:22
  • What do you mean by this: "Hopefully, there aren't so many ratings per user... and if there are, you can change it to work by productID, but the result will need another manipulation"? Every user rates some products, and in many cases a user rates many products. – Eli Aug 03 '16 at 12:27
  • First, this discussion in comments is way too long. Please edit your original post to address all the issues you raised in the comments (then you can delete them). Second, each "slice" that is processed through the loop is one userID, so if he has a lot of ratings the slice is big, but _Hopefully_ no single user has so many ratings that it can't be processed in a single slice. – EBH Aug 03 '16 at 12:36
  • @Eli See my last edit - if you don't convert the productID to character array, then it can be anything. – EBH Aug 03 '16 at 13:44
  • Hi. This code works, and I thank you very much. I suggest that you keep the first response on this post. I will create another post, and you can put the last response there. That way others can use both posts, which is also to your advantage. – Eli Aug 10 '16 at 09:54
  • Then we can delete the messages from this post. – Eli Aug 10 '16 at 10:02
  • another post: http://stackoverflow.com/questions/38870223/create-index-into-table – Eli Aug 10 '16 at 10:04
  • @Eli I have answered the "new" question, and edited this answer to fit your original question more closely. Now neither of them includes the solution for a very large data file (using a `for` loop, and not categorical arrays), but I don't think it's important; the main issue was the indexing, and now it's clear. – EBH Aug 10 '16 at 14:12
  • No, I think your second solution is suitable for large data. – Eli Aug 10 '16 at 15:37
  • You don't ask explicitly about large data, and this is the correct way to do it if the data is manageable, so I think I'll leave it this way. – EBH Aug 10 '16 at 15:42
  • hello my friend. I have a question. can you help me? – Eli Oct 01 '16 at 08:44
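
The large-data concern raised in the comments (a user-by-product matrix that does not fit in memory) could be approached without building `cross_T` at all. The following is only a sketch under assumptions, not the loop-based solution EBH refers to: the sample data and the names `users`, `g`, `rowIdx`, and `perUser` are hypothetical. It groups row numbers by user with `unique` and `accumarray`, and keeps product IDs in a cell array of strings so that a multi-character ID such as 'FG' stays a single product:

```matlab
% Hypothetical sample data: one row per rating, as described in the comments.
UserId    = {'A'; 'A'; 'B'; 'B'};
productId = {'X'; 'Y'; 'X'; 'FG'};

% Map each user to an integer group number; no user-by-product matrix is built.
[users, ~, g] = unique(UserId, 'stable');

% Collect the products of each group into one cell per user.
% sort(k) restores the original row order inside each group.
rowIdx  = (1:numel(g)).';
perUser = accumarray(g, rowIdx, [numel(users) 1], ...
                     @(k) {productId(sort(k)).'});

perUser{2}   % products rated by users{2}
```

Memory use here grows with the number of ratings rather than with users times products, which is the property the comments ask for. Whether it is fast enough on millions of rows would need to be tested on the real data.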

Say you have the following matrix:

>> A = randi([0,1], [5,5])
A =

   1   0   1   1   1
   1   0   1   0   1
   1   1   1   1   0
   0   1   1   0   1
   0   0   0   1   0

You can find the index vector for each row separately by doing:

>> find(A(1,:))
ans =

   1   3   4   5

If you want to collect these vectors, you need to decide what kind of structure you want to collect them in.
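
One possible choice is a cell array, since the rows generally yield vectors of different lengths. A minimal sketch, assuming the visible 5-by-6 corner of the matrix from the question as input (the names `A` and `idx` are illustrative):

```matlab
% Example logical matrix: the first five rows and six columns from the question.
A = logical([1 0 0 0 1 0
             0 0 1 1 1 1
             0 1 0 0 0 1
             1 0 0 0 0 0
             0 0 1 0 1 0]);

% One cell per row, each holding the column indices of the true entries.
idx = arrayfun(@(k) find(A(k,:)), (1:size(A,1)).', 'UniformOutput', false);

idx{2}   % returns [3 4 5 6]
```

`'UniformOutput', false` is what allows each call to return a vector of a different length.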

Tasos Papastylianou