Suppose my environment is set up as follows:
- 64 MB HDFS block size
- 5 tablet servers
- 10 tablets of 1 GB each per tablet server
And suppose I have a table like the one below:
rowA | f1 | q1 | v1
rowA | f1 | q2 | v2
rowB | f1 | q1 | v3
rowC | f1 | q1 | v4
rowC | f2 | q1 | v5
rowC | f3 | q3 | v6
From the little documentation available, I understand that all data for rowA will go to one tablet, which may or may not also contain data for other rows; i.e., a row's placement in a tablet is all or nothing. So my questions are:
How are tablets mapped to a DataNode or HDFS block? Presumably one tablet is split into multiple HDFS blocks (16 in this case, since 1 GB / 64 MB = 16), so would those blocks be stored on the same DataNode or on different DataNodes, or does it not matter?
In the example above, would all data for rowC (or rowA or rowB) go into the same HDFS block or into different HDFS blocks?
When executing a MapReduce job, how many mappers would I get? (One per HDFS block, one per tablet, or one per server?)
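To make the numbers concrete, here is a rough sketch of the arithmetic behind these questions. It assumes 1 GB = 1024 MB, that each tablet's data occupies exactly 1 GB on HDFS, and it ignores replication and compression; none of these are confirmed for a real cluster. The dictionary only lists the mapper counts each of the three hypotheses would imply and does not say which one is correct.

```python
# Figures taken from the setup described in the question.
HDFS_BLOCK_MB = 64
TABLET_SERVERS = 5
TABLETS_PER_SERVER = 10
TABLET_SIZE_MB = 1024  # assumption: 1 GB = 1024 MB, stored uncompressed


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up (a partially filled block still counts)."""
    return -(-a // b)


# Number of HDFS blocks needed to hold one tablet's data.
blocks_per_tablet = ceil_div(TABLET_SIZE_MB, HDFS_BLOCK_MB)

# Totals across the cluster, before HDFS replication.
total_tablets = TABLET_SERVERS * TABLETS_PER_SERVER
total_blocks = blocks_per_tablet * total_tablets

# Mapper counts implied by each hypothesis in the last question.
mapper_candidates = {
    "one per HDFS block": total_blocks,
    "one per tablet": total_tablets,
    "one per tablet server": TABLET_SERVERS,
}

print(f"blocks per tablet: {blocks_per_tablet}")
print(f"total tablets:     {total_tablets}")
print(f"total blocks:      {total_blocks}")
for hypothesis, count in mapper_candidates.items():
    print(f"mappers if {hypothesis}: {count}")
```

Under these assumptions the three hypotheses differ by more than two orders of magnitude, which is why the mapping matters for job planning.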
Thank you in advance for any and all suggestions.