Consider a dataframe containing N columns, as shown below. Each entry is an 8-bit integer.
|---------------------|------------------|---------------------|
| Column 1 | Column 2 | Column N |
|---------------------|------------------|---------------------|
| 4 | 8 | 13 |
|---------------------|------------------|---------------------|
| 0 | 32 | 16 |
|---------------------|------------------|---------------------|
I'd like to create a new column of 8-bit entries in which each bit of each row is randomly sampled from the bits at the same position in that row's existing columns. The resulting dataframe would look like this:
|---------------------|------------------|---------------------|---------------|
| Column 1 | Column 2 | Column N | Sampled |
|---------------------|------------------|---------------------|---------------|
| 4 = (100) | 8 = (1000) | 13 = (1101) | 5 = (0101) |
|---------------------|------------------|---------------------|---------------|
| 0 = (0) | 32 = (100000) | 16 = (10000) | 48 = (110000) |
|---------------------|------------------|---------------------|---------------|
The first entry in the "Sampled" column was created by selecting, for each bit position, one bit from among the bits at that position in the other columns. For example, the LSB = 1 in the first entry was chosen from {0 (LSB from col 1), 0 (LSB from col 2), 1 (LSB from col N)}, and so on.
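To make the intended operation concrete, here is a minimal sketch of the per-bit sampling described above, assuming NumPy and pandas. The function name `sample_bits`, its `n_bits` and `seed` parameters, and the uniform choice of source column are illustrative assumptions, and this is not necessarily the most efficient approach.

```python
import numpy as np
import pandas as pd

def sample_bits(df, n_bits=8, seed=None):
    """For each row and bit position, copy that bit from a randomly chosen column.

    Hypothetical helper: assumes every entry fits in n_bits (at most 8) bits.
    """
    rng = np.random.default_rng(seed)
    values = df.to_numpy(dtype=np.uint8)  # shape (n_rows, n_cols)
    n_rows, n_cols = values.shape
    # Pick a source column independently for every (row, bit position) pair.
    src = rng.integers(0, n_cols, size=(n_rows, n_bits))
    # picked[i, b] is the full value of the column chosen for bit b of row i.
    picked = np.take_along_axis(values, src, axis=1)
    # Keep only bit b of picked[i, b], then OR the isolated bits together.
    masks = (1 << np.arange(n_bits)).astype(np.uint8)
    return np.bitwise_or.reduce(picked & masks, axis=1)

df = pd.DataFrame({"Column 1": [4, 0], "Column 2": [8, 32], "Column N": [13, 16]})
df["Sampled"] = sample_bits(df, seed=0)
```

The result is random, so the values will generally differ from the table above, but every bit of "Sampled" always matches the bit at the same position in at least one of the source columns of that row.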
This is similar to this question, but instead of sampling each entry, each bit needs to be sampled.
What is an efficient way of achieving this, considering the dataframe has a large number of rows and columns? From the similar question, I assume we need a `lookup` + `sample` to choose the entry and another `sample` to choose the bits?