1

I would like to encode integer masks stored in pandas dataframe column into respective binary features which correspond to bit positions in those integers. For example, given 4-bit integers, and a decimal value of 11 I would like to derive 4 columns with values 1, 0, 1, 1, and so on across entire column.

Wojciech Migda
  • 738
  • 7
  • 16
  • 1
    Please provide a sample data. – YOLO Mar 20 '18 at 14:04
  • Please show some of your work plus share with us a [mcve](https://stackoverflow.com/help/mcve), [mcve2](http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports) – rpanai Mar 20 '18 at 14:12

1 Answers1

4

You can use:

df = pd.DataFrame([list('{0:04b}'.format(x)) for x in df['col']], index=df.index).astype(int)

Thank you, @pir for python 3.6+ solution:

df = pd.DataFrame([list(f'{i:04b}') for i in df['col'].values], df.index)

Numpy

Convert array to DataFrame - solution from this answer, also added slicing for swap values per rows:

d = df['col'].values
m = 4
df = pd.DataFrame((((d[:,None] & (1 << np.arange(m)))) > 0)[:, ::-1].astype(int))
#alternative
#df = pd.DataFrame((((d[:,None] & (1 << np.arange(m-1,-1,-1)))) > 0).astype(int))

Or:

df = pd.DataFrame(np.unpackbits(d[:,None].astype(np.uint8), axis=1)[:,-m:])
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252