(image not reproduced)

I have a DataFrame with 51 rows and 464 columns; the columns contain only 1s and 0s. I want to add a hex-encoded value for each row, as shown in the attached picture.

I tried to do the hex conversion with NumPy, but it fails:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 2, size=(51, 464)))
# convert to a NumPy array for easier shifting
a = df.values
b = a.dot(2**np.arange(a.size)[::-1])

I want every 4 columns grouped to produce one hexadecimal digit. If the number of columns is not a multiple of 4 (for example, 463 instead of 464), the trailing group should be padded with as many zeros as needed to make a full hexadecimal digit.

This code only works for lengths up to 64 bits and then fails. I was following this example: binary 0|1 to hex string.
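
The 64-bit limit most likely comes from NumPy's fixed-width integer dtypes: powers of two from 2**63 upward do not fit in int64. A minimal sketch of the same dot-product idea using Python integers (object dtype), which have arbitrary precision. The 70-bit row and the names `n_bits`, `bits`, `weights` and `value` are made up for illustration:

```python
import numpy as np

n_bits = 70  # hypothetical row length, chosen only to exceed 64
bits = np.ones(n_bits, dtype=np.int64)

# Store the weights as Python ints (dtype=object) instead of fixed-width int64,
# so 2**69 down to 2**0 are all represented exactly.
weights = np.array([1 << k for k in range(n_bits - 1, -1, -1)], dtype=object)

# With object arrays, dot() falls back to Python integer arithmetic.
value = int(bits.astype(object).dot(weights))
print(hex(value))
```

This trades speed for exactness, so it is probably only worthwhile for small frames.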

Any suggestions on how to do this?

2 Answers

Doesn't this do what you want?

df.apply(lambda row: hex(int(''.join(map(str, row)), base=2)), axis=1)
  1. Convert every number in the row to a string
  2. Join them into one long binary string
  3. Convert it to an integer with base 2 (since the row is in binary format)
  4. Convert the integer to hex
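
The four steps can be traced on a small row. This is only an illustration; the 9-bit `row` and the intermediate names are hypothetical:

```python
row = [1, 0, 1, 1, 1, 1, 1, 1, 0]   # hypothetical 9-bit row

as_text = ''.join(map(str, row))     # steps 1-2: '101111110'
as_int = int(as_text, base=2)        # step 3: 382
as_hex = hex(as_int)                 # step 4: '0x17e'
print(as_hex)
```

Because the whole row is read as a single number, the bits are effectively grouped from the right. Grouping from the left in fours with zero padding on the right, as the question describes, would instead read 1011 1111 0000, so the two approaches differ whenever the row length is not a multiple of 4.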

Edit: To convert every group of 4 bits in the same manner:

def hexize(row):
    hexes = '0x'
    
    row = ''.join(map(str, row))

    for i in range(0, len(row), 4):
        value = row[i:i+4]
        value = value.ljust(4, '0')  # right fill with 0
        value = hex(int(value, base=2))
        
        hexes += value[2:]
        
    return hexes

df.apply(hexize, axis=1)
hexize('011101100')  # returns '0x760'
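
For larger frames, a vectorized variant might be worth trying. The following is a sketch, not a drop-in replacement: `rows_to_hex` is a hypothetical helper, and it assumes every cell is 0 or 1. It relies on `np.packbits`, which zero-pads each row on the right to a whole number of bytes, and then trims the hex string to one digit per group of 4 columns:

```python
import numpy as np
import pandas as pd

def rows_to_hex(frame: pd.DataFrame) -> pd.Series:
    """Encode each row of a 0/1 frame as a hex string, 4 columns per digit."""
    bits = frame.to_numpy(dtype=np.uint8)
    n_digits = -(-bits.shape[1] // 4)        # ceil(number of columns / 4)
    packed = np.packbits(bits, axis=1)       # right-pads each row with zeros to full bytes
    # Byte padding can add one extra hex digit, so cut back to n_digits.
    return pd.Series(
        ['0x' + row.tobytes().hex()[:n_digits] for row in packed],
        index=frame.index,
    )

demo = pd.DataFrame([[0, 1, 1, 1, 0, 1, 1, 0, 0]])
print(rows_to_hex(demo).iloc[0])
```

For the 9-bit demo row this should agree with `hexize('011101100')`.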
emremrah

Given input data:

ECID,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,T16,T17,T18,T19,T20,T21,T22,T23,T24,T25,T26,T27,T28,T29,T30,T31,T32,T33,T34,T35,T36,T37,T38,T39,T40,T41,T42,T43,T44,T45,T46,T47,T48,T49,T50,T51
ABC123,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
XYZ345,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
DEF789,1,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
434thECID,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

This adds an "Encoded" column similar to what was asked. The first row example in the original question seems to have the wrong number of Fs:

import pandas as pd

def encode(row):
    s = ''.join(str(x) for x in row[1:])  # Create binary string
    s += '0' * (-len(s) % 4)              # Pad with zeros to a multiple of 4 (none if already aligned)
    i = int(s,2)                          # convert to integer base 2
    h = hex(i).rstrip('0')                # strip trailing zeros
    return h if h != '0x' else '0x0'      # Handle special case of '0x0' stripping to '0x'
    
df = pd.read_csv('input.csv')
df['Encoded'] = df.apply(encode,axis=1)
print(df)

Output:

        ECID  T1  T2  T3  T4  T5  ...  T47  T48  T49  T50  T51          Encoded
0     ABC123   1   1   1   1   1  ...    1    1    1    1    1  0xffffffffffffe
1     XYZ345   1   0   0   0   0  ...    0    0    0    0    0              0x8
2     DEF789   1   0   1   0   1  ...    0    0    0    0    0             0xaa
3  434thECID   0   0   0   0   0  ...    0    0    0    0    0              0x0

[4 rows x 53 columns]
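
Note that `hex()` drops leading zero digits and `rstrip('0')` drops trailing ones, so rows that differ only in where their zeros sit can produce strings of different lengths. If a fixed-width result is preferred, one possible sketch is shown here; `encode_fixed` is a hypothetical helper, not part of the code above, and it takes just the bit values without the ECID:

```python
def encode_fixed(bits):
    """Return '0x' plus exactly one hex digit per group of 4 bits."""
    s = ''.join(str(int(b)) for b in bits)       # binary string
    s += '0' * (-len(s) % 4)                     # right-pad to a multiple of 4
    width = len(s) // 4                          # number of hex digits to keep
    return '0x' + format(int(s, 2), f'0{width}x')

print(encode_fixed([0, 0, 0, 1, 0, 0, 0, 0, 1]))
```

Applied to the frame above, this could be used as `df.iloc[:, 1:].apply(encode_fixed, axis=1)`, assuming the first column is the ECID and the "Encoded" column has not yet been added.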
Mark Tolonen