How to convert a DataFrame of 1s and 0s to hex and add a new column holding the hex value of each entire row in Python

Question

I have a DataFrame with 51 rows and 464 columns; the columns contain only 1s and 0s. I want to add an encoded hex value for each row, as shown in the attached picture.

I tried to use NumPy for the hex conversion, but it fails:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0,2,size=(51, 464)))
#converting into numpy for easier shifting
a = df.values
b = a.dot(2**np.arange(a.size)[::-1])

I want every 4 columns grouped to produce one hexadecimal digit. If the number of columns is not a multiple of 4 (for example, 463 instead of 464), the trailing group should be padded with as many zeros as needed to make a full hexadecimal digit.

This code only works up to a length of 64 bits and then fails. I was following this example: "binary0|1 to hex string".

Any suggestions on how to do this?

How about appending to a string and then converting that to a hex? — Pranav Hosangadi, Jul 22 '21 at 20:15
Please [edit] your code into your question. This allows the people volunteering their time to quickly repurpose your code for their suggestions instead of starting from scratch. If you correctly converted each row to a list, it should be fairly easy to [`join()`](https://docs.python.org/3/library/stdtypes.html#str.join) them all into a single string and go from there — Pranav Hosangadi, Jul 22 '21 at 20:34
I have edited the question to the current method i was trying, please let me know — Nandeep Devendra, Jul 22 '21 at 20:45
Shouldn't the first row be 0xFFFFFFFFFFFFE (12 Fs and an E, 4x12=48+3=51)? — Mark Tolonen, Jul 22 '21 at 21:26

emremrah · Accepted Answer · 2021-07-22T21:30:50.807

Doesn't this do what you want?

df.apply(lambda row: hex(int(''.join(map(str, row)), base=2)), axis=1)

Convert every number in the row to a string.
Join the strings into one long binary string.
Convert that string to an integer with base 2 (since the row is in binary format).
Convert the integer to hex.
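
As a quick check of these steps, here is a trace on a short 9-column row (the values are the ones quoted in the comments below; the variable names are only illustrative). Because the whole row is read as one binary number, the leading zero is dropped and nothing is padded on the right, so this approach yields 0xec rather than 0x760.

```python
# Illustrative 9-column row; in the real DataFrame each row has 464 values.
row = [0, 1, 1, 1, 0, 1, 1, 0, 0]

bits = ''.join(map(str, row))   # stringify each value and join: '011101100'
number = int(bits, base=2)      # parse the string as a base-2 integer: 236
whole_row_hex = hex(number)     # format the integer as hex: '0xec'

print(bits, number, whole_row_hex)
```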

Edit: To convert each group of 4 columns in the same way, padding the last group on the right with zeros:

def hexize(row):
    hexes = '0x'
    
    row = ''.join(map(str, row))

    for i in range(0, len(row), 4):
        value = row[i:i+4]
        value = value.ljust(4, '0')  # right fill with 0
        value = hex(int(value, base=2))
        
        hexes += value[2:]
        
    return hexes

df.apply(hexize, axis=1)
hexize('011101100')  # returns '0x760'
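
For wide frames, the same left-aligned, zero-padded grouping can probably be done without per-character string work. The following is only a sketch, not part of the original answer: `hex_rows` is a hypothetical helper name, and it assumes every cell is exactly 0 or 1. It relies on `np.packbits`, which packs 8 bits per byte and pads the tail with zero bits, and then trims the hex string to one digit per group of 4 columns.

```python
import numpy as np
import pandas as pd

def hex_rows(frame: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: one fixed-width hex string per row of 0/1 values."""
    bits = frame.to_numpy(dtype=np.uint8)
    n_digits = -(-bits.shape[1] // 4)    # ceil(columns / 4) hex digits per row
    packed = np.packbits(bits, axis=1)   # zero-pads each row up to a whole byte
    return pd.Series(
        ['0x' + row.tobytes().hex()[:n_digits] for row in packed],
        index=frame.index,
    )

small = pd.DataFrame([[0, 1, 1, 1, 0, 1, 1, 0, 0],
                      [1, 0, 0, 0, 0, 0, 0, 0, 0]])
print(hex_rows(small).tolist())  # ['0x760', '0x800']
```

Like `hexize`, this keeps leading and trailing zero digits, so every row of the same frame gets a string of the same length.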

edited Jul 22 '21 at 21:30

answered Jul 22 '21 at 20:55

emremrah


Sorry the numbers are actually in binary, not decimal. I'm going to update my answer or remove it if I fail. – emremrah Jul 22 '21 at 20:58
I have added `base=2` to int conversion – emremrah Jul 22 '21 at 21:01
That works, but when I have an odd number of columns the MSB is ignored. For example, with 9 columns holding 0,1,1,1,0,1,1,0,0 I would like 0x760, but I end up with 0xec – Nandeep Devendra Jul 22 '21 at 21:05
Is MSB the column 1 or 464? – emremrah Jul 22 '21 at 21:09
Not particularly. I want every 4 columns to become one hexadecimal digit, and if the number of columns is not a multiple of 4, the last group is padded with enough zeroes to make a full hexadecimal digit. I guess in a way you can say column 1 is the MSB – Nandeep Devendra Jul 22 '21 at 21:13
Ahh, I think that should've been cleared in the question :) – emremrah Jul 22 '21 at 21:14
yup, sorry updated the question to have that detail – Nandeep Devendra Jul 22 '21 at 21:17
@NandeepDevendra If column 1 is MSB, shouldn't the 2nd row be 0x8 not 0x1? The conversion isn't clear from the examples. – Mark Tolonen Jul 22 '21 at 21:28
I have updated my answer. @MarkTolonen you're actually right. – emremrah Jul 22 '21 at 21:34
@MarkTolonen you are right it should be 0x8 – Nandeep Devendra Jul 22 '21 at 21:37

Mark Tolonen · Answer 2 · 2021-07-22T22:13:09.033

Given input data:

ECID,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,T16,T17,T18,T19,T20,T21,T22,T23,T24,T25,T26,T27,T28,T29,T30,T31,T32,T33,T34,T35,T36,T37,T38,T39,T40,T41,T42,T43,T44,T45,T46,T47,T48,T49,T50,T51
ABC123,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
XYZ345,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
DEF789,1,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
434thECID,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0

The code below adds an "Encoded" column similar to what was asked. Note that the first-row example in the original question seems to have the wrong number of Fs.

import pandas as pd

def encode(row):
    bits = row.iloc[1:]                    # Skip the ECID column; keep only the 0/1 columns
    s = ''.join(str(x) for x in bits)      # Create binary string
    s += '0' * (-len(s) % 4)               # Pad with zeros to a multiple of 4 (no-op if already aligned)
    i = int(s, 2)                          # Convert to integer, base 2
    h = hex(i).rstrip('0')                 # Strip trailing zeros
    return h if h != '0x' else '0x0'       # Handle special case of '0x0' stripping to '0x'
    
df = pd.read_csv('input.csv')
df['Encoded'] = df.apply(encode,axis=1)
print(df)

Output:

        ECID  T1  T2  T3  T4  T5  ...  T47  T48  T49  T50  T51          Encoded
0     ABC123   1   1   1   1   1  ...    1    1    1    1    1  0xffffffffffffe
1     XYZ345   1   0   0   0   0  ...    0    0    0    0    0              0x8
2     DEF789   1   0   1   0   1  ...    0    0    0    0    0             0xaa
3  434thECID   0   0   0   0   0  ...    0    0    0    0    0              0x0

[4 rows x 53 columns]
