I'm trying to create a DataFrame with a column of tuples, using the code below:

import numpy as np
import pandas as pd
# creating the NumPy array
array = np.array([[('A' , 1)], [('B' , 2)]])
  
# creating a list of index names
index_values = ['x1', 'x2']
   
# creating a list of column names
column_values = ['(a,b)']
  
# creating the dataframe
df = pd.DataFrame(data = array, 
                  index = index_values, 
                  columns = column_values)
  
df

It returns:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/tmp/ipykernel_45/2020978637.py in <module>
     13 df = pd.DataFrame(data = array, 
     14                   index = index_values,
---> 15                   columns = column_values)
     16 
     17 df

/opt/oss/conda3/lib/python3.7/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    676                     dtype=dtype,
    677                     copy=copy,
--> 678                     typ=manager,
    679                 )
    680 

/opt/oss/conda3/lib/python3.7/site-packages/pandas/core/internals/construction.py in ndarray_to_mgr(values, index, columns, dtype, copy, typ)
    302         # by definition an array here
    303         # the dtypes will be coerced to a single dtype
--> 304         values = _prep_ndarray(values, copy=copy)
    305 
    306     if dtype is not None and not is_dtype_equal(values.dtype, dtype):

/opt/oss/conda3/lib/python3.7/site-packages/pandas/core/internals/construction.py in _prep_ndarray(values, copy)
    553         values = values.reshape((values.shape[0], 1))
    554     elif values.ndim != 2:
--> 555         raise ValueError(f"Must pass 2-d input. shape={values.shape}")
    556 
    557     return values

ValueError: Must pass 2-d input. shape=(2, 1, 2)

Using a single-element tuple instead:

array = np.array([[(1)], [(2)]])


blue-sky

1 Answer

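The way you are creating the NumPy array is the problem. NumPy treats each tuple as one more level of nesting, so the array ends up 3-D with shape (2, 1, 2), which pd.DataFrame rejects. To keep each tuple as a single element, specify a structured dtype for the tuple's fields when creating the array, and then cast it back to an object type using astype(object).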
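To see the difference for yourself, a quick shape check along these lines may help. This is only a sketch; the variable names plain, structured and as_objects are mine, not from the question.

```python
import numpy as np

# Without a dtype, NumPy unpacks each tuple into an extra axis,
# so the array is 3-D (the same shape reported in the ValueError).
plain = np.array([[('A', 1)], [('B', 2)]])
print(plain.shape)  # (2, 1, 2)

# With a structured dtype, each tuple is read as one record,
# so the array stays 2-D.
structured = np.array([[('A', 1)], [('B', 2)]], dtype='<U10,int')
print(structured.shape)  # (2, 1)

# Casting to object turns each record into a plain Python tuple.
as_objects = structured.astype(object)
print(as_objects[0, 0])
```

The 2-D object array is what pd.DataFrame can then accept as a single column.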

Do the following:

array = np.array([[('A', 1)], [('B', 2)]], dtype='<U10,int').astype(object)

index_values = ['x1', 'x2']

column_values = ['(a,b)']

df = pd.DataFrame(data = array, index = index_values, columns = column_values)

Output:

>>> df
     (a,b)
x1  (A, 1)
x2  (B, 2)
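If you don't actually need the NumPy array, one alternative (a sketch, not the only way) is to skip it and hand pandas a plain list of tuples as the column. The names pairs and df_alt are just illustrative.

```python
import pandas as pd

# A plain Python list of tuples; pandas stores each tuple as one cell
# when the list is passed as a column in a dict.
pairs = [('A', 1), ('B', 2)]

df_alt = pd.DataFrame({'(a,b)': pairs}, index=['x1', 'x2'])
print(df_alt)
```

This avoids the structured dtype and the astype(object) round trip, at the cost of building the column from a list rather than from an existing array.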
Anirudh