I have a Pandas DataFrame whose columns are labeled with Python tuples.
The tuples used as column labels can contain None.
When I try to add a column to a DataFrame using either of the following approaches, any None in the label tuple is implicitly converted to numpy.nan.
Approach 1 - Add a column with the dataframe[ NewColumn ] = ... syntax
>>> import pandas
>>> df = pandas.DataFrame()
>>> column_label = ( 'foo', None )
>>> df[column_label] = [ 1, 2, 3 ]
>>> df
   (foo, nan)
0           1
1           2
2           3
>>>
>>> df.columns
Index([(u'foo', nan)], dtype='object')
                ^^^
                Desired to be None
Approach 2 - Add a column with pandas.DataFrame.insert
>>> import pandas
>>> df = pandas.DataFrame()
>>> df.insert( 0, ( 'foo', None ), [ 1, 2, 3 ] )
>>> df
   (foo, nan)
0           1
1           2
2           3
>>> df.columns
Index([(u'foo', nan)], dtype='object')
                ^^^
                Desired to be None
So - what is going on here?
Is there a way to add a column to an existing DataFrame, with a label that is a tuple containing None, using either the DataFrame[] or the DataFrame.insert syntax?
(Curiously, if you pass tuple column labels containing None directly to the DataFrame constructor, or explicitly set the columns attribute to tuples containing None, the None is retained, e.g.:
df = pandas.DataFrame( [ 1, 2, 3 ], columns=[ ( 'foo', None )] )
gives a DataFrame with ( 'foo', None ) as a column, not ( 'foo', nan ).
Similarly, df.columns = [ ( 'foo', None ), ... ] sets the first column label to ( 'foo', None ).)
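One possible workaround follows from the observation above that assigning to the columns attribute retains None: add the data under a temporary label, then reassign all the labels in one go. This is only a minimal sketch. The helper name add_column_keep_none and the placeholder label "__tmp_label__" are hypothetical (they are not part of pandas), and it assumes no existing column already uses the placeholder. It sidesteps the conversion rather than explaining it, and the exact behavior may vary between pandas versions.

```python
import pandas


def add_column_keep_none(df, label, values, placeholder="__tmp_label__"):
    """Return a copy of df with a new column whose tuple label keeps its None.

    Hypothetical helper: the name and the placeholder label are assumptions
    made for illustration, not part of the pandas API.
    """
    if placeholder in df.columns:
        raise ValueError("placeholder label already in use")
    labels = list(df.columns)        # existing labels, captured before the insert
    out = df.copy()                  # leave the caller's DataFrame untouched
    out[placeholder] = values        # add the data under a plain string label
    out.columns = labels + [label]   # relabel every column in one assignment
    return out


df = add_column_keep_none(pandas.DataFrame(), ("foo", None), [1, 2, 3])
print(df.columns)
```

Calling the helper again on its own result, e.g. add_column_keep_none(df, ("bar", None), [4, 5, 6]), should append a second column in the same way, since the existing labels are read back and reassigned unchanged.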