
I'm wondering why HDFStore gives warnings on string columns in pandas. I thought it might be caused by NaNs in my real database, but trying it here gives me the warning for both columns, even though one of them is not mixed and contains only strings.

Using pandas 0.13.1 and PyTables 3.1.1.

In [75]: d1 = {1:{'Mix': 'Hello', 'Good': 'Hello'}}

In [76]: d2 = {2:{'Good':'Goodbye'}}

In [77]: d2_df = pd.DataFrame.from_dict(d2,orient='index')

In [78]: d_df = pd.DataFrame.from_dict(d1,orient='index')

In [80]: d = pd.concat([d_df,d2_df])

In [81]: d
Out[81]:
      Good    Mix
1    Hello  Hello
2  Goodbye    NaN

[2 rows x 2 columns]

In [84]: d.to_hdf('test_.h5','d')
/home/cschwalbach/venv/lib/python2.7/site-packages/pandas-0.13.1-py2.7-linux-x86_64.egg/pandas/io/pytables.py:2446: PerformanceWarning:
your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed,key->block0_values] [items->['Good', 'Mix']]

  warnings.warn(ws, PerformanceWarning)
user1610719

2 Answers


When you store with the fixed format (the default if you don't specify format), you are storing object dtypes (pandas stores strings as object dtype). These are variable-length values, which PyTables does not support in its Array types (CArray, EArray), so they get pickled instead, as the warning says.

You can, however, store with format='table'; see the pandas docs on storing fixed-length strings.
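
A minimal sketch of the table-format route, assuming PyTables is installed. The helpers `string_widths` and `save_table` are hypothetical names introduced here for illustration; `format` and `min_itemsize` are existing `to_hdf` parameters. Reserving a width per string column is optional and mainly matters if longer strings will be appended later.

```python
import numpy as np
import pandas as pd

# Same frame as in the question: 'Mix' holds a NaN next to a string.
d = pd.DataFrame(
    {"Good": ["Hello", "Goodbye"], "Mix": ["Hello", np.nan]},
    index=[1, 2],
)


def string_widths(df, columns):
    """Longest string in each given column, ignoring NaN (hypothetical helper)."""
    return {
        col: int(df[col].dropna().astype(str).str.len().max())
        for col in columns
    }


def save_table(df, path, key, widths):
    """Write df in table format with fixed-length strings (needs PyTables)."""
    df.to_hdf(path, key=key, format="table", min_itemsize=widths)


# Widths to reserve for the two string columns of the example frame.
widths = string_widths(d, ["Good", "Mix"])
```

Calling `save_table(d, 'test_.h5', 'd', widths)` should then write the frame as fixed-length strings rather than pickled objects. Note that naming columns in `min_itemsize` also makes pandas store them as data columns; passing `format='table'` alone is enough if that is not wanted.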

Jeff
  • I have a VERY large dataframe. 20mm rows, and 40 columns. Is this the best way to try to store the DF given these parameters? – user1610719 Jun 05 '14 at 22:37
  • then for sure you want to use ``table`` format, as you can append and even store it in chunks; ``fixed`` is somewhat faster but you cannot append, nor query at all. read docs: http://pandas.pydata.org/pandas-docs/stable/io.html#hdf5-pytables (and cookbook is a link) – Jeff Jun 05 '14 at 22:39

The NaN value is the issue here. If you replace it with an empty string, the warning will go away.
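
A small sketch of that replacement, rebuilding the question's frame with an explicit object dtype (an assumption made here so the behaviour matches the pandas version in the question). `pd.api.types.infer_dtype` is used only to illustrate what the `inferred_type->mixed` part of the warning is reporting; whether pandas emits the warning in your version is something to confirm by running `to_hdf` yourself.

```python
import numpy as np
import pandas as pd

# Same data as in the question, stored as object dtype.
d = pd.DataFrame(
    {"Good": ["Hello", "Goodbye"], "Mix": ["Hello", np.nan]},
    index=[1, 2],
    dtype=object,
)

# With the NaN present, the block of values is a mix of str and float.
before = pd.api.types.infer_dtype(d.values.ravel(), skipna=False)

# Replace every NaN with an empty string.
cleaned = d.fillna("")

# Now every value in the block is a string.
after = pd.api.types.infer_dtype(cleaned.values.ravel(), skipna=False)

print(before, after)
```

The trade-off is that a missing value and a genuinely empty string can no longer be told apart after reading the data back.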

TomTom101