I'm generating a few catalogues, and would like to have a column for comments. For some reason, when I generate the column and try to store a comment it only takes the first character.

from astropy.table import Column

C1 = Column(['']*12, name = 'ID')
C1[4] = 'test comment' 

Then

print C1[4]
>> t 

Looking at C1, I see `<Column name='ID' dtype='str1' length=12>`, so it is clearly storing only a 1-character string.

If I try

C2 = Column(['some really long silly string']*12, name = 'ID')
C2[4] = 'test comment' 

then

print C2[4]
>> test comment

But again, I can only store strings of up to 29 characters, because the column is `<Column name='ID' dtype='str29' length=12>`, and this is a terrible workaround anyway.

How do I tell Column to store strings of any length?

FriskyGrub
  • Related: [numpy recarray strings of variable length](http://stackoverflow.com/questions/9108837/numpy-recarray-strings-of-variable-length). You could use `Column(['']*12, name = 'ID', dtype=np.object)` for example. –  Sep 07 '16 at 02:08
  • It's interesting that this seems to work. If I set `dtype=np.str` it defaults to `str1` still. If you can't think of another workaround without generalising to `np.object` then you should submit this as an answer. – FriskyGrub Sep 07 '16 at 02:58
  • Tables like this (numpy's recarray, Pandas' dataframe) are made for some fixed type (~fixed memory allocation), which is why there is no generic (variable) string type. Eg, Pandas will infer an `object` when you initialize a column with strings. –  Sep 07 '16 at 04:54
  • Using `object`, you will potentially lose utilities like string comparison. To avoid that, you could simply use a long string for the column type. –  Sep 07 '16 at 04:57
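The two workarounds mentioned in these comments can be sketched with plain NumPy arrays, which is the kind of storage `Column` wraps. This is only an illustration under assumptions: the width of 100 is an arbitrary choice, and the builtin `object` is used instead of `np.object` because recent NumPy releases no longer provide that alias. The same `dtype` values should be usable in the `Column(..., dtype=...)` call from the first comment.

```python
import numpy as np

# Option 1: an explicit fixed width chosen up front (100 is an arbitrary assumption).
wide = np.array([''] * 12, dtype='U100')
wide[4] = 'test comment'

# Option 2: object dtype, where each element is an ordinary Python str of any length.
flexible = np.array([''] * 12, dtype=object)
flexible[4] = 'test comment'

print(wide.dtype, repr(wide[4]))
print(flexible.dtype, repr(flexible[4]))
```

The fixed-width variant keeps a true string dtype but silently truncates anything longer than the chosen width; the object variant has no length limit but is no longer a string array as far as NumPy is concerned.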

1 Answer

For this use case I usually first collect the data as a Python list of strings and then call the astropy.table.Column constructor.

>>> from astropy.table import Column
>>> data = ['short', 'something longer']
>>> Column(data=data, name='spam')
<Column name='spam' dtype='str16' length=2>
           short
something longer

The Column will convert your data to a NumPy array with a fixed-width string dtype just wide enough for the longest string; shorter strings are right-aligned when displayed.
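Applied to the comment column from the question, the list-first approach might look like the following NumPy-only sketch (the variable names and the second, longer comment are made up for illustration). It also shows the catch: the width is fixed at construction time, so edits should happen in the list, not in the array.

```python
import numpy as np

# Collect and edit the comments as a plain Python list first.
comments = [''] * 12
comments[4] = 'test comment'

# Convert once at the end; the width is inferred from the longest entry.
arr = np.array(comments)
print(arr.dtype)

# Assigning something longer afterwards is truncated to the inferred width.
arr[0] = 'a much longer comment than before'
print(repr(arr[0]))
```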

Similarly, when constructing astropy.table.Table objects, I usually first collect the data as a Python list of dicts of row data, and then let the Table constructor figure out the appropriate dtype automatically.

>>> from astropy.table import Table
>>> rows = [{'ham': 42, 'spam': 'a'}, {'ham': 99, 'spam': 'bbb'}]
>>> table = Table(rows=rows, names=['spam', 'ham'])
>>> table
<Table length=2>
spam  ham 
str3 int64
---- -----
   a    42
 bbb    99

Of course this isn't super fast or memory-efficient, but for my applications it's good enough.

More generally, note that working with strings stored in NumPy arrays (which is what astropy.table.Column does) is simply painful (in my opinion; no offense intended to NumPy developers or people who like it). The best support I'm aware of for this comes from pandas, so you could use pandas to work with your data and use the to_pandas and from_pandas methods of astropy.table.Table if you need an Astropy table, e.g. to read or write FITS files or do something else that pandas.DataFrame doesn't support.
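As a rough sketch of the pandas side of that suggestion, the comment column from the question could be kept in a DataFrame, where string values are not limited to a fixed width. The column name `ID` comes from the question; the longer comment text is invented for illustration.

```python
import pandas as pd

# A string column in pandas has no fixed width.
df = pd.DataFrame({'ID': [''] * 12})
df.loc[4, 'ID'] = 'test comment'
df.loc[5, 'ID'] = 'a considerably longer comment that would not fit a short fixed width'

print(df['ID'].str.len().max())
```

If an Astropy table is needed afterwards, the `from_pandas` method mentioned above is the conversion step; it is left out here to keep the sketch pandas-only.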

Christoph