32

I'm trying to figure out if there is a good way to manage units in my pandas data. For example, I have a DataFrame that looks like this:

   length (m)  width (m)  thickness (cm)
0         1.2        3.4             5.6
1         7.8        9.0             1.2
2         3.4        5.6             7.8

Currently, the measurement units are encoded in column names. Downsides include:

  1. column selection is awkward -- df['width (m)'] vs. df['width']
  2. things will likely break if the units of my source data change

If I wanted to strip the units out of the column names, is there somewhere else that the information could be stored?

smci
ajwood
  • 3
    I think the best way would be to store it in a Series / dictionary. If you want to somehow link these two, you can add an attribute (`df.units = pd.Series({'length' : 'm', 'width': 'm', 'thickness': 'cm'})`) -- This may be dangerous though. – ayhan Sep 09 '16 at 20:31
  • 2
    I didn't want to add a full answer since it's not pandas, but the Astropy package can do this with its `Table` and `units` modules. You can move from a DataFrame to an Astropy Table (`atab=astropy.table.Table.from_pandas(df)`) and then give each column a unit (e.g. `atab['length'].unit = astropy.units.m`). I can post an MWE if you are interested; it looks too messy as a comment with lots of code. – Magnus Persson Jun 28 '18 at 17:03
  • Not an answer to your question, but you could use astropy tables to get the functionality of a dataframe-like that can handle units. – equant Feb 13 '19 at 16:27

3 Answers

17

There isn't a great way to do this right now; see the GitHub issue here for some discussion.

As a quick hack, you could do something like this, maintaining a separate dict with the units.

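In [3]: units = {}

In [5]: newcols = []
   ...: for col in df:
   ...:     # split e.g. 'length (m)' into the name and the parenthesized unit
   ...:     name, unit = col.split(' ')
   ...:     # drop the parentheses so the stored unit is 'm', not '(m)'
   ...:     units[name] = unit.strip('()')
   ...:     newcols.append(name)

In [6]: df.columns = newcols

In [7]: df
Out[7]:
   length  width  thickness
0     1.2    3.4        5.6
1     7.8    9.0        1.2
2     3.4    5.6        7.8

In [8]: units['length']
Out[8]: 'm'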
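
A variation on the same idea, as a minimal sketch rather than a recommended solution: parse the column names with a regular expression and keep the resulting dict in `DataFrame.attrs`, so it travels with the frame instead of living in a separate variable. The `"units"` key is just a name chosen for this example, and `attrs` is documented by pandas as experimental, so it may not survive every operation (merges, concatenation and so on).

```python
import re

import pandas as pd

df = pd.DataFrame({
    "length (m)": [1.2, 7.8, 3.4],
    "width (m)": [3.4, 9.0, 5.6],
    "thickness (cm)": [5.6, 1.2, 7.8],
})

# Match "name (unit)". Columns that do not match keep their name
# and simply get no entry in the units dict.
pattern = re.compile(r"^(?P<name>.+?)\s*\((?P<unit>[^)]+)\)$")

units = {}
renamed = {}
for col in df.columns:
    m = pattern.match(col)
    if m:
        renamed[col] = m["name"]
        units[m["name"]] = m["unit"]

df = df.rename(columns=renamed)

# "units" is an arbitrary key picked for this sketch; attrs is experimental.
df.attrs["units"] = units
```

After this, `df['width']` selects the column by its bare name and `df.attrs['units']['width']` gives its unit string.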
chrisb
15

I was searching for this, too. Here is what pint and the (experimental) pint_pandas are capable of today:

import pandas as pd
import pint
import pint_pandas

ureg = pint.UnitRegistry()
ureg.Unit.default_format = "~P"
pint_pandas.PintType.ureg.default_format = "~P"

df = pd.DataFrame({
    "length": pd.Series([1.2, 7.8, 3.4], dtype="pint[m]"),
    "width": pd.Series([3.4, 9.0, 5.6], dtype="pint[m]"),
    "thickness": pd.Series([5.6, 1.2, 7.8], dtype="pint[cm]"),
})

print(df.pint.dequantify())
     length width thickness
unit      m     m        cm
0       1.2   3.4       5.6
1       7.8   9.0       1.2
2       3.4   5.6       7.8

df['width'] = df['width'].pint.to("inch")

print(df.pint.dequantify())
     length       width thickness
unit      m          in        cm
0       1.2  133.858268       5.6
1       7.8  354.330709       1.2
2       3.4  220.472441       7.8
P. B.
  • 3
    pint_pandas is indeed nice, but the package still comes with many issues, which makes it quite cumbersome to use. – Mathador Mar 17 '21 at 11:58
3

Here are some options:

  1. pandas-units-extension: janpipek/pandas-units-extension: Units extension array for pandas based on astropy
  2. pint-pandas: hgrecco/pint-pandas: Pandas support for pint

You can also extend pandas yourself by following the "Extending pandas" guide in the pandas 1.3.0 documentation.
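
As one small illustration of the extension route, here is a sketch of a custom accessor registered with `pd.api.extensions.register_dataframe_accessor`. It only attaches unit labels and does no conversion or unit checking. The accessor name `units`, the class `UnitsAccessor` and its `set`/`get` methods are made up for this example, and the labels are stored in `DataFrame.attrs`, which pandas marks as experimental.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("units")
class UnitsAccessor:
    """Hypothetical accessor that keeps a column -> unit mapping in df.attrs."""

    def __init__(self, pandas_obj):
        self._obj = pandas_obj

    def set(self, **units):
        # Refuse labels for columns that do not exist in the frame.
        missing = set(units) - set(self._obj.columns)
        if missing:
            raise KeyError(f"unknown columns: {sorted(missing)}")
        self._obj.attrs.setdefault("units", {}).update(units)
        return self._obj

    def get(self, column):
        # Returns None when no unit has been recorded for the column.
        return self._obj.attrs.get("units", {}).get(column)


df = pd.DataFrame({
    "length": [1.2, 7.8, 3.4],
    "width": [3.4, 9.0, 5.6],
    "thickness": [5.6, 1.2, 7.8],
})

df.units.set(length="m", width="m", thickness="cm")
thickness_unit = df.units.get("thickness")
```

A full solution along the lines of the two packages above would instead implement an `ExtensionDtype` and `ExtensionArray`, which is considerably more work.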

Tri