
How can I easily handle uncertainties on a Series or DataFrame in Pandas (Python Data Analysis Library)? I recently discovered the Python uncertainties package, but I am wondering whether there is a simpler way to manage uncertainties directly within Pandas. I didn't find anything about this in the documentation.

To be more precise, I don't want to store the uncertainties as a new column in my DataFrame, because I think they are part of a data series and shouldn't be logically separated from it. For example, it makes no sense to delete a column from a DataFrame while keeping its uncertainties, so I would have to handle this case by hand.

I was looking for something like data_frame.uncertainties which could work like the data_frame.values attribute. A data_frame.units (for data units) would be great too but I think those things don't exist in Pandas (yet?)...

eyllanesc
Falken
    This is probably too broad a question for you to get a meaningful answer. The best you'll get is something like "store them in a separate column". – TomAugspurger Feb 10 '14 at 17:01
  • @TomAugspurger I modified my question... Having done that, I realize there is probably no perfect solution for the moment. I moved on to [Computational Science](https://scicomp.stackexchange.com/questions/10770/pandas-limitations-and-its-alternatives-in-python). – Falken Feb 10 '14 at 17:25
  • This kind of sounds like a reasonable question. I don't have an answer to this particular question, but if you can do x in numpy, you can probably get pandas to do it. – Noah Feb 10 '14 at 18:57
  • You can use `uncertainties` with NumPy arrays (http://pythonhosted.org/uncertainties/numpy_guide.html). I do not use Pandas, but I would try to do the same in Pandas and in NumPy. I would be happy to update `uncertainties` so as to add Pandas compatibility, if needed, but I would first appreciate knowing if they're not yet compatible, and if they are not, where things block. – Eric O. Lebigot Feb 15 '14 at 16:11
  • @EOL Thank you for your involvement! I have moved to a homemade solution for the moment (better suited to my needs than Pandas), and I'm not sure I will have time in the coming days to return to this question, but if I do, I will certainly tell you how `uncertainties` gets along with Pandas! – Falken Feb 20 '14 at 17:58
  • @Falken: For your information: Pandas is quite compatible with uncertainties. You can do for instance `pandas.Series([uncertainties.ufloat(…,…),…])` or `pandas.Series(uncertainties.unumpy.uarray(…,…))`. This simply puts numbers with uncertainties in a Pandas column. Now, there may be cases where things break: in this case, please report the problem through https://github.com/lebigot/uncertainties. :) – Eric O. Lebigot Feb 22 '14 at 11:39

1 Answer


If you really want this to behave like a built-in attribute, you can wrap your DataFrame in a class and define whatever attributes or methods you want on it. Below is a quick example; you could easily add a units definition or a more complicated uncertainty formula.

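import pandas as pd

data = {'target_column': [100, 105, 110]}

class data_analysis():
    def __init__(self, data, percentage_uncertainty):
        # Keep the values and their uncertainties together on one object
        self.df = pd.DataFrame(data)
        # Absolute uncertainty as a fixed fraction of each value
        self.uncertainty = percentage_uncertainty * self.df['target_column'].values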

When I run

example=data_analysis(data,.01)
example.uncertainty

I get array([1.  , 1.05, 1.1 ])
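
The same idea can be pushed a little further so that deleting a column also deletes its uncertainties, which was the concern raised in the question. This is a rough sketch, not a complete solution: the class name `MeasuredFrame`, its `drop` method, and the single relative uncertainty applied to every column are all assumptions made for illustration.

```python
import pandas as pd


class MeasuredFrame:
    """Hypothetical wrapper that keeps values and uncertainties in step."""

    def __init__(self, data, relative_uncertainty):
        self.df = pd.DataFrame(data)
        # Same shape and labels as the values; assumes one relative
        # uncertainty shared by every column.
        self.uncertainties = self.df.abs() * relative_uncertainty

    def drop(self, column):
        """Remove a column together with its uncertainties."""
        self.df = self.df.drop(columns=column)
        self.uncertainties = self.uncertainties.drop(columns=column)


frame = MeasuredFrame({"a": [100, 105, 110], "b": [1, 2, 3]}, 0.01)
frame.drop("b")
```

After the call to `drop`, both `frame.df` and `frame.uncertainties` contain only column `a`, so the two can no longer get out of sync through this method.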

Hope this helps

Novice