I am trying to add metadata to each column of a pandas DataFrame. For example, I import measurement data like this:

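import pandas as pd
from io import StringIO

columns = ['Relative_Pressure', 'Volume_STP']
# sep=r'\s+' splits on runs of whitespace (delim_whitespace=True is deprecated in recent pandas)
df = pd.read_csv(StringIO(contents), skiprows=4, sep=r'\s+', index_col=False, header=None)
df.columns = columns
df.drop(df.index[-1], inplace=True)  # discard the last row of the file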

where contents is a string in CSV format. This results in a pandas DataFrame that looks, for example, like this:

[Image: Imported dataframe]

Now I would like to add the respective unit for every column of the DataFrame, and possibly an additional description as well.

I saw this answer and tried to implement it like this:

df['Relative_Pressure'].unit = '-'
df['Relative_Pressure'].descr = 'p/p0'
df['Volume_STP'].unit = 'ccm/g'
df['Volume_STP'].descr = 'Additional info'

However, this does not seem to change the DataFrame in any way. When I print it again, it looks exactly the same as before.

What would be the correct way to add metadata to the columns of the DataFrame? Or, if I have added the metadata correctly, how can I display it?
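
One possible approach, shown here only as a minimal sketch, is to keep the units and descriptions in `DataFrame.attrs`, a dictionary attached to the frame itself (available in pandas 1.0 and later, and documented as experimental). The sample values, the `"units"` and `"descr"` keys, and the helper `describe_columns` are assumptions made for illustration; they are not part of the original question. Printing `df` does not show `attrs`, so the helper builds a small summary table instead.

```python
import pandas as pd

# Stand-in for the imported measurement data (values are made up).
df = pd.DataFrame({
    "Relative_Pressure": [0.01, 0.05, 0.10],
    "Volume_STP": [12.3, 15.8, 18.1],
})

# attrs is a plain dict stored on the DataFrame.
# The key names "units" and "descr" are choices made for this sketch.
df.attrs["units"] = {"Relative_Pressure": "-", "Volume_STP": "ccm/g"}
df.attrs["descr"] = {"Relative_Pressure": "p/p0", "Volume_STP": "Additional info"}


def describe_columns(frame):
    """Hypothetical helper: one row per column with its unit and description."""
    units = frame.attrs.get("units", {})
    descr = frame.attrs.get("descr", {})
    return pd.DataFrame(
        {
            "unit": [units.get(col, "") for col in frame.columns],
            "descr": [descr.get(col, "") for col in frame.columns],
        },
        index=frame.columns,
    )


summary = describe_columns(df)
print(summary)
```

The column labels stay flat, so existing code such as `df["Volume_STP"]` keeps working. Because `attrs` is experimental, it may not survive every pandas operation, so it is worth checking it after any step that creates a new frame.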

EDIT: What is shown here is very similar to what I would like to achieve. However, I am not sure how to do this when I first import the data and only then add the rows with the variable names.
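
A second header row can be added after the import by replacing the flat column labels with a `pandas.MultiIndex`. The following is a sketch under assumptions: the sample values, the `units` dictionary, and the level names `"variable"` and `"unit"` are illustrative and not taken from the original data.

```python
import pandas as pd

# Stand-in for the DataFrame right after the import (values are made up).
df = pd.DataFrame({
    "Relative_Pressure": [0.01, 0.05, 0.10],
    "Volume_STP": [12.3, 15.8, 18.1],
})

# Unit for each column; a dictionary chosen for this sketch.
units = {"Relative_Pressure": "-", "Volume_STP": "ccm/g"}

# Pair every existing column name with its unit, keeping the column order.
df.columns = pd.MultiIndex.from_tuples(
    [(name, units[name]) for name in df.columns],
    names=["variable", "unit"],
)
print(df)  # the unit now appears as a second row in the header

# Selecting by variable name alone returns a one-column DataFrame
# whose remaining column label is the unit.
volume = df["Volume_STP"]

# A flat view without the unit level, for code that expects plain labels.
flat = df.droplevel("unit", axis=1)
```

The trade-off is that every column selection now involves two levels. `droplevel` gives back a frame with plain labels when the unit row gets in the way.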

Axel
  • df["Volume_STP"].unit = "ccm/g" just adds an attribute to df.Volume_STP. You can see this by calling df.Volume_STP.unit. If you could see the units by simply calling df.Volume_STP you would have a mess because this would cause you to see every attribute of this dataframe column. A better way of doing this is to use standard units for all of your imports (e.g. volume is ALWAYS ccm/g, pressure is ALWAYS hPa, Torr, or whatever). – tnknepp Mar 27 '19 at 12:51
  • Hi @tnknepp, I don't think this would be a mess. Couldn't the attributes easily be displayed in additional rows below the column names? – Axel Mar 27 '19 at 12:57
  • It would be a lot of information. If you have tab completion enabled on your IDE try typing df.Volume_STP. (notice the ending ".") and hit Tab you will see all possible completions. Alternatively, you could try help(df.Volume_STP) to see the same thing. – tnknepp Mar 27 '19 at 13:03
  • I think I will try to use a MultiIndex after loading the data into a DF. I will open a new question for that. – Axel Mar 27 '19 at 13:05
  • Ok, you can do that, but it adds an unnecessary level of complication. My recommendation is to use standard units for all imported data, but it's your code...;) – tnknepp Mar 27 '19 at 13:07
  • The problem I am having with this is that we have a larger number of users who would need to work with the resulting DataFrames. For these users it would be very convenient to be sure which unit is exactly used for a certain variable. – Axel Mar 27 '19 at 13:11
  • That is where standardization comes in. If the user knows that the units are consistent, then there is no worry. Make a readme file that goes with the data and everything should be fine. At some point the end user needs to take some responsibility for correctly using the product, while the provider (you) needs to take responsibility for informing the end user of what the data means. – tnknepp Mar 27 '19 at 13:14
  • I am not sure my users work as carefully as yours ;-) Unfortunately, I can't really get them to read any readme or manual... – Axel Mar 27 '19 at 13:17
  • That's their problem, not yours. Adding another level of indexing will only make using the dataframe more cumbersome...which the user will complain about. – tnknepp Mar 27 '19 at 13:20

0 Answers