2

I am just starting to use user-defined functions, so this is probably not a very complex question. Forgive me.

I have a few dataframes, which all have a column named 'interval_time' (for example) and I would like to rename this column 'Timestamp'.

I know that I can do this manually with this:

df = df.rename(index=str, columns={'interval_time': 'Timestamp'})

but now I would like to define a function called rename that does this for me. I have seen that this works:

def rename(data):
    print(data.rename(index=str, columns={'interval_time': 'Timestamp'}))

but I can't seem to figure out how to save the renamed dataframe. I have tried this:

def rename(data):
    data = data.rename(index=str, columns={'interval_time': 'Timestamp'})

The dataframes that I am using have the following form:

df_scada
         interval_time    A  ...          X          Y
0  2010-11-01 00:00:00  0.0  ...  396.36710  381.68860
1  2010-11-01 00:05:00  0.0  ...  392.97974  381.40634
2  2010-11-01 00:10:00  0.0  ...  390.15695  379.99493
3  2010-11-01 00:15:00  0.0  ...  389.02786  379.14810
jpp
Luka Vlaskalic

3 Answers

3

There are a few points to note:

  • You need to use return in your function.
  • It's good practice to make such functions generic. For example, your input and output column names can be arguments with default values set.
  • Pandas offers pd.DataFrame.pipe to facilitate method chaining.
  • You should not name your function the same as the Pandas method. This will only lead to confusion.

Putting these elements together:

def rename_col(data, col_in='interval_time', col_out='Timestamp'):
    return data.rename(index=str, columns={col_in: col_out})

df = df.pipe(rename_col)

This is a trivial example, which probably doesn't require a user-defined function. However, the above advice may help when you write more complex functions.
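As a rough sketch of how the generic version can be used in a method chain: `pipe` forwards extra keyword arguments to the function, so the same helper can rename different columns. The sample data and the column name 'Power' below are made up for illustration, and `index=str` is left out for brevity.

```python
import pandas as pd

def rename_col(data, col_in='interval_time', col_out='Timestamp'):
    # Return a new DataFrame; the input is left untouched.
    return data.rename(columns={col_in: col_out})

# Hypothetical sample data, loosely modelled on df_scada from the question.
df_scada = pd.DataFrame({
    'interval_time': ['2010-11-01 00:00:00', '2010-11-01 00:05:00'],
    'A': [0.0, 0.0],
})

# Default arguments: 'interval_time' becomes 'Timestamp'.
renamed = df_scada.pipe(rename_col)

# Keyword arguments given to pipe are passed on to rename_col,
# so the helper can be reused at several steps of one chain.
chained = (
    df_scada
    .pipe(rename_col)
    .pipe(rename_col, col_in='A', col_out='Power')  # 'Power' is a made-up name
)
```

Because `rename_col` returns a new object each time, `df_scada` itself keeps its original column names unless you assign the result back to it.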

jpp
  • I agree that this is quite trivial, I could have done it more simply another way, I am just starting to understand how to use user-defined functions, so thought this was a good thing to try – Luka Vlaskalic Jul 06 '18 at 10:34
  • 1
    @LukaVlaskalic, No problem, I thought so, which is why I thought I'd give some extra pointers :) – jpp Jul 06 '18 at 10:34
  • I just updated the question with a further complexity – Luka Vlaskalic Jul 06 '18 at 12:21
  • 1
    I've rolled back. Please ask as a [new question](https://stackoverflow.com/questions/ask). Since there are already 3 answers, it's not practical for everyone to update their answers with the new requirement. – jpp Jul 06 '18 at 12:22
  • So if you really want to improve on pandas, check the brilliant [Modern Pandas](https://tomaugspurger.github.io/modern-1-intro.html) blog series. – Quickbeam2k1 Jul 06 '18 at 14:10
  • I didn't realise that everyone would need to change their answers, I just wanted a little bit of further help, and unfortunately, I can only post a question every 90 mins. But no worries, I now managed to post the question, thank you – Luka Vlaskalic Jul 06 '18 at 14:51
  • @LukaVlaskalic, Yep, unfortunately that's how SO works. Many people view all the answers (each may have a different valid solution), so having incomplete ones spoils the party. – jpp Jul 06 '18 at 14:52
  • I gathered as much, for sure makes sense – Luka Vlaskalic Jul 06 '18 at 14:55
2

Without inplace=True, the function creates a new object, which needs to be returned:

import pandas as pd

def rename(data):
    return data.rename(index=str, columns={'interval_time': 'Timestamp'})

data = pd.DataFrame([1,2,3,4], columns=['interval_time'])
renamed_data = rename(data)

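If no new DataFrame should be created, pass inplace=True to data.rename inside the function. The call then modifies data directly and returns None, so there is nothing to return or re-assign.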
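A small sketch of the two behaviours side by side, reusing the sample frame from above (`index=str` is omitted for brevity, and the variable names are only illustrative):

```python
import pandas as pd

data = pd.DataFrame([1, 2, 3, 4], columns=['interval_time'])

# Default behaviour: a new DataFrame is returned and `data` is unchanged.
renamed_data = data.rename(columns={'interval_time': 'Timestamp'})
columns_after_copy = list(data.columns)

# inplace=True: `data` itself is modified and the call returns None.
result = data.rename(columns={'interval_time': 'Timestamp'}, inplace=True)
columns_after_inplace = list(data.columns)
```

Writing `data = data.rename(..., inplace=True)` would therefore replace the DataFrame with None, which is a common slip.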

jnd940
0

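If the function modifies the DataFrame in place, you do not need to re-assign it after calling rename. Python passes the function a reference to the same object the caller holds, and a pandas.DataFrame is mutable, so in-place changes made inside the function are visible outside it. Assigning a new object to the parameter name inside the function, as in your second attempt, only rebinds the local name and does not affect the caller. This link explains how Python objects are passed: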

https://jeffknupp.com/blog/2012/11/13/is-python-callbyvalue-or-callbyreference-neither/

To get this behaviour, use the inplace parameter so that rename modifies the DataFrame you passed in instead of returning a new one. Your code in the rename function will then look like

def rename(data):
    data.rename(index=str, columns={'interval_time': 'Timestamp'}, inplace=True)

After you call rename(df) your DataFrame df has its columns renamed.
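To make the difference concrete, here is a minimal sketch that contrasts rebinding the parameter with mutating the object. The function names `rename_rebind` and `rename_inplace` are made up for this illustration, and `index=str` is omitted for brevity.

```python
import pandas as pd

def rename_rebind(data):
    # Rebinds the local name only; the caller's DataFrame is unaffected
    # and the new object is discarded when the function returns.
    data = data.rename(columns={'interval_time': 'Timestamp'})

def rename_inplace(data):
    # Mutates the object that the caller also refers to.
    data.rename(columns={'interval_time': 'Timestamp'}, inplace=True)

df = pd.DataFrame({'interval_time': [1, 2, 3]})

rename_rebind(df)
after_rebind = list(df.columns)   # still the original column name

rename_inplace(df)
after_inplace = list(df.columns)  # the column is now renamed
```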

kosnik
  • Actually, using inplace is very often [discouraged](https://stackoverflow.com/questions/45570984/pandas-is-inplace-true-considered-harmful-or-not). A better solution, by the way, would be to not create a new function and just use `data = data.rename(index=str, columns={'interval_time': 'Timestamp'})`. Anyway, this approach and your function are not suitable in pipelines – Quickbeam2k1 Jul 06 '18 at 12:31