
I would like to insert a new column between each pair of adjacent date columns, holding a monthly value interpolated from the previous and next columns.

import pandas as pd

data = [['Jane', 10, 11, 45, 66, 21], ['John', 11, 55, 34, 44, 22], ['Tom', 23, 43, 12, 11, 44]]
df = pd.DataFrame(data, columns=['Name', '09-Aug', '02-Sep', '18-Oct', '02-Nov', '14-Dec'])

This returns the following:

   Name  09-Aug  02-Sep  18-Oct  02-Nov  14-Dec
0  Jane      10      11      45      66      21
1  John      11      55      34      44      22
2   Tom      23      43      12      11      44

Between each pair of date columns, I would like to add a column labelled with the month of the column preceding it, containing the value interpolated from the preceding and next columns.

For example:

[image: desired output table]

I tried to first add a column between each one using the following code:

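N = len(df.columns)  # number of columns before anything is inserted
for i in range(0, N):
    # each insert shifts the existing columns right, so positions 0..N-1 all end up on the left
    df.insert(i, '', '', allow_duplicates=True)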

But this only adds columns to the left of the table, not between each one. Once I had added the columns, I was thinking of writing a function to perform the linear interpolation.

Does anyone have a suggestion on the correct way to tackle this?

Sunderam Dubey
work_python
  • How about melting your dataframe to a long format such that you have three columns: `name`, `date` and `value`. Then create a new column, say `month`, which is the rolling mean of two periods of `value`. Then melt this dataframe to long again, such that you have three columns `name`, `period` (date and month together in one column) and `value`. Then you can pivot the column `period` to the columns. – bert wassink Aug 02 '22 at 16:33
  • Thanks, how would I do this with multiple date columns? I have never used the pd.melt method before. A code example would be great. – work_python Aug 02 '22 at 16:40
  • You can use `rolling`, e.g., `df.rolling(2, axis=1).mean()` to get the mean values. You can then use ideas from [this](https://stackoverflow.com/questions/45565311/pandas-interleave-zip-two-dataframes-by-row) to get the columns in the proper order. – MYousefi Aug 02 '22 at 16:41
  • On how to transform from wide to long (melt) see https://pandas.pydata.org/docs/reference/api/pandas.melt.html or https://pandas.pydata.org/docs/reference/api/pandas.wide_to_long.html – bert wassink Aug 02 '22 at 16:43
  • @MYousefi oh I see! Take the task to a different dataframe, get the values, then interleave the two dataframes. How would I apply linear interpolation to the dataframe containing the values? I don't think rolling mean is the same? Or am I wrong? – work_python Aug 02 '22 at 16:50
  • pandas has a specific `interpolate` function, might as well use it if you can get your data formatted correctly. – BeRT2me Aug 02 '22 at 17:21
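
The mean-then-interleave idea from the comments above can be sketched roughly as follows. This is only an illustration, not code from any commenter: the names `dates`, `midpoints` and `interleaved` are made up here, and the mean of each pair of adjacent columns is computed with NumPy slicing rather than `rolling`. For a single column inserted between two neighbours, that mean equals position-based linear interpolation; it does not weight by the number of days between the dates.

```python
import pandas as pd

data = [['Jane', 10, 11, 45, 66, 21],
        ['John', 11, 55, 34, 44, 22],
        ['Tom', 23, 43, 12, 11, 44]]
df = pd.DataFrame(data, columns=['Name', '09-Aug', '02-Sep', '18-Oct', '02-Nov', '14-Dec'])

# Date columns only (illustrative name).
dates = df.drop(columns='Name')

# Mean of each pair of neighbouring date columns.
values = dates.to_numpy(dtype=float)
midpoints = pd.DataFrame(
    (values[:, :-1] + values[:, 1:]) / 2,
    index=dates.index,
    # Label each new column with the month of the column to its left: '09-Aug' -> 'Aug'.
    columns=[name[3:] for name in dates.columns[:-1]],
)

# Interleave: each original column, followed by the midpoint column to its right.
pieces = [df['Name']]
for i, name in enumerate(dates.columns):
    pieces.append(dates[name])
    if i < len(midpoints.columns):
        pieces.append(midpoints.iloc[:, i])

interleaved = pd.concat(pieces, axis=1)
print(interleaved)
```

Building the new columns in a separate frame avoids the shifting-position problem that `df.insert` in a loop runs into.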

2 Answers

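from numpy import nan
import pandas as pd

data = [['Jane', 10, 11, 45, 66, 21], ['John', 11, 55, 34, 44, 22], ['Tom', 23, 43, 12, 11, 44]]
df = pd.DataFrame(data, columns=['Name', '09-Aug', '02-Sep', '18-Oct', '02-Nov', '14-Dec'])

# Work on the date columns only; 'Name' is joined back at the end.
df_c = df.drop('Name', axis=1)

# Insert an empty (NaN) column after every date column except the last.
# Stepping by 2 skips over the column inserted on the previous pass.
for i in range(1, 2 * len(df_c.columns) - 1, 2):
    col_title = df_c.columns[i - 1][3:]  # '09-Aug' -> 'Aug'
    df_c.insert(i, col_title, nan)

# Linear interpolation across each row fills the NaN columns.
result = df[['Name']].join(df_c.interpolate(axis=1))
print(result)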

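Output:

   Name  09-Aug   Aug  02-Sep   Sep  18-Oct   Oct  02-Nov   Nov  14-Dec
0  Jane    10.0  10.5    11.0  28.0    45.0  55.5    66.0  43.5    21.0
1  John    11.0  33.0    55.0  44.5    34.0  39.0    44.0  33.0    22.0
2   Tom    23.0  33.0    43.0  27.5    12.0  11.5    11.0  27.5    44.0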

  • That's so helpful thank you! Is it possible for the value that is interpolated to be a custom function? eg the interpolated value takes the value from the left, divides it by two, and adds it to the value on the right, which is divided by two (just example of a custom function). – work_python Aug 02 '22 at 17:45
  • [`df.interpolate`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.interpolate.html) doesn't accept any function from what I can see. You can use [`df.apply`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html) like I explained **[here](https://stackoverflow.com/a/73209396/13604396)**. The function would accept a row like in the example and iterate over all columns (by index). Using this index, you can change the next column value(s) how you like. – Confused Learner Aug 02 '22 at 18:00
  • There are a ton of `method` options for `df.interpolate`, it'd be worth looking through those to see if one fits your desired use case~ – BeRT2me Aug 02 '22 at 18:12
  • Thanks, I have added a new problem on a different page you may be able to help with https://stackoverflow.com/questions/73212716/how-would-i-find-the-quarterly-averages-of-these-monthly-figures-in-pandas?noredirect=1#comment129301004_73212716 – work_python Aug 02 '22 at 19:53
  • @work_python You can also up vote answers that are helpful. – Confused Learner Aug 02 '22 at 20:03
  • Thanks I have now done this. – work_python Aug 02 '22 at 20:03
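
For the custom-function question raised in the comments above, one possible approach is to skip `interpolate` and build each in-between column directly from its two neighbours. The sketch below is an assumption-laden illustration, not part of the answer: `half_and_half` and `insert_between` are hypothetical helper names, and the rule used is the asker's own example (left value divided by two plus right value divided by two).

```python
import pandas as pd

def half_and_half(left, right):
    """Example rule from the comment: left / 2 + right / 2."""
    return left / 2 + right / 2

def insert_between(frame, func, id_col='Name'):
    """Return a new frame with func(left, right) placed between adjacent value columns.

    Hypothetical helper: each new column is named after the month part of the
    column to its left, e.g. '09-Aug' -> 'Aug'.
    """
    values = frame.drop(columns=id_col)
    names = list(values.columns)
    pieces = [frame[id_col]]
    for left_name, right_name in zip(names, names[1:]):
        pieces.append(values[left_name])
        new_col = func(values[left_name], values[right_name]).rename(left_name[3:])
        pieces.append(new_col)
    pieces.append(values[names[-1]])
    return pd.concat(pieces, axis=1)

data = [['Jane', 10, 11, 45, 66, 21],
        ['John', 11, 55, 34, 44, 22],
        ['Tom', 23, 43, 12, 11, 44]]
df = pd.DataFrame(data, columns=['Name', '09-Aug', '02-Sep', '18-Oct', '02-Nov', '14-Dec'])

custom = insert_between(df, half_and_half)
# Any function of two Series works, e.g. a 25% / 75% weighting:
weighted = insert_between(df, lambda left, right: 0.25 * left + 0.75 * right)
print(custom)
```

Because `func` receives whole columns, it should be written with vectorised arithmetic rather than per-value branching.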
import numpy as np
import pandas as pd

# Make Name your index, and Transpose (df is the DataFrame from the question):
df = df.set_index('Name').T

# Convert index to datetime:
df.index = pd.to_datetime(df.index, format='%d-%b')

# Create new values for each month:
new_index_vals = pd.date_range(df.index.min(), df.index.max(), freq='MS')

# Reindex, including these new values:
df = df.reindex(df.index.union(new_index_vals))

# Apply interpolation, accounting for time:
df = df.interpolate('time') # You can also choose just `linear` here~

# Convert back to original format, formatting month_starts differently:
df.index = np.where(df.index.is_month_start, 
                    df.index.strftime('%B'), 
                    df.index.strftime('%d-%b'))

# Transpose back to original format:
df = df.T.reset_index()
print(df.round(2))

Output:

   Name  09-Aug  September  02-Sep  October  18-Oct  November  02-Nov  December  14-Dec
0  Jane    10.0      10.96    11.0    32.43    45.0     64.60    66.0     34.93    21.0
1  John    11.0      53.17    55.0    41.76    34.0     43.33    44.0     28.81    22.0
2   Tom    23.0      42.17    43.0    23.46    12.0     11.07    11.0     33.79    44.0
BeRT2me
  • Thanks a lot! With this solution however, shouldn't the first value be the interpolated figure for the month of August? It begins at September here. Please correct me if I am wrong in my thinking.. – work_python Aug 02 '22 at 18:03
  • No, September 1st is between August 9th and September 2nd... If you want month ends, that can be done by changing `MS` to `M` and `is_month_start` to `is_month_end`. – BeRT2me Aug 02 '22 at 18:11
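
The month-end variant mentioned in the last comment might look roughly like this. It is a sketch under stated assumptions rather than the answerer's code: the year 2022 is assumed so the dates parse unambiguously, `pd.offsets.MonthEnd()` is swapped in for the `M` alias because the spelling of that alias differs between pandas versions, and the names `wide`, `month_ends` and `result` are illustrative.

```python
import pandas as pd

data = [['Jane', 10, 11, 45, 66, 21],
        ['John', 11, 55, 34, 44, 22],
        ['Tom', 23, 43, 12, 11, 44]]
df = pd.DataFrame(data, columns=['Name', '09-Aug', '02-Sep', '18-Oct', '02-Nov', '14-Dec'])

# Dates become the index so that time-based interpolation can be used.
wide = df.set_index('Name').T
# Assumption: all dates fall in 2022 (the source columns carry no year).
wide.index = pd.to_datetime([f'{label}-2022' for label in wide.index], format='%d-%b-%Y')

# Month ends lying between the first and last date.
month_ends = pd.date_range(wide.index.min(), wide.index.max(), freq=pd.offsets.MonthEnd())

# Add the month-end rows (as NaN) and fill them, weighting by elapsed days.
wide = wide.reindex(wide.index.union(month_ends)).interpolate('time')

# Month-end rows get the full month name; original dates keep their 'dd-Mon' label.
wide.index = [ts.strftime('%B') if ts.is_month_end else ts.strftime('%d-%b')
              for ts in wide.index]

result = wide.T.reset_index()
print(result.round(2))
```

With month ends, the first interpolated column is August (31 August sits between 09-Aug and 02-Sep), which matches what the asker expected to see first.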