2

I have a dataset with temperature as one column. Due to how the heater works, there are a number of gaps in the data. In order to make different data sets directly comparable, I want to fill in these missing temperatures and add corresponding NaNs in another column.

I've tried to use the answer given here, which seems to be exactly what I want: link. But that doesn't work - I get a data frame with the new temperature values I want, but the corresponding data has gone:

import pandas as pd 
import numpy as np           
A1 = pd.read_table('Test data.tsv', encoding='ISO-8859-1', header = 2) 
A1.columns = ['time',2,3,4,5,6,7,'freq',9,10,11,12,13,'temp',15,16,17,18,19] 
A1truncated = A1[A1.temp >= 25]; A1truncated=A1truncated[A1truncated.temp <= 350.1]
A1averaged = A1truncated.groupby(['temp'], as_index=False)['freq'].mean() 
A1averaged = np.around(A1averaged, decimals=1)

A1averaged.set_index('temp') 
new_index = pd.Index(np.arange(25, 350, 0.1), name='temp')
A1indexed = A1averaged.set_index('temp').reindex(new_index).reset_index() 

Turns my 19 columns into 1 with temperature as the index (A1averaged), then into 2 columns with new temperature list and a column of empty data (A1indexed). Any ideas why this doesn't work? Or another method to do the same?

Community
  • 1
  • 1
Yobmod
  • 395
  • 3
  • 5
  • 18

1 Answers1

2

Index with float can reindex with problem, inconsistency is probably because of floating point accuracies. So I use little hack - Int64Index instead of Float64Index.

I try to set subset easier way:

A1truncated = A1[(A1.temp >= 25) & ( A1.temp <= 350.1)]

then omit first set index, because is set twice:

A1averaged.set_index('temp')

set new_index to Int64Index:

new_index = pd.Index(np.arange(250, 3500), name='temp')

and use Int64Index by multiply column temp by 10 and last this column is divided by 10.

A1averaged['temp'] = A1averaged['temp'] * 10
A1indexed['temp'] = A1indexed['temp'] / 10

All together:

import pandas as pd 
import numpy as np           
A1 = pd.read_table('Test data.tsv', encoding='ISO-8859-1', header = 2) 

A1.columns = ['time',2,3,4,5,6,7,'freq',9,10,11,12,13,'temp',15,16,17,18,19] 

A1truncated = A1[(A1.temp >= 25) & ( A1.temp <= 350.1)]

A1averaged = A1truncated.groupby(['temp'], as_index=False)['freq'].mean() 
A1averaged = np.around(A1averaged, decimals=1)
new_index = pd.Index(np.arange(250, 3500), name='temp')

A1averaged['temp'] = A1averaged['temp'] * 10
A1indexed = A1averaged.set_index('temp').reindex(new_index).reset_index()
A1indexed['temp'] = A1indexed['temp'] / 10
print A1indexed.tail()
#       temp       freq
#3245  349.5  5830065.6
#3246  349.6  5830043.5
#3247  349.7  5830046.3
#3248  349.8  5830025.3
#3249  349.9  5830015.6
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252