
I'm trying to read data from a .csv file using Pandas, smooth it with a Savitzky-Golay filter, filter it, and then use Pandas again to write an output .csv file. The data must be converted from a DataFrame to an array to perform the smoothing and then back to a DataFrame to create the output file.

I found a topic on creating a DataFrame from NumPy arrays (Creating a Pandas DataFrame from a Numpy array: How do I specify the index column and column headers?) and I used the line dataset = pd.DataFrame({'Column1': data[:, 0], 'Column2': data[:, 1]}) to create mine.

The problem is that when I rename the columns to 'time' for the first column and 'angle' for the second, the column order in the final DataFrame changes. It seems as if alphabetical order matters, which seems weird. Can someone help me with an explanation?

My complete code:

import scipy as sp
from scipy import signal
import numpy as np

import pandas as pd
import matplotlib.pyplot as plt

# Specify the input file
in_file = '0_chunk0_test.csv'

# Define min and max angle values
alpha_min = 35
alpha_max = 45

# Define Savitzky-Golay filter parameters
window_length = 15
polyorder = 1

# Read input .csv file, but only time and pitch values using usecols argument
data = pd.read_csv(in_file,usecols=[0,2])

# Replace ":" with "" in time values
data['time'] = data['time'].str.replace(':','')

# Convert pandas dataframe to a numpy array, use .astype to convert
# string to float
data_arr = data.to_numpy(dtype=np.dtype,copy=True)
data_arr = data_arr.astype(np.float)

# Perform Savitzky-Golay filtering with signal.savgol_filter
data_arr_smooth = signal.savgol_filter(data_arr[:,1],window_length,polyorder)

# Convert smoothed data array to dataframe and rename Pitch: to angle
data_fr = pd.DataFrame({'time': data_arr[:,0],'angle': data_arr_smooth})

print data_fr
user2882635
  • On Stack Overflow, it is good practice to reduce your code sample to a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example); you should delete the data processing and just leave in the initialization of `data_fr` (with some dummy data). – Han-Kwang Nienhuys Jun 11 '20 at 11:24
  • Thanks for the tip, I will keep that in mind for the future questions. – user2882635 Jun 16 '20 at 08:45

2 Answers


Your question is essentially: why does this code result in a column order that is alphabetical, rather than the order that I provided?

data_fr = pd.DataFrame({'time': data_arr[:,0],'angle': data_arr_smooth})

Recent versions of pandas (0.23 or later, including 1.0+), when run on Python 3.6 or later, actually do what you want, with columns ['time', 'angle'] rather than ['angle', 'time'].

Up to Python 3.5, dictionaries did not preserve the order of keys; by sorting alphabetically, pandas would at least give a reproducible column order. This was changed in Pandas 0.23 (released May 2018).
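If you are stuck on an older Python or pandas version, one way to get a fixed column order is to pass the `columns` argument explicitly. This is a minimal sketch with made-up dummy values standing in for `data_arr[:,0]` and `data_arr_smooth`; the names `time_values` and `angle_values` are only for illustration.

```python
import numpy as np
import pandas as pd

# Dummy stand-ins for the arrays in the question (hypothetical values).
time_values = np.array([0.0, 1.0, 2.0])
angle_values = np.array([40.1, 39.8, 40.3])

# Passing `columns` sets the column order explicitly, so the result
# does not depend on how the dictionary keys happen to be ordered.
data_fr = pd.DataFrame(
    {'time': time_values, 'angle': angle_values},
    columns=['time', 'angle'],
)

print(list(data_fr.columns))
```

Selecting the columns afterwards with `data_fr[['time', 'angle']]` should give the same order if the DataFrame has already been built.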

Han-Kwang Nienhuys

If your data is already in a dataframe, it's much easier to just pass the values of the Pitch column to savgol_filter:

data_arr_smooth = signal.savgol_filter(data.Pitch.values, window_length, polyorder)
data_fr = pd.DataFrame({'time': data.time.values,'angle': data_arr_smooth})

There's no need to explicitly convert your data to float as long as the values are numeric; savgol_filter will do this for you. From its documentation:

If x is not a single or double precision floating point array, it will be converted to type numpy.float64 before filtering.

If you want both the original and the smoothed data in your original dataframe, just assign a new column to it:

data['angle'] = signal.savgol_filter(data.Pitch.values, window_length, polyorder)
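As a quick self-contained check of the column assignment above, here is a sketch that uses a small made-up DataFrame in place of the CSV file (the values are hypothetical; the column names `time` and `Pitch` are assumed to match the question's file). The pitch values are deliberately integers, to show that no manual float conversion is needed.

```python
import numpy as np
import pandas as pd
from scipy import signal

# Hypothetical stand-in for the CSV contents: 30 integer pitch readings.
data = pd.DataFrame({
    'time': np.arange(30),
    'Pitch': np.array([40, 41, 39, 42, 40, 38] * 5),
})

# Same filter parameters as in the question.
window_length = 15
polyorder = 1

# savgol_filter converts the integer input to float64 internally,
# so the new column comes back as floating point.
data['angle'] = signal.savgol_filter(data['Pitch'].values, window_length, polyorder)

print(data.dtypes)
print(data.head())
```

The original `Pitch` column is left untouched, so both raw and smoothed values can be written out with `data.to_csv(...)` afterwards.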
Stef
  • This is great! I'm surprised by the user-friendliness of python every day. One problem; first part of your code returns an error; AttributeError: 'module' object has no attribute 'Dataframe' – user2882635 Jun 16 '20 at 08:45
  • You have a typo in the name: it must be `DataFrame` instead of `Dataframe`, i.e. it must be a capital F. – Stef Jun 16 '20 at 08:55