I'm using JupyterLab and am trying to save a function as a module in my cwd so that I can call it from a separate file.
After reading other similar posts, I saved it as `outliers.py` and placed it in my cwd.
This is the function:
```python
import pandas

def remove_pps_outliers(df):
    df_out = pandas.DataFrame()  # new dataframe that collects the output
    for key, subdf in df.groupby('location'):  # one sub-dataframe per location
        m = np.mean(subdf.price_per_sqft)  # mean price per sqft for this location
        st = np.std(subdf.price_per_sqft)  # one standard deviation for this location
        reduced_df = subdf[(subdf.price_per_sqft>(m-st)) & (subdf.price_per_sqft<=(m+st))]
        # keep only the datapoints that are > (m-st) and <= (m+st)
        # (think of a normal distribution curve: |__SD__MEAN__SD__|)
        df_out = pandas.concat([df_out, reduced_df], ignore_index=True)
        # append the filtered rows to the output, once per location
    return df_out
```
This is how I called it:
```python
from outliers import *
df7 = remove_pps_outliers(df6)
df7.shape
```
And this is the error I'm getting:
```
~\outliers.py in remove_pps_outliers(df)
      1 import pandas
----> 2 
      3 def remove_pps_outliers(df):
      4     df_out = pandas.DataFrame() #taking new dataframe as output
      5     for key, subdf in df.groupby('location'): # grouping by location

NameError: name 'pd' is not defined
```
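The file shown above doesn't use `pd` anywhere, so I don't understand where this error comes from. Any help?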
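
One possible reading, sketched under assumptions: the traceback points at a blank line and names `pd`, which the file above never uses, so the kernel may still be running an earlier copy of the module. Separately, the function calls `np` without importing it. The block below is a minimal, self-contained version of what `outliers.py` might look like with both imports in place. The sample `df6` is hypothetical data made up for illustration, and collecting the pieces in a list before a single `concat` is a swapped-in variation, not the original loop.

```python
import numpy as np
import pandas as pd


def remove_pps_outliers(df):
    """Per location, keep rows whose price_per_sqft lies in (mean - std, mean + std]."""
    kept = []  # filtered sub-dataframes, one per location
    for _, subdf in df.groupby('location'):
        m = np.mean(subdf.price_per_sqft)   # mean for this location
        st = np.std(subdf.price_per_sqft)   # standard deviation for this location
        mask = (subdf.price_per_sqft > (m - st)) & (subdf.price_per_sqft <= (m + st))
        kept.append(subdf[mask])
    # Concatenate once at the end instead of growing a dataframe inside the loop
    return pd.concat(kept, ignore_index=True)


# Hypothetical sample data, only to exercise the function
df6 = pd.DataFrame({
    'location': ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'price_per_sqft': [100.0, 110.0, 500.0, 50.0, 52.0, 54.0, 90.0],
})
df7 = remove_pps_outliers(df6)
print(df7.shape)
```

If the file was edited after it was first imported, restarting the kernel, or running `import importlib, outliers` followed by `importlib.reload(outliers)`, makes the notebook pick up the saved version.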