I downloaded data from INTERMAGNET for my research and have been trying to plot it for the period from 1 Jan 2013 to 31 Dec 2013. The plot generated by the INTERMAGNET site looks like this:

[image: plot generated by the INTERMAGNET site]

My plot on the other hand looks like this:

[image: my plot]

This is the code I used to plot my graph:

import pandas as pd
import matplotlib.pyplot as plt

# read csv file
df = pd.read_csv('data/thl_data.csv')

# convert day, month, year, hour, minute columns to datetime
df['date'] = pd.to_datetime(df[['year', 'month', 'day', 'hour', 'minute']])

# set date as index
df = df.set_index('date')

# filter data to include only between 1 Jan 2013 to 31 Dec 2013
start_date = '2013-01-01'
end_date = '2013-12-31'
df = df.loc[start_date:end_date]

# plot the x values
plt.figure(figsize=(18,12))
plt.plot(df['x'], label='x')
plt.xlabel('Time')
plt.ylabel('X (nT)')
plt.title('THL (Qaanaaq (Thule), Greenland)')
plt.legend()
plt.show()

I also tried resampling to a daily average, but the plot still looks much like the unresampled version. No matter what I do, I can't get my plot to match the one from INTERMAGNET's site. How can I solve this?

The data looks like this:

day,month,year,hour,minute,x,y,z
1,1,2013,0,1,26104,-31575,562205
1,1,2013,0,2,26105,-31584,562201
1,1,2013,0,3,26109,-31593,562197
1,1,2013,0,4,26115,-31597,562197
1,1,2013,0,5,26113,-31611,562190
1,1,2013,0,6,26112,-31605,562195
1,1,2013,0,7,26106,-31604,562195
1,1,2013,0,8,26111,-31594,562202
1,1,2013,0,9,26111,-31596,562203
1,1,2013,0,10,26116,-31598,562202
1,1,2013,0,11,26113,-31596,562203
1,1,2013,0,12,26114,-31599,562202
1,1,2013,0,13,26110,-31604,562201
1,1,2013,0,14,26114,-31598,562206
1,1,2013,0,15,26124,-31587,562211
...
Trenton McKinney
Nikhil Raghavendra
  • The data has 3 columns (`x,y,z`) while the plot has 4 subplots. – rpanai May 22 '23 at 15:34
  • @rpanai There's only one plot though – Nikhil Raghavendra May 22 '23 at 15:35
  • It seems like the data you're plotting has some very large outliers that are hiding the rest of the signal. If you look at the `x` plot in the top image, you can see that the y-axis ranges from 2200 to 3100. However, the scale in the plot you've generated ranges from 0 to 10,000,000. I would start by changing the y-limit on the plot you're creating to be 2200 to 3100, and if that seems to work, consider some outlier removal from your data. – Cameron Riddell May 22 '23 at 15:36
  • Looks like this is the problem. The data has to be denoised I guess. – Nikhil Raghavendra May 22 '23 at 15:43
  • Additionally, you should plot the dataframe directly (which uses matplotlib as the backend). `ax = df.plot(y='x', figsize=(18,12), xlabel='Time', ylabel='X (nT)', title='THL (Qaanaaq (Thule), Greenland)')` – Trenton McKinney May 22 '23 at 19:30
  • Instead of filtering the dataframe you can use `logy=True` or `logy='sym'` in `ax = df.plot(..., logy=True)`. – Trenton McKinney May 22 '23 at 19:32
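
Following up on the outlier suggestion in the comments, here is a minimal sketch of masking implausible values before plotting or resampling. Several things in it are assumptions rather than facts about the real file. `mask_outliers` is a hypothetical helper. The bounds 22000 and 31000 are guessed from the 2200 to 3100 axis range mentioned above, on the unconfirmed assumption that the CSV stores tenths of nT (the sample rows such as 26104 suggest this). The spike value 9999990 is invented for illustration. Masking before resampling matters because a plain daily mean is still pulled far off by a single huge value, which may be why the resampled plot looked much the same.

```python
import pandas as pd


def mask_outliers(series: pd.Series, lower: float, upper: float) -> pd.Series:
    """Replace values outside [lower, upper] with NaN so they become gaps in a plot."""
    return series.where(series.between(lower, upper))


# Synthetic stand-in for the 'x' column: five values from the question's sample
# plus one made-up spike (9999990 is an illustrative value, not from the real data).
idx = pd.date_range("2013-01-01 00:01", periods=6, freq="min")
x = pd.Series([26104, 26105, 9999990, 26115, 26113, 26112], index=idx, dtype="float64")

# Assumed plausible range; adjust to whatever the real data supports.
clean = mask_outliers(x, lower=22000, upper=31000)

# The daily mean now ignores the masked spike because mean() skips NaN by default.
daily = clean.resample("D").mean()

print(clean)
print(daily)
```

With the real data, the same idea would be `df['x'] = mask_outliers(df['x'], 22000, 31000)` right after the date filter and before the plotting calls. It is worth checking the actual extreme values first (for example with `df['x'].describe()`) to pick bounds that fit the file.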
