Finding rows with maximum spread from a .dat file using Python

Question

So I have a sample .dat file which contains weather data for a single month as space-separated values. The first column of the file contains the day of the month; the second contains the maximum temperature for that day, while the third contains the minimum temperature.

The file also has a final row at the bottom which contains aggregate values for the entire month.

Ideally I want to write a program to find the row with the maximum spread in the .dat file, where spread would be the difference between maximum temperature and minimum temperature.

The program should print the day of the month and the spread to standard output.

Assuming that my program is called weather.py, then a sample run will look like:

$ python weather.py
2 16

And here is my .dat file:

    Dy MxT   MnT   AvT   HDDay  AvDP 1HrP TPcpn WxType PDir AvSp Dir MxS SkyC MxR MnR AvSLP

   1  88    59    74          53.8       0.00 F       280  9.6 270  17  1.6  93 23 1004.5
   2  79    63    71          46.5       0.00         330  8.7 340  23  3.3  70 28 1004.5
   3  77    55    66          39.6       0.00         350  5.0 350   9  2.8  59 24 1016.8
   4  77    59    68          51.1       0.00         110  9.1 130  12  8.6  62 40 1021.1
   5  90    66    78          68.3       0.00 TFH     220  8.3 260  12  6.9  84 55 1014.4
   6  81    61    71          63.7       0.00 RFH     030  6.2 030  13  9.7  93 60 1012.7
   7  73    57    65          53.0       0.00 RF      050  9.5 050  17  5.3  90 48 1021.8
   8  75    54    65          50.0       0.00 FH      160  4.2 150  10  2.6  93 41 1026.3
   9  86    32*   59       6  61.5       0.00         240  7.6 220  12  6.0  78 46 1018.6
  10  84    64    74          57.5       0.00 F       210  6.6 050   9  3.4  84 40 1019.0
  11  91    59    75          66.3       0.00 H       250  7.1 230  12  2.5  93 45 1012.6
  12  88    73    81          68.7       0.00 RTH     250  8.1 270  21  7.9  94 51 1007.0
  13  70    59    65          55.0       0.00 H       150  3.0 150   8 10.0  83 59 1012.6
  14  61    59    60       5  55.9       0.00 RF      060  6.7 080   9 10.0  93 87 1008.6
  15  64    55    60       5  54.9       0.00 F       040  4.3 200   7  9.6  96 70 1006.1
  16  79    59    69          56.7       0.00 F       250  7.6 240  21  7.8  87 44 1007.0
  17  81    57    69          51.7       0.00 T       260  9.1 270  29* 5.2  90 34 1012.5
  18  82    52    67          52.6       0.00         230  4.0 190  12  5.0  93 34 1021.3
  19  81    61    71          58.9       0.00 H       250  5.2 230  12  5.3  87 44 1028.5
  20  84    57    71          58.9       0.00 FH      150  6.3 160  13  3.6  90 43 1032.5
  21  86    59    73          57.7       0.00 F       240  6.1 250  12  1.0  87 35 1030.7
  22  90    64    77          61.1       0.00 H       250  6.4 230   9  0.2  78 38 1026.4
  23  90    68    79          63.1       0.00 H       240  8.3 230  12  0.2  68 42 1021.3
  24  90    77    84          67.5       0.00 H       350  8.5 010  14  6.9  74 48 1018.2
  25  90    72    81          61.3       0.00         190  4.9 230   9  5.6  81 29 1019.6
  26  97*   64    81          70.4       0.00 H       050  5.1 200  12  4.0 107 45 1014.9
  27  91    72    82          69.7       0.00 RTH     250 12.1 230  17  7.1  90 47 1009.0
  28  84    68    76          65.6       0.00 RTFH    280  7.6 340  16  7.0 100 51 1011.0
  29  88    66    77          59.7       0.00         040  5.4 020   9  5.3  84 33 1020.6
  30  90    45    68          63.6       0.00 H       240  6.0 220  17  4.8 200 41 1022.7
mo 82.9 60.5 71.7 16 58.8 0.00 6.9 5.3

My problem is that I'm trying to figure out how to get the maximum spread. So far I've read the file and printed out its contents. What would be my next steps to get the maximum spread?

My code so far:

#!/usr/bin/env python


# read and print weather file
filename = "weather.dat"

with open(filename) as fn:
    content = fn.readlines()

print(content)

Any leads or assistance would be helpful.

Iterate over the file; split each line on white space; extract the day and temperature extremes and subtract; compare with the previously *saved* largest spread; if it is bigger - save this day and its spread; if it is not bigger continue. — wwii, Jan 16 '17 at 06:10
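
The steps in the comment above can be sketched in plain Python roughly as follows. This is only a minimal sketch: the function name `max_spread` and the trimmed `sample` lines are made up for illustration, and it assumes that a data row is any line whose first field is a day number, and that a trailing `*` on a reading is just a marker to be stripped.

```python
def max_spread(lines):
    """Return (day, spread) for the row with the largest MxT - MnT, or None."""
    best = None
    for line in lines:
        fields = line.split()  # split on any run of whitespace
        # Skip blank lines, the header and the "mo" summary row:
        # only rows whose first field is a day number are data rows.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        day = int(fields[0])
        # Some readings carry a trailing "*" marker (e.g. "97*"); strip it.
        high = int(fields[1].rstrip("*"))
        low = int(fields[2].rstrip("*"))
        spread = high - low
        # Keep this row only if it beats the largest spread saved so far.
        if best is None or spread > best[1]:
            best = (day, spread)
    return best


# A few lines in the same layout as the question's file (illustrative subset).
sample = [
    "  Dy MxT   MnT   AvT",
    "",
    "   8  75    54    65",
    "   9  86    32*   59",
    "  26  97*   64    81",
    "mo 82.9 60.5 71.7",
]
print(*max_spread(sample))  # prints: 9 54
```

To run it against the real file, an open file object can be passed instead of the list, e.g. `with open("weather.dat") as f: print(*max_spread(f))`.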

Mohammad Yusuf · Accepted Answer · 2017-01-16T08:56:53.767

You can try with pandas like so:

import pandas as pd

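# Split columns on runs of whitespace (the raw string avoids an invalid-escape warning)
df = pd.read_csv('your_file.dat', sep=r'\s+')
# Keep the first two characters of each reading (drops markers such as the '*' in '97*'), then convert to int
df[['MxT', 'MnT']] = df[['MxT', 'MnT']].apply(lambda x: x.str[:2].astype(int))
# Spread for every row
a = df.MxT - df.MnT
# Index labels of the row(s) with the largest spread
b = a.index[a==max(a)].tolist()
df.loc[b]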

Output:

If you just want the Day, MxT and MnT, you can get it like this:

df.loc[b][['Dy', 'MxT', 'MnT']].unstack().tolist()

Output:

[9, 86, 32]
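
If the goal is the "day spread" line from the question rather than the list above, a variation along these lines might work. It is a sketch under assumptions, not the answer's original code: the helper name `max_spread_pandas` and the trimmed `sample` text are hypothetical, everything is read as text so that `*` markers can be stripped explicitly, and the `mo` summary row is dropped by keeping only rows whose `Dy` value is numeric.

```python
import io

import pandas as pd


def max_spread_pandas(source):
    """Return (day, spread) for the row with the largest MxT - MnT."""
    # Read only the three needed columns, as text, so values like "97*"
    # and the "mo" summary row do not interfere with parsing.
    df = pd.read_csv(source, sep=r"\s+", usecols=["Dy", "MxT", "MnT"], dtype=str)
    # Keep only real day rows (drops the "mo" aggregate row).
    df = df[df["Dy"].str.isdigit()]
    # Strip the "*" marker before converting to integers.
    high = df["MxT"].str.rstrip("*").astype(int)
    low = df["MnT"].str.rstrip("*").astype(int)
    spread = high - low
    idx = spread.idxmax()  # index label of the largest spread
    return int(df.loc[idx, "Dy"]), int(spread.loc[idx])


# Trimmed sample in the same layout as the question's file (illustrative only).
sample = """\
  Dy MxT   MnT   AvT
   8  75    54    65
   9  86    32*   59
  26  97*   64    81
mo 82.9 60.5 71.7
"""
day, spread = max_spread_pandas(io.StringIO(sample))
print(day, spread)  # prints: 9 54
```

Passing the path of the `.dat` file instead of the `StringIO` object is the intended use, assuming the full file parses the same way as this trimmed sample.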

edited Jan 16 '17 at 08:56

answered Jan 16 '17 at 06:10

OK, could you explain why you used the pandas import, and also give a breakdown of the solution? Thanks – connoisseur Jan 16 '17 at 06:14
@kimaiga Pandas is a Data Analysis tool. Read here more about it: http://pandas.pydata.org/pandas-docs/stable/10min.html Or watch this series: https://www.youtube.com/watch?v=eRpFC2CKvao&list=PLyBBc46Y6aAz54aOUgKXXyTcEmpMisAq3 – Mohammad Yusuf Jan 16 '17 at 06:20
@kimaiga Check the updated solution. The highest maximum and lowest minimum temperatures are already marked with `*` in your `.dat` file. You can use that if you are allowed. – Mohammad Yusuf Jan 16 '17 at 06:33
Well, it doesn't print the output on my console as I expected; it gives me blank output – connoisseur Jan 16 '17 at 06:46
Use a `print` statement on the last line. I do not require it because I'm using IPython notebook. For eg. `print (df2.unstack().tolist())` – Mohammad Yusuf Jan 16 '17 at 06:48
Sure. Thanks, it's working well now. – connoisseur Jan 16 '17 at 06:57
@kimaiga If your problem is solved consider [accepting the answer](http://meta.stackexchange.com/questions/5234/how-does-accepting-an-answer-work) – Mohammad Yusuf Jan 16 '17 at 13:09
