0

I have collected an array of weather data that looks like this:

const data = [{
    "city_name": "London",
    "lat": 51.507351,
    "lon": -0.127758,
    "main": {
      "temp": 289.89,
      "temp_min": 288.468,
      "temp_max": 291.15,
      "feels_like": 287.15,
      "pressure": 1004,
      "humidity": 77
    },
    "wind": { "speed": 5.1, "deg": 230 },
    "clouds": { "all": 90 },
    "weather": [
      {
        "id": 804,
        "main": "Clouds",
        "description": "overcast clouds",
        "icon": "04n"
      }
    ],
    "dt": 1593561600,
    "dt_iso": "2020-07-01 00:00:00 +0000 UTC",
    "timezone": 3600
  },
  ...
];

This data continues in ascending date order (hour by hour), for the last 40 years.
(sample: https://pastebin.com/ciHJGhnq ) - the entire dataset is over 140MB.

From this data, I'd like to obtain the average temperature (object.main.temp) for each Month and Week of month, across the entire dataset.

The question I am trying to answer with my data is:

What is the average temperature for January, across the last 40 years.
What is the average temperature for February, across the last 40 years.
...

(get the temperature of each week in January and divide by the number of weeks, repeat for all of the other Januaries in the dataset and average that out too).

Repeat for remaining months.

The output I am aiming to create after parsing the data is:

{ 
   [ 
     "JANUARY": {
       "weekNumber": {
          "avgWeekTemp": 100.00
       }
       "avgMonthTemp": 69.00,
       ...
     },
     ...
   ]
}

The city name & structure of the objects are always the same, in this case London.

// build a unique number of months
// work through our data to work out the week numbers
// work through the data once again and place the data in the right week inside finalOutput
// work through the final output to determine the average values

Unfortunately I'm not very proficient in JavaScript, so I couldn't get past the second obstacle:

"use strict";

const moment = require("moment");
const data = require("./data.json");

let months = [
  {
    January: [],
  },
  {
    February: [],
  },
  {
    March: [],
  },
  {
    April: [],
  },
  {
    May: [],
  },
  {
    June: [],
  },
  { July: [] },
  { August: [] },
  { September: [] },
  { October: [] },
  { November: [] },
  { December: [] },
];
const finalOutput = [];

finalOutput.push(...months);

data.forEach((object) =>
  finalOutput.forEach((month) => {
    if (
      Object.keys(month)[0] === moment(new Date(object.dt_iso)).format("MMMM")
    ) {
      [month].push(object.dt_iso);
    }
  })
);

console.log(finalOutput);

Which only returned the array of months with nothing in each month.

[
  { January: [] },  
  { February: [] },
  { March: [] },
  { April: [] },
  { May: [] },
  { June: [] },
  { July: [] },
  { August: [] },
  { September: [] },
  { October: [] },
  { November: [] },
  { December: [] }
]

How can I calculate the average values per week & month across my entire data set?

user4157124
  • 2,809
  • 13
  • 27
  • 42
nopassport1
  • 1,821
  • 1
  • 25
  • 53
  • You have one `dataset` for one `city`? – Anton Aug 22 '20 at 23:54
  • Correct, just one dataset for 1 city. Only 1 city per dataset. Data set = 1 city, 40 years worth of data – nopassport1 Aug 23 '20 at 00:03
  • Your desired output is not valid JSON. I think I can figure out most of what you want, but if you care to update your post with a valid example I can tailor my script to your needs. Give an example that includes 2 months, using made-up values. – Tom Aug 23 '20 at 01:22
  • 1
    What does "week number" mean to you? Is it a number between 1 & 52 (i.e. numbered within the year)? Or does the week number start over at 1 every month? If it starts over each month, does it only start on Sunday (or Monday), or is "week 1" just the first 7 days within the month? – Tom Aug 23 '20 at 01:34

1 Answers1

1

I'm going to write your script for you, but while you wait here's some high-level guidance.

First, let's study your data. Each row is an hourly weather measurement. As a result, each datapoint you want will be an aggregate over a set of these rows. We should organize the script along those lines:

  1. We'll write a function that accepts a bunch of rows and returns the arithmetic mean of the temperatures of those rows: function getAvgTemp(rows) => Number

  2. We'll write another function that takes a bunch of rows, plus the desired month, and returns all the rows for just that month: function getRowsByMonth(month) => Array(rows)

  3. We'll write another function that takes a bunch of rows, plus the desired week number, and returns all the rows for just that week: function getRowsByWeekNumber(rows, weekNumber) => Array(rows)

^^ that's if "week number" means 1-52. But if "week number" means "week within the month," then instead we'll do:

  1. A function will also take a month: function getRowsByMonthWeek(rows, month, weekNumber) => Array(rows)

From these basic building blocks, we can write a routine that assembles the data you want.


What would that routine look like?

Probably something like this:

  1. Loop through all the months of the year. We won't look in the data for these months, we'll hard-code them.
    1. For each month, we'll call getRowsByMonth on the full data set. Call this set monthRows.
    2. We'll pass monthRows to getAvgTemp -- it doesn't care what the timespan is, it just extracts and crunches the temp data it receives. That's our avgMonthTemp solved for.
    3. Depending on what you mean by "week number," we'll divide the monthRows into smaller sets and then pass each set into getAvgTemp. (The hardest part of your script will be this division logic, but that's not to say it will be that hard.) This gives us your weekly averages.
    4. We'll assemble these values into a data structure and insert it into the final structure that ultimately gets returned/logged.

Here's the implementation. It's a little different than I expected.

The biggest change is that I did some pre-processing up front so that the date values don't have to be parsed multiple times. While doing that, I also calculate each row's weekNumber. As a consequence, the week logic took the form of grouping rows by their weekNumbers rather than querying the dataset by weekNumber.

Some notes:

  • I decided that "weekNumber" means "week-of-year."

Instead of using Moment, I found a week-number algorithm on StackOverflow. If you want to use Moment's algo instead, go ahead.

  • The output data structure is not what you described.

Your example is not valid JSON, so I made a guess as to what you had in mind.

Here's an example of what it looks like:

{
  "JUNE": {
    "avgMonthTemp": 289.9727083333334,
    "avgWeekTemps": {
      "25": 289.99106382978727,
      "26": 289.11
    }
  },
  "JULY": {
    "avgMonthTemp": 289.9727083333334,
    "avgWeekTemps": {
      "27": 289.99106382978727,
      "30": 289.11
    }
  }
}

The output will include a top-level entry for every month, whether or not there is any data for that month. However, the avgWeekTemps hash will only have entries for weeks that are present in the data. Both behaviors can be changed, of course.

  • It's a reusable script that processes arbitrary JSON files in the format you shared.

You mentioned that each file has data from one city, so I figured you'll be running this on multiple files. I set it up so you can pass the path to the data file as a command-line argument. Note that the CLI logic is not sophisticated, so if you're doing funky things you will have a bad time. Doing CLI stuff well is a whole separate topic.

If your data for London is in a file named london.json, this is how you would process that file and save the results to the file london-temps.json:

$ node meantemp.js london.json > london-temps.json
// meantemp.js

const FS = require('fs')


// sets the language used for month names
// for language choices, see: http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry
const MONTH_NAME_LANG_CODE = 'en-US'


// generate the list of month names once
const MONTH_NAMES = Array(12).fill().map(
    ( _, monthNum ) => new Date(2020, monthNum).toLocaleDateString(MONTH_NAME_LANG_CODE, { month: 'long' }).toUpperCase()
)


main()


function main() {
    let filepath = process.argv[2]

    let cityData = readJsonFile(filepath)

    // before working on the data, prep the date values for processing
    let allRows = cityData.map(row => {
        let _date = new Date(row.dt_iso)
        let _weekNum = getWeekNum(_date)
        return { ...row, _date, _weekNum }
    })

    let output = MONTH_NAMES.reduce(( hash, monthName, monthNum ) => {
        // grab this month's rows
        let monthRows = getRowsForMonth(allRows, monthNum)

        // calculate monthly average
        let avgMonthTemp = getMeanTemp(monthRows)

        // calculate weekly averages
        let rowsByWeekNum = groupRowsByWeekNum(monthRows)
        let avgWeekTemps = Object.keys(rowsByWeekNum)
        .reduce(( hash, weekNum ) => ({
            ...hash,
            [weekNum]: getMeanTemp(rowsByWeekNum[weekNum])
        }), {})

        return {
            ...hash,
            [monthName]: { avgMonthTemp, avgWeekTemps }
        }
    }, {})

    console.log(JSON.stringify(output))
}


function readJsonFile( path ) {
    try {
        let text = FS.readFileSync(path, 'utf8')
        return JSON.parse(text)

    } catch ( error ) {
        if(error.code === 'ENOENT') {
            console.error(`Could not find or read path ${JSON.stringify(path)}`)
            process.exit()

        } else if(error instanceof SyntaxError) {
            console.error(`File is not valid JSON`)
            process.exit()

        } else {
            throw error
        }
    }
}


function getRowsForMonth( rows, monthNum ) {
    return rows.filter(row => monthNum === row._date.getUTCMonth())
}


function groupRowsByWeekNum( rows ) {
    return rows.reduce(( hash, row ) => {
        if(!hash.hasOwnProperty(row._weekNum)) {
            hash[row._weekNum] = []
        }

        hash[row._weekNum].push(row)

        return hash
    }, {})
}


// ISO8601-compliant week-of-year function
// taken from https://stackoverflow.com/a/39502645/814463
// modified by me to prevent mutation of args
function getWeekNum( date ) {
    // if date is a valid date, create a copy of it to prevent mutation
    date = date instanceof Date
    ? new Date(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate())
    : new Date()

    let nDay = (date.getDay() + 6) % 7
    date.setDate(date.getDate() - nDay + 3)
    let n1stThursday = date.valueOf()

    date.setMonth(0, 1)

    if (date.getDay() !== 4) {
        date.setMonth(0, 1 + ((4 - date.getDay()) + 7) % 7)
    }

    return 1 + Math.ceil((n1stThursday - date) / 604800000)
}


function getMeanTemp( hourlyReadings ) {
    let temps = hourlyReadings.map(reading => reading.main.temp)
    let mean = getMean(temps)
    return mean
}


function getMean( numbers ) {
    let sum = numbers.reduce(( sum, num ) => sum + num, 0)
    let mean = sum / numbers.length
    return mean
}
Tom
  • 8,509
  • 7
  • 49
  • 78
  • This is fantastic! Thank you so much. How would it be possible to obtain the 'week within the month' value? – nopassport1 Aug 23 '20 at 08:06
  • Here is the actual dataset I'd be working with: http://history.openweathermap.org/storage/d6c635c8c3fe01e4ab4ce0e494dbf47e.json Using the script above gives me a few unexpected values, such as January containing weeks 1-5 and then 52 and 53 as well. I assume that's because of weeks that contain both december & january (https://pastebin.com/ajSXVTvH ) I suspect that's because said week falls inbetween two months. What would be the best way to mitigate this? – nopassport1 Aug 23 '20 at 13:50
  • Well, as is so often the case in software design, the answer depends on your chosen definitions. If "week within the month" means days 1-7 are week 1, 8-14 are week 2, etc, then you'd replace the impl of `getWeekNum` to be a simple `Math.floor( date/7 )`. But if you want the week nums to only roll over on a specific weekday (e.g. Sunday), you'd need to write something smarter. – Tom Aug 23 '20 at 18:21
  • "What would be the best way to mitigate [week 53]?" Again, that depends on what you need your result to look like. I personally am not bothered by the existence of week 53. I'm willing to help, but my point is that there's not one right answer to this. Why is it a problem for you to have a week 53? Given that all years contain more than 52 weeks, how would you prefer to handle the extra 1 day (+1 in leap years)? – Tom Aug 23 '20 at 18:24
  • I just meant that in my result, for the month of January, it goes 1,2,3,4,5,52,53. I just wasn’t sure why (https://pastebin.com/ajSXVTvH ) – nopassport1 Aug 23 '20 at 18:26
  • That seems strange. I would expect one week at most from the previous year to bleed into the next year. I'll download the sample file you linked and look to see if it's my fault. But, one possibility is that that really is the result of the weekNumber function I used (which was advertised as adhering to the standard I believe is relevant, but maybe isn't). If it's that, I'll just select a different week-of-year algo (prolly from momentjs). – Tom Aug 23 '20 at 18:30
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/220322/discussion-between-tom-and-nopassport1). – Tom Aug 23 '20 at 18:43