
I have been asked to count the number of tweets per hour of the day (0-23) in a huge text file of random tweets. The date is not interesting, only the number of tweets per hour. I want to return the counts in a new array of objects. Each object should have the properties hour and count, like this:

{hour: x, count: y},

I've made a function where I'm declaring an empty array, in which I will put my data:

function(tweets) {
let result = [];

and I think I need to push them like this:

result.push({hour: x, count: y});

But I don't know how to extract the specific hour from my object (key and value).

In the huge raw data file, each tweet is logged with a date like this:

created_at: "30-06-2015 14:27",

Any suggestions or experience? I'm currently learning about regex and for loops. Should I use them in this code or is there a smarter way?

Edit: as you asked for more details, the raw data are objects in an array with the following structure:

{
  time: Date-object,
  created_at: "30-06-2015 14:27",
  fromUsername: "victor",
  text: "asyl og integration",
  lang: "da",
  source: "Twitter for Android",
}

Hungry4k0k
  • Could you provide at least a few rows of the text file? Also, what happens when you split by .split('\n') when reading the file? Can you show us some output? – Zargold Mar 15 '19 at 19:31
  • Feel free to delete some of the data in each object, but what is the overall structure? Is it an array like `[ {...someTweetStuff, created_at: "30-06-2015 14:27" }, ]`? Also, are you using node.js to read this file, or do you have the static file hosted by express or some other server, which you then read with a fetch request? – Zargold Mar 15 '19 at 19:33
  • Hi Zargold. Of course. I've made an edit at the bottom of my post. Hope that is not info :) If I try to do a .split('\n') and console.log() it afterwards, it gives a syntax error. I'm using node.js, but through repl.it :) – Hungry4k0k Mar 15 '19 at 21:16
  • So (tweets) is an array of objects that look like that? – Zargold Mar 15 '19 at 21:25

3 Answers


For extracting the text, there is a good answer here. Instead of calling console.log, parse each entry and save it to your array.

As for the regexp, I think it should be something like:

var re = /created_at: \"([^\"]*)\",/g;
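A minimal sketch of how such a pattern could be applied, assuming the raw file content is available as one string. The `raw` sample and the `extractHours` helper are hypothetical and not from the original post, and the pattern is narrowed here so that it captures only the hour:

```javascript
// Hypothetical sample of raw text; the real file is assumed to contain lines like these.
const raw = `
created_at: "30-06-2015 14:27",
created_at: "30-06-2015 14:55",
created_at: "01-07-2015 09:03",
`;

// Capture only the two-digit hour that follows the date portion.
const hourPattern = /created_at: "\d{2}-\d{2}-\d{4} (\d{2}):\d{2}"/g;

// extractHours is an illustrative helper name, not part of any library.
function extractHours(text) {
  // matchAll requires the g flag and yields one match per occurrence.
  return Array.from(text.matchAll(hourPattern), (match) => Number(match[1]));
}

console.log(extractHours(raw)); // [ 14, 14, 9 ]
```

Each extracted hour can then be fed into whatever counting structure you choose.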
A Ralkov

What I would do is work from a different angle: create an object with a key for the start of each hour that you care about. Presumably this covers a limited timespan, such as every hour from some earliest date up to now.

So generate something that looks like this dynamically:

{
'2019-03-01T17:22:30Z': 0, // or simply '1552667443928'
'2019-03-01T18:22:30Z': 0,
'2019-03-01T19:22:30Z': 0,
'2019-03-01T20:22:30Z': 0,
...etc
}

You can do this using the current Date and a loop that creates the earlier date times:

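const REQUIRED_DATE = new Date(2019, 2, 1) // assumed example value: the earliest hour you care about
const now = new Date()
// you can use a generator here or simply a while loop:
const dateTimes = {}
while (now > REQUIRED_DATE) {
  dateTimes[now.getTime()] = 0 // key by timestamp in milliseconds
  now.setHours(now.getHours() - 1) // step back one hour
}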

Now you have an exhaustive list of all the hours.

Then loop through Object.keys(dateTimes) and, for each hour, check whether a given tweet falls within it, i.e. whether item.created_at is at or after the start of that hour and before the start of the next one.

Loop through each item in your list and check whether it fits that dateTime; if so, increment the counter with dateTimes[currentHour]++.

So, the hardest part will be converting created_at to a normal-looking date time:

const [datePortion, timePortion] = "30-06-2015 14:27".split(' ')
const [day, month, year] = datePortion.split('-')
const [hour, minute] = timePortion.split(':')

Now, with the day, month, year, hour, and minute, you can build a Date object in JavaScript. It follows this formula from MDN:

new Date(year, monthIndex [, day [, hours [, minutes [, seconds [, milliseconds]]]]]);

AKA:

new Date(year, monthIndex, day, hours, minutes, seconds);

Note that monthIndex is zero-based (January is 0), so for December 17, 2019 @ 3:24am it'll be:

const date = new Date(2019, 11, 17, 3, 24, 0);
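
Putting the splitting and the constructor together might look like the sketch below. `parseCreatedAt` is a hypothetical helper name, and the string is assumed to always follow the day-month-year layout shown in the question. Note the `month - 1`, since `monthIndex` is zero-based:

```javascript
// parseCreatedAt is an illustrative helper, assuming input like "30-06-2015 14:27".
function parseCreatedAt(createdAt) {
  const [datePortion, timePortion] = createdAt.split(' ');
  const [day, month, year] = datePortion.split('-').map(Number);
  const [hour, minute] = timePortion.split(':').map(Number);
  // The Date constructor interprets these values in the local time zone.
  return new Date(year, month - 1, day, hour, minute);
}

const parsed = parseCreatedAt("30-06-2015 14:27");
console.log(parsed.getHours()); // 14
console.log(parsed.getMonth()); // 5 (June, zero-based)
```

If only the hour of the day matters, `getHours()` on the resulting Date is enough, and the hourly bucket object above becomes optional.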
Zargold

I'll assume that you already know how to use regex, from the post pointed to by Ralkov, to get all of your created_at dates; my answer goes on from there.

You said the date is not important, so once you have the string

'created_at: "30-06-2015 14:27"'

we need to get rid of everything except the hour. I did it by extracting substrings; feel free to try other approaches, this is just to get you started.

var date = obj.substr(obj.indexOf(' ') + 1);
var time = date.substr(date.indexOf(' ') + 1);
var hour = time.substr(0, time.indexOf(':'));

will get you the hour

"14"

Note that this only works for one day; you need to make some additional changes if you'd like to store tweet hour counts for different days in the same data structure.
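
One way to handle several days, sketched here under the assumption that created_at always looks like "30-06-2015 14:27", is to key the map by date plus hour rather than by hour alone. The `dayHourKey` helper and the sample strings are hypothetical, used only for illustration:

```javascript
// dayHourKey is an illustrative helper: "30-06-2015 14:27" -> "30-06-2015 14"
function dayHourKey(createdAt) {
  var parts = createdAt.split(' ');
  var hour = parts[1].split(':')[0];
  return parts[0] + ' ' + hour;
}

var perDayAndHour = new Map();
var sampleDates = ["30-06-2015 14:27", "30-06-2015 14:55", "01-07-2015 14:02"];

sampleDates.forEach(function (createdAt) {
  var key = dayHourKey(createdAt);
  // Start from 0 when the key has not been seen yet.
  perDayAndHour.set(key, (perDayAndHour.get(key) || 0) + 1);
});

console.log(perDayAndHour.get("30-06-2015 14")); // 2
console.log(perDayAndHour.get("01-07-2015 14")); // 1
```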

When you write your for-loop, call the following function each time you have found a tweet and extracted its hour. It stores hour-count pairs in a Map defined outside the function, creating a new pair if necessary or otherwise updating the existing one with the new tweet count.

function newTweet(hour, tweetsPerHour) {
  var tweetsThisHour = tweetsPerHour.get(hour);
  tweetsThisHour = tweetsThisHour === undefined ? 0 : tweetsThisHour; 
  tweetsPerHour.set(hour, ++tweetsThisHour);
  console.log(tweetsThisHour)
}

Complete code:

    var obj = 'created_at: "30-06-2015 14:27"';

    var date = obj.substr(obj.indexOf(' ')+1);
    var time = date.substr(date.indexOf(' ')+1);
    var hour = time.substr(0, time.indexOf(':'));

    var tweetsPerHour = new Map();

    newTweet(hour, tweetsPerHour); //this is the extracted hour
    newTweet("16", tweetsPerHour); //you can try different hours as well
    newTweet("17", tweetsPerHour);

    function newTweet(hour, tweetsPerHour) {
      var tweetsThisHour = tweetsPerHour.get(hour);

      tweetsThisHour = tweetsThisHour === undefined ? 0 : tweetsThisHour; 
      tweetsPerHour.set(hour, ++tweetsThisHour);
      console.log(hour + " tweet count: " + tweetsThisHour)
    }

What the code is doing is storing the hour and the tweet count in pairs, conceptually like this:

[{"14":1}, {"16":1}, {"17":1}]

For example, if you add "14" again, it would update to

[{"14":2}, {"16":1}, {"17":1}]

Dig into JavaScript Map objects as well.

Your code flow is something like the following:

  1. Read .txt file
  2. Loop through dates -> get hour from date -> newTweet(hour, tweetsPerHour).
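
For the array of tweet objects shown in the question's edit, that loop could be sketched as follows. This is only one possible approach: `countTweetsPerHour` is a hypothetical name, the sample tweets are made up for illustration, and created_at is assumed to always follow the "30-06-2015 14:27" layout. Hours with no tweets are included with a count of 0:

```javascript
function countTweetsPerHour(tweets) {
  // One slot per hour of the day, 0 through 23.
  const counts = new Array(24).fill(0);
  for (const tweet of tweets) {
    // "30-06-2015 14:27" -> "14:27" -> "14" -> 14
    const hour = Number(tweet.created_at.split(' ')[1].split(':')[0]);
    counts[hour]++;
  }
  // Convert to the requested [{hour: x, count: y}, ...] shape.
  return counts.map((count, hour) => ({ hour, count }));
}

// Made-up sample data containing only the field that matters here.
const sampleTweets = [
  { created_at: "30-06-2015 14:27" },
  { created_at: "01-07-2015 14:03" },
  { created_at: "01-07-2015 09:59" },
];

const result = countTweetsPerHour(sampleTweets);
console.log(result[14]); // { hour: 14, count: 2 }
console.log(result[9]); // { hour: 9, count: 1 }
```

If the time property really is a Date object, `tweet.time.getHours()` may work as well, though it reports the hour in the local time zone of the machine running the code.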
Alejandro Camba