Heroku Twitter API App Deployment

Question

I recently deployed a Node.js server on Heroku, and the app can be found here: https://congress-tracker-app.herokuapp.com/

It communicates with Twitter's API, appending a row to a .csv file every time it receives a tweet that matches my predefined parameters (in this case, the IDs of the accounts I'm following). I then display that CSV using D3.js to visualize the data.

The data is being pulled in using the D3 javascript library... d3.queue().defer(d3.csv, "public/data/twitterData.csv").await(update)

The update callback function is then passed the data, and creates my visualization.

I have two problems:

1. I cannot get the app to run in the background, i.e. keep updating the CSV when the web page isn't open. I want the Twitter API to continue to communicate with my app, so the data accumulates over time without someone having to keep the page open. The tweets are fetched with Twitter's streaming API.

2. Reloading the app resets my CSV to the single row of data under the headers that I entered in the initial build. Furthermore, when I clone the files to my desktop, the CSV in the public folder doesn't show any of the new data fetched from Twitter.

On the Heroku app page, my logs show that data is being added. The "File Saved" message appears, which my code logs whenever fs.appendFile adds a row to my CSV. Here's the message:

The CSV file, as you can see, sits in the public folder of my application. How can I ensure that after the app quits 1) the server continues to run and 2) the changes to my CSV are saved?

Here's part of my code:

var param = {follow: '21111098,958191744683782144,18061669,21111098,18061669,2891210047,1869975300,19394188,4107251,16056306,259459455,21111098,18061669,2891210047,1869975300,19394188,4107251,16056306,259459455,968650362,343041182,5558312,111671288,476256944,378631423,803694179079458816,30354991,224285242,45645232,235217558,20879626,150078976,278124059,102477372,249787913,381577682,15324851,435500714,823302838524739584,20597460,555355209,15745368,229966028,3001665106,2863210809,1397501864,78403308,253252536,47747074}
var followIds = ['21111098','958191744683782144','18061669','21111098','18061669','2891210047','1869975300','19394188','4107251','16056306','259459455','21111098','18061669','2891210047','1869975300','19394188','4107251','16056306','259459455','968650362','343041182','5558312','111671288','476256944','378631423','803694179079458816','30354991','224285242']

twitterClient.stream('statuses/filter', param, function(stream) {
  stream.on('data', function(tweet) {

    const fields = ["name","text","URL","time"]

    // Only record tweets authored by a followed account. includes() matches
    // once, so duplicate IDs in followIds cannot append the same row twice.
    // The tweet.user check skips stream messages that are not tweets.
    if (tweet.user && followIds.includes(tweet.user.id_str)) {

      // WRITE TO CSV HERE:
      let name = tweet.user.name;
      let text = tweet.text;
      let URL = `https://twitter.com/${tweet.user.screen_name}/status/${tweet.id_str}`
      let time = tweet.created_at;

      const update = [{name,text,URL,time}]

      var toCsv = {
        data: update,
        fields: fields,
        hasCSVColumnTitle: false
      };

      var csv = json2csv(toCsv) + "\r\n";

      fs.appendFile('public/data/twitterData.csv', csv, function(err){
        if (err) throw err;
        console.log('File Saved')
      })
    }
  });
});

score 4 · Accepted Answer · edited Jun 20 '20 at 09:12

1. I think the solution to your first problem is kicking off a "Background Job", which gets your long-running application logic outside of the normal HTTP request/response cycle.

https://devcenter.heroku.com/articles/background-jobs-queueing

2. Heroku dynos have ephemeral filesystems, meaning that once the app restarts you lose any files written at runtime, including the rows appended to your CSV. Those writes also never reach your Git repository, which is why a fresh clone only shows the CSV from your initial build. You probably want to push the data to more permanent storage after your job completes.

https://help.heroku.com/DGUDV63H/how-much-disk-space-on-the-dyno-can-i-use
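As a rough illustration of the "more permanent storage" idea, here is a minimal sketch that keeps the CSV in an external store instead of on the dyno's disk. Everything named here is an assumption for illustration: `store` is a hypothetical adapter exposing async `get(key)` and `put(key, body)`, `createMemoryStore` is an in-memory stand-in used only to exercise the logic, and `csvField`, `toCsvRow` and `appendRow` are made-up helper names. A real adapter would wrap whatever storage client you choose.

```javascript
// Sketch only: `store` is a hypothetical adapter, not a real library API.
const HEADER = 'name,text,URL,time\r\n';

// Quote a value for CSV, doubling any embedded double quotes.
function csvField(value) {
  return '"' + String(value).replace(/"/g, '""') + '"';
}

// Turn one {name, text, URL, time} record into a CSV line.
function toCsvRow(row) {
  return [row.name, row.text, row.URL, row.time].map(csvField).join(',') + '\r\n';
}

// Read the current file (or start from the header), add one row, and write
// the whole body back. This read-modify-write shape suits stores that
// replace whole objects rather than appending to them.
async function appendRow(store, key, row) {
  const existing = (await store.get(key)) || HEADER;
  const updated = existing + toCsvRow(row);
  await store.put(key, updated);
  return updated;
}

// In-memory stand-in for the real store (assumption: same get/put shape).
function createMemoryStore() {
  const objects = new Map();
  return {
    get: async (key) => objects.get(key),
    put: async (key, body) => { objects.set(key, body); },
  };
}
```

Note that two overlapping `appendRow` calls could overwrite each other's row, so in practice the calls would need to be queued one after another, or the rows written to a database instead.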

answered Feb 28 '18 at 20:30

Nuri Hodges


The second part of this answer is helpful, because I was able to find this tutorial (https://www.linux.com/learn/how-manage-amazon-s3-files-your-server-side-code) on updating files stored in an Amazon S3 server. However, I don't understand how to make a "background process" in Node. Do you know where I could find that information? I'm only capable of coding server-side stuff in Javascript, because normally I work pretty much exclusively in the front-end... – Harrison Cramer Feb 28 '18 at 21:10

Here's another SO answer that breaks down Background Jobs a little bit more. Basically it is just a Node app/script that you register with Heroku based on their convention. Once it is registered, you can trigger it however you want from your normal HTTP request/response handler. (One way that comes to mind is to just exec `heroku run your-background-job-name`) https://stackoverflow.com/questions/13345664/using-heroku-scheduler-with-node-js – Nuri Hodges Feb 28 '18 at 22:46
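To make the "Node app/script that you register with Heroku" idea from the comment above more concrete, here is a minimal sketch of a standalone collector script. The file names `worker.js` and `server.js`, the function `startCollector`, and the `saveTweet` callback are all hypothetical; the fake `EventEmitter` stream only stands in for the stream object that `twitterClient.stream` passes to its callback in the question.

```javascript
// worker.js (hypothetical name): collects tweets with no web page involved.
//
// A Procfile could declare it next to the web process, for example:
//   web: node server.js
//   worker: node worker.js
const { EventEmitter } = require('events');

// Attach handlers to any stream-like emitter that fires 'data' and 'error'.
// `saveTweet` is a hypothetical callback that persists one row.
function startCollector(stream, followIds, saveTweet) {
  const followed = new Set(followIds);
  stream.on('data', (tweet) => {
    // Skip messages that are not tweets and tweets from accounts not in the list.
    if (!tweet || !tweet.user || !followed.has(tweet.user.id_str)) return;
    saveTweet({
      name: tweet.user.name,
      text: tweet.text,
      URL: `https://twitter.com/${tweet.user.screen_name}/status/${tweet.id_str}`,
      time: tweet.created_at,
    });
  });
  stream.on('error', (err) => console.error('stream error:', err.message));
  return stream;
}

// Demo with a fake stream; in the real script this would be wired up as
// twitterClient.stream('statuses/filter', param, (stream) => startCollector(...)).
const fakeStream = new EventEmitter();
const saved = [];
startCollector(fakeStream, ['21111098'], (row) => saved.push(row));

fakeStream.emit('data', {
  id_str: '1',
  text: 'hello',
  created_at: 'Wed Feb 28 20:30:00 +0000 2018',
  user: { id_str: '21111098', name: 'Followed', screen_name: 'followed' },
});
fakeStream.emit('data', {
  id_str: '2',
  text: 'ignored',
  created_at: 'Wed Feb 28 20:31:00 +0000 2018',
  user: { id_str: '999', name: 'Other', screen_name: 'other' },
});
fakeStream.emit('data', { delete: { status: { id_str: '3' } } });
```

With a Procfile entry like the one in the comment, the worker process would typically be started with `heroku ps:scale worker=1`, and it then runs independently of any HTTP requests to the web process.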
