I'm making an application which takes in a user's tweets using the Twitter API and one component of it is performing sentiment extraction from the tweet texts. For development I'm using xampp, of course using the Apache HTML Server as my workspace. I'm using Eclipse for PHP as an IDE.
For the sentiment extraction I'm using the uClassify Sentiment Classifier. The Classifier uses an API to receive a number of requests and with each request it sends back XML data from which the sentiment values can be parsed.
Now the application may process a large number of tweets (maximum allowed is 3200) at once. For example if there are 3200 tweets then the system will send 3200 API calls at once to this Classifier. Unfortunately for this number the system does not scale well and in fact xampp crashes after a short while of running the system with these calls. However, with a modest number of tweets (for example 500 tweets) the system works fine, so I am assuming it may be due to large number of API calls. It may help to note that the maximum number of API calls allowed by uClassify per day is 5000, but since the maximum is 3200 I am pretty sure that it is not exceeding this number.
This is pretty much my first time working on this kind of web development, so I am not sure if I'm making a rookie mistake here. I am not sure what I could be doing wrong and don't know where to start looking. Any advice/insight will help a lot!
EDIT: added source code in question
Update index method
function updateIndex($timeline, $connection, $user_handle, $json_index, $most_recent) {
// URL arrays for uClassify API calls
$urls = [ ];
$urls_id = [ ];
// halt if no more new tweets are found
$halt = false;
// set to 1 to skip first tweet after 1st batch
$j = 0;
// count number of new tweets indexed
$count = 0;
while ( (count ( $timeline ) != 1 || $j == 0) && $halt == false ) {
$no_of_tweets_in_batch = 0;
$n = $j;
while ( ($n < count ( $timeline )) && $halt == false ) {
$tweet_id = $timeline [$n]->id_str;
if ($tweet_id > $most_recent) {
$text = $timeline [$n]->text;
$tokens = parseTweet ( $text );
$coord = extractLocation ( $timeline, $n );
addSentimentURL ( $text, $tweet_id, $urls, $urls_id );
$keywords = makeEntry ( $tokens, $tweet_id, $coord, $text );
foreach ( $keywords as $type ) {
$json_index [] = $type;
}
$n ++;
$no_of_tweets_in_batch ++;
} else {
$halt = true;
}
}
if ($halt == false) {
$tweet_id = $timeline [$n - 1]->id_str;
$timeline = $connection->get ( 'statuses/user_timeline', array (
'screen_name' => $user_handle,
'count' => 200,
'max_id' => $tweet_id
) );
// skip 1st tweet after 1st batch
$j = 1;
}
$count += $no_of_tweets_in_batch;
}
$json_index = extractSentiments ( $urls, $urls_id, $json_index );
echo 'Number of tweets indexed: ' . ($count);
return $json_index;
}
extract sentiment method
function extractSentiments($urls, $urls_id, &$json_index) {
$responses = multiHandle ( $urls );
// add sentiments to all index entries
foreach ( $json_index as $i => $term ) {
$tweet_id = $term ['tweet_id'];
foreach ( $urls_id as $j => $id ) {
if ($tweet_id == $id) {
$sentiment = parseSentiment ( $responses [$j] );
$json_index [$i] ['sentiment'] = $sentiment;
}
}
}
return $json_index;
}
Method for handling multiple API calls
This is where the uClassify API calls are being processed at once:
function multiHandle($urls) {
// curl handles
$curls = array ();
// results returned in xml
$xml = array ();
// init multi handle
$mh = curl_multi_init ();
foreach ( $urls as $i => $d ) {
// init curl handle
$curls [$i] = curl_init ();
$url = (is_array ( $d ) && ! empty ( $d ['url'] )) ? $d ['url'] : $d;
// set url to curl handle
curl_setopt ( $curls [$i], CURLOPT_URL, $url );
// on success, return actual result rather than true
curl_setopt ( $curls [$i], CURLOPT_RETURNTRANSFER, 1 );
// add curl handle to multi handle
curl_multi_add_handle ( $mh, $curls [$i] );
}
// execute the handles
$active = null;
do {
curl_multi_exec ( $mh, $active );
} while ( $active > 0 );
// get xml and flush handles
foreach ( $curls as $i => $ch ) {
$xml [$i] = curl_multi_getcontent ( $ch );
curl_multi_remove_handle ( $mh, $ch );
}
// close multi handle
curl_multi_close ( $mh );
return $xml;
}