
I was tired of studying, so I decided to put my C knowledge to use and make a little program that grabs a random tweet I've saved in a file and shows it to me.

The text file is organized like:

@username
§
tweet1
§
tweet2
§
@username2

The idea is that when I run the program, it grabs a random user and then a random tweet.

The only ways I can think of to randomize the user are:

  • Go through the whole text file; every time it sees a username, it saves the line and increases a counter. Then I pick a random index and get that username.
  • Avoid going through the whole text file by putting every user into a separate text file. Then just get the names of the files in a certain folder and pick one at random (if this is possible).

But then the same problem arises: how do I pick a random tweet? I know where each one begins and ends, but the only way I can think of to pick a random one is the first approach above.

Do you guys suggest any smarter way?

Thanks a ton!

DTek
  • One way could be to take a random decision to stop looking for users, and then another random decision to stop looking for the user's tweet. You might then just get the latest tweets. As you read, you could make the random decision increasingly tight. Then you won't use tweets from 3 months ago. – Weather Vane Jun 02 '18 at 19:51

1 Answer

Here's a comment from some recent code I wrote that contains useful information for you:

/*
** From Wikipedia on Reservoir Sampling
** https://en.wikipedia.org/wiki/Reservoir_sampling
**
** Algorithm R
** The most common example was labelled Algorithm R by Jeffrey Vitter in
** his paper on the subject.  This simple O(n) algorithm as described in
** the Dictionary of Algorithms and Data Structures consists of the
** following steps (assuming k < n and using one-based array indexing):
**
**    // S has items to sample, R will contain the result
**    ReservoirSample(S[1..n], R[1..k])
**        // fill the reservoir array
**        for i = 1 to k
**            R[i] := S[i]
**
**        // replace elements with gradually decreasing probability
**        for i = k+1 to n
**            j := random(1, i)   // important: inclusive range
**            if j <= k
**                R[j] := S[i]
**
** Alternatively: https://stackoverflow.com/questions/232237
** What's the best way to return one random line in a text file
**
**      count = 0;
**      while (fgets(line, length, stream) != NULL)
**      {
**          count++;
**          // if ((rand() * count) / RAND_MAX == 0)
**          if ((rand() / (float)RAND_MAX) <= (1.0 / count))
**              strcpy(keptline, line);
**      }
**
** From Perl perlfaq5:
** Here's a reservoir-sampling algorithm from the Camel Book:
**
**      srand;
**      rand($.) < 1 && ($line = $_) while <>;
**
** This has a significant advantage in space over reading the whole file
** in.  You can find a proof of this method in The Art of Computer
** Programming, Volume 2, Section 3.4.2, by Donald E. Knuth.
*/
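
The `fgets()` loop quoted in the comment can be wrapped in a small function. What follows is only a minimal sketch under some assumptions: the name `random_line` and the 1024-byte line limit are made up for illustration, and `rand() % count` is used for brevity even though it is slightly biased.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
** Pick one line uniformly at random from stream in a single pass
** (reservoir sampling with a reservoir of size 1).
** The i-th line read replaces the kept line with probability 1/i.
** Returns the number of lines read; if that is 0, kept is unchanged.
** Assumption: no line is longer than 1023 bytes.
*/
size_t random_line(FILE *stream, char *kept, size_t size)
{
    char line[1024];
    size_t count = 0;

    assert(stream != NULL && kept != NULL);
    assert(size >= sizeof(line));   /* kept must be able to hold any line */

    while (fgets(line, sizeof(line), stream) != NULL)
    {
        count++;
        /* True with probability (roughly) 1/count */
        if ((size_t)rand() % count == 0)
            strcpy(kept, line);
    }
    return count;
}
```

Call `srand()` once (for example with `time(NULL)` as the seed) before using it, otherwise every run makes the same choices.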

You'll need to make some decisions about what constitutes a random choice in your case.

Suppose there are 12 tweeters in your file, each with (for the sake of discussion) between 1 and 12 tweets. Do you want to choose each tweeter with a probability of 1/12 and then, for that tweeter, choose one of their tweets at random (from the set belonging to that tweeter)? Or do you have some other scheme in mind, such as one where, if there are 66 tweets, each tweet has a 1/66 probability of being selected, so the tweeter who has tweeted most is more likely to appear than the one who has only tweeted once?

Once you've decided which rules you want to follow, the coding based on the information above is fairly straightforward.
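
To illustrate both rules, here is a rough single-pass sketch for the file layout shown in the question. It rests on several assumptions that are not in the question: a username line starts with `@`, each tweet fits on one line of under 1024 bytes, the separator `§` is stored as UTF-8, and the names `pick_tweet`, `pick` and `MAXLINE` are invented for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum { MAXLINE = 1024 };    /* assumed upper bound on line length */

/* Random integer in [0, n); slightly biased, good enough for a sketch */
static size_t pick(size_t n)
{
    return (size_t)rand() % n;
}

/*
** One pass over the tweet file.
** by_user == 0: every tweet in the file is equally likely.
** by_user != 0: every user is equally likely, then each of that
**               user's tweets is equally likely.
** user and tweet must each hold MAXLINE bytes.
** Returns 1 if a tweet was chosen, 0 otherwise (for example, when the
** chosen user has no tweets).
*/
int pick_tweet(FILE *fp, int by_user, char *user, char *tweet)
{
    char line[MAXLINE];
    char current[MAXLINE] = "";   /* user whose tweets are being read */
    size_t users = 0;             /* users seen so far */
    size_t tweets = 0;            /* tweets counted so far */
    int selected = 0;             /* by_user only: is current user kept? */

    assert(fp != NULL && user != NULL && tweet != NULL);
    user[0] = tweet[0] = '\0';

    while (fgets(line, sizeof(line), fp) != NULL)
    {
        line[strcspn(line, "\r\n")] = '\0';           /* strip newline */
        if (line[0] == '\0' || strcmp(line, "\xC2\xA7") == 0)
            continue;                  /* blank line or UTF-8 separator */

        if (line[0] == '@')            /* start of a new user */
        {
            strcpy(current, line);
            users++;
            if (by_user && (selected = (pick(users) == 0)))
            {
                /* This user replaces the kept one; restart the count */
                strcpy(user, current);
                tweet[0] = '\0';
                tweets = 0;
            }
            continue;
        }

        if (by_user && !selected)
            continue;                  /* tweet of a user we did not keep */

        tweets++;
        if (pick(tweets) == 0)         /* keep with probability 1/tweets */
        {
            strcpy(user, current);
            strcpy(tweet, line);
        }
    }
    return tweet[0] != '\0';
}
```

Both modes use the same replace-with-probability-1/count idea; the only difference is whether the count runs over all tweets or restarts for each user that has been kept.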

Jonathan Leffler