I have an assignment to make a word cloud of the most common words in an external text file. All the words should start at the center and expand/animate into a word cloud. I already made a word counter that determines which words are used most often in the text file. Now that I know which words are most common, how can I select at least 15 random words from the top 50 most used words in the text file?

If that question can be easily answered, how can I overlap all the selected words on the canvas?

Note: I am very new to Java and Processing, so code would help.

Here is my code:

    String[] words;
    IntDict concordance;
    int index = 0;

    void setup() {
      size(500, 500);
      background(0);
      String[] lines = loadStrings("alice_just_text.txt");
      String entireplay = join(lines, " ");
      words = splitTokens(entireplay, ",.?!:-;:()03 ");
      concordance = new IntDict();
      frameRate(5);

      for (int i = 0; i < words.length; i++) {
        concordance.increment(words[i].toLowerCase());
      }
      concordance.sortValuesReverse();

      String[] keys = concordance.keyArray();
      for (int i = 0; i < keys.length; i++) {
        int count = concordance.get(keys[i]); //word counts
        println(keys[i], count);

      }

    }

    void draw() {

      background(0);
      textSize(64);
      textAlign(CENTER);
      text(words[index], width / 2, height / 2);
      index++;

    }
Moe
Noob Coder
  • "_how can I select at least 15 random words from the top 50 most used words_" If you have the list of 50 words, just shuffle the list, then take the first 15 elements. – takendarkk Jul 06 '17 at 19:31
  • 1) Shuffle the list and get the first 15. 2) *"so a code would help..."* Code from you would help too, I mean a [mcve]. Are you using `AWT`? – Frakcool Jul 06 '17 at 19:32
  • You might want to remove very common filler words like "and", "the" etc. from the list before picking. Stick mostly with nouns, verbs, adjectives and adverbs. – rossum Jul 06 '17 at 19:55
  • @csm_dev I have a couple thousand words in the external file – Noob Coder Jul 06 '17 at 20:10
  • @rossum How do I do that? – Noob Coder Jul 06 '17 at 20:10
  • @Frakcool I'm using Processing. I just need some code that can select 15 random words from the 50 most used words in an external file that consists of 3000+ words. I'm sorry, I'm very new to Java – Noob Coder Jul 06 '17 at 20:14
  • Have a list of common words: "a", "and", "is", "the" etc. Either don't record those words as you read the input file, or else delete them from the frequency list after you have read the input file and before you pick your 15 top words. – rossum Jul 06 '17 at 20:27
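
The stop-word idea from the last comment could be sketched roughly as follows. This is plain Java rather than a Processing sketch, and the class name `StopWordFilter`, the method `countWords`, and the very short stop-word list are all made up for illustration; a real list would be much longer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class StopWordFilter {
    // Hypothetical, deliberately short stop-word list; extend it as needed.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "and", "is", "of", "the", "to"));

    // Counts every word that is not a stop word, ignoring case.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String word : words) {
            String lower = word.toLowerCase();
            if (lower.isEmpty() || STOP_WORDS.contains(lower)) {
                continue; // skip filler words before they are counted
            }
            counts.merge(lower, 1, Integer::sum); // add 1 to the word's count
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] words = {"The", "cat", "and", "the", "hat", "cat"};
        System.out.println(countWords(words));
    }
}
```

In the sketch from the question, the same `STOP_WORDS.contains(...)` check could probably go right before the `concordance.increment(...)` call, so that filler words never reach the dictionary.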

3 Answers

This should do the trick

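    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    // keys is already sorted by count, so the first (up to) 50 entries are the top words.
    String[] top = Arrays.copyOfRange(keys, 0, Math.min(50, keys.length));
    // Arrays.asList returns a view backed by top, so shuffling the list shuffles the array.
    List<String> list = Arrays.asList(top);
    Collections.shuffle(list);

    for (int i = 0; i < Math.min(15, top.length); i++) {
        System.out.println(top[i]); // do something with each picked word
    }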
Moe
  • You can put this block wherever you would like to use the list of words, as long as the words array is in scope (i.e. you can access the array). By the way, instead of printing the words (System.out.println) you can use whatever method you want. – Moe Jul 06 '17 at 20:06
  • Mohamed? Will that determine the top 15 most used words? – Noob Coder Jul 06 '17 at 20:08
  • No, it won't. It will print 15 random words. You need a `Map` to track the occurrences of each word. – Rogue Jul 06 '17 at 20:22
  • I edited my answer, now it should. Put this wherever the **keys** array is visible – Moe Jul 06 '17 at 20:24
  • @MohamedMoselhy this still just prints 15 random words – Rogue Jul 15 '17 at 10:49
Once you've read all the words from your file, I would use a Map<String, Integer> instance to store each word with its corresponding frequency.

After words = splitTokens (entireplay, ",.?!:-;:()03 "); put the following:

    Map<String, Integer> wordsMap = new HashMap<>();
    for (int i = 0; i < words.length; i++)
    {
        if (wordsMap.containsKey(words[i]))
        {
            wordsMap.put(words[i], wordsMap.get(words[i]) + 1);
        }
        else
        {
            wordsMap.put(words[i], 1);
        }
    }

Now all words are stored in the map with the corresponding frequency.

There's a great post on how to sort a map by value. Check here
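
In case that post is not at hand, here is a minimal sketch of one way to do the sorting step. It assumes a `Map<String, Integer>` like `wordsMap` above and produces a `List<String>` named `listOfWords`, ordered from most to least frequent; the class name `SortByCount` and the method `sortedByCount` are hypothetical names chosen for this example, not something from the linked post.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SortByCount {
    // Returns the map's keys ordered from most to least frequent.
    static List<String> sortedByCount(Map<String, Integer> wordsMap) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(wordsMap.entrySet());
        // Compare b to a (not a to b) so that higher counts come first.
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));

        List<String> listOfWords = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : entries) {
            listOfWords.add(entry.getKey()); // keep only the word, drop the count
        }
        return listOfWords;
    }

    public static void main(String[] args) {
        Map<String, Integer> wordsMap = new HashMap<>();
        wordsMap.put("alice", 3);
        wordsMap.put("rabbit", 1);
        wordsMap.put("queen", 2);
        System.out.println(sortedByCount(wordsMap));
    }
}
```

Words with equal counts end up in no particular order here, which should not matter much when the result is shuffled afterwards anyway.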

Once you have it sorted, do the following to clear all but the top 50 from the list, shuffle the list, then print the first 15:

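    // Trim from the end so that no elements are skipped while removing.
    while (listOfWords.size() > 50)
    {
        listOfWords.remove(listOfWords.size() - 1);
    }

    Collections.shuffle(listOfWords);
    for (int i = 0; i < Math.min(15, listOfWords.size()); i++)
    {
        // Print the first 15 elements of the shuffled list.
        System.out.println(listOfWords.get(i));
    }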
Trevor
Using Java 8 streams, you can do that in a really nice way, IMO:

    import java.nio.file.*;
    import java.util.*;
    import java.util.stream.*;

    Path file = Paths.get("YOUR_FILE.txt");
    Map<String, Integer> wordCounts = new HashMap<>();

    // Create a Map of Word -> Count
    // Files.lines can throw IOException; the enclosing method has to declare or catch it.
    try (Stream<String> lines = Files.lines(file)) {
        lines.map((line) -> line.split("\\s+")).forEach((words) -> {
            for (String word : words) {
                wordCounts.put(word, wordCounts.getOrDefault(word, 0) + 1);
            }
        });
    }

    // Process the Map
    List<String> top50Words = wordCounts.entrySet().stream()
            .sorted((entry1, entry2) -> -Integer.compare(entry1.getValue(), entry2.getValue()))// Sort by Count (- for Reversed)
            .map(Map.Entry::getKey)// You don't need the Count anymore
            .limit(50L)// Limit to Top 50
            .collect(Collectors.toList());// Collect them as List

    Collections.shuffle(top50Words);// Shuffle Top 50

    List<String> random15OfTop50 = top50Words.stream()
            .limit(15L)// Limit to the first 15 in the shuffled List
            .collect(Collectors.toList());// Collect as List

    System.out.println(random15OfTop50);
Felix