Count frequency of a string individually from query

Question

I want to search a file named a.java for a query. If my query is String name, I want to get the frequency of each word of the query in the text file individually: first count the frequency of String, then the frequency of name, and then add both frequencies. How can I implement this in Java?

public class Tf2 {
Integer k;
int totalword = 0;
int totalfile, containwordfile = 0;
Map<String, Integer> documentToCount = new HashMap<>();
File file = new File("H:/java");
File[] files = file.listFiles();
public void Count(String word) {
   File[] files = file.listFiles();
    Integer count = 0;
    for (File f : files) {
        BufferedReader br = null;
        try {
            br = new BufferedReader(new FileReader(f));
            count = documentToCount.get(word);

            documentToCount.clear();

            String line;
            while ((line = br.readLine()) != null) {
                String term[] = line.trim().replaceAll("[^a-zA-Z0-9 ]", " ").toLowerCase().split(" ");


                for (String terms : term) {
                    totalword++;
                    if (count == null) {
                        count = 0;
                    }
                    if (documentToCount.containsKey(word)) {

                        count = documentToCount.get(word);
                        documentToCount.put(terms, count + 1);
                    } else {
                        documentToCount.put(terms, 1);

                    }

                }

            }
          k = documentToCount.get(word);

            if (documentToCount.get(word) != null) {
                containwordfile++;
       
               System.out.println("" + k);

            }

        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

public static void main(String[] args) throws IOException {
    Tf2 ob = new Tf2();
    String query = "String name";
    ob.Count(query);
}
}

I tried this with a HashMap, but it cannot count the frequency of each query word individually.

@aeberhart OK, I will clarify. Suppose I have a file that contains the line **Wikipedia is a free online encyclopedia, created and edited by volunteers around the world** and I want to search for the query **edited Wikipedia volunteers**. My program should first count the frequency of edited in the text file, then the frequency of Wikipedia, then the frequency of volunteers, and finally sum all the frequencies. Can I solve this using a HashMap? — Sanzida Sultana, Aug 16 '20 at 16:20
How many queries do you expect for the same text? If there will be multiple queries, you could optimize accordingly. If there is only one query, the best option is to put the queried words into a set and then go over the actual words one by one. The complexity will then be O(n + k), where n is the number of words in the text and k is the number of words in the query. — Nuri Tasdemir, Aug 16 '20 at 16:32
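
The single-pass idea from the comment above could look roughly like the following. This is only a sketch, assuming whitespace-separated words and case-sensitive matching; the class name QuerySetCounter and the method totalFrequency are made up for illustration. Because the query words go into a set, a word repeated in the query is counted only once.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class QuerySetCounter {

    // Hypothetical helper: sums the occurrences of every query word in the text.
    static int totalFrequency(String text, String query) {
        // Put the k query words into a set for constant-time lookups.
        Set<String> queryWords = new HashSet<>(Arrays.asList(query.trim().split("\\s+")));

        int total = 0;
        // Visit each of the n words in the text exactly once.
        for (String word : text.trim().split("\\s+")) {
            if (queryWords.contains(word)) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String text = "Wikipedia is a free online encyclopedia, created and edited by volunteers around the world";
        System.out.println(totalFrequency(text, "edited Wikipedia volunteers"));
    }
}
```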

SkillsIndexOutOfBounds · Answer 1 · 2020-08-16T21:22:27.900

Here is an example that uses Collections.frequency to count the occurrences of a string in a file:

public void Count(String word) {
    File f = new File("/your/path/text.txt");
    List<String> list = new ArrayList<String>();
    if (f.exists() && f.isFile()) {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader(f))) {
            String line;
            while ((line = br.readLine()) != null) {
                // split each line on single spaces and collect the words
                String[] arr = line.split(" ");
                for (String str : arr) {
                    list.add(str);
                }
            }
            System.out.println("Frequency = " + Collections.frequency(list, word));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Here is another example that uses the Java Streams API and also works for a multi-file search inside a directory:

    public class Test {

    public static void main(String[] args) {
        File file = new File("C:/path/to/your/files/");
        String targetWord = "stringtofind";
        long numOccurances = 0;

        if(file.isFile() && file.getName().endsWith(".txt")){

            numOccurances = getLineStreamFromFile(file)
                    .flatMap(str -> Arrays.stream(str.split("\\s")))
                    .filter(str -> str.equals(targetWord))
                    .count();

        } else if(file.isDirectory()) {

            numOccurances = Arrays.stream(file.listFiles(pathname -> pathname.toString().endsWith(".txt")))
                    .flatMap(Test::getLineStreamFromFile)
                    .flatMap(str -> Arrays.stream(str.split("\\s")))
                    .filter(str -> str.equals(targetWord))
                    .count();
        }

        System.out.println(numOccurances);
    }

    public static Stream<String> getLineStreamFromFile(File file){
        try {
            return Files.lines(file.toPath());
        } catch (IOException e) {
            e.printStackTrace();
        }
        return Stream.empty();
    }
  }

You can also split the input string into individual words and loop over them to get the number of occurrences of each.
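
Splitting the query and summing the per-word counts might be sketched as follows. It assumes the words of the file have already been collected into a list (as in the first example) and that matching is case-sensitive; QueryFrequency and sumFrequencies are illustrative names, not part of any library.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class QueryFrequency {

    // Hypothetical helper: calls Collections.frequency once per query word
    // and adds up the results.
    static int sumFrequencies(List<String> words, String query) {
        int total = 0;
        for (String q : query.trim().split("\\s+")) {
            int freq = Collections.frequency(words, q);
            System.out.println(q + " = " + freq);
            total += freq;
        }
        return total;
    }

    public static void main(String[] args) {
        // Sample words standing in for the contents of a file.
        List<String> words = Arrays.asList("the quick dog saw the other dog".split("\\s+"));
        System.out.println("Total = " + sumFrequencies(words, "dog the cat"));
    }
}
```

Each call to Collections.frequency scans the whole list, so this does one pass per query word; for a handful of query words that is usually fine.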

My question is this: if I have a file that contains the line "Wikipedia is a free online encyclopedia, created and edited by volunteers around the world" and I want to search for the query "edited Wikipedia volunteers", my program should first count the frequency of edited in the text file, then the frequency of Wikipedia, then the frequency of volunteers, and finally sum all the frequencies. Can I solve this using a HashMap? — Sanzida Sultana, Aug 16 '20 at 16:22
@SanzidaSultana You can find the frequency of edited, Wikipedia, and volunteers separately using Collections.frequency and then sum the frequencies. Is there any specific reason you want to do this with a HashMap? — SkillsIndexOutOfBounds, Aug 16 '20 at 16:35
Thanks for your feedback. If my query is a line that contains 5 words, should I call Collections.frequency 5 times? In **public void Count(String word)** I have to pass the whole query as one line, like **public void count(edited Wikipedia volunteers free online)**, but count the words of the query separately. No specific reason for HashMap; it's just for practice. — Sanzida Sultana, Aug 16 '20 at 17:00

score 0 · Answer 2 · answered Aug 16 '20 at 15:35

You're over-complicating things greatly. If all you need to do is count occurrences, you don't need hashmaps or anything like that. All you need to do is to iterate over all of the text in the document and count how many times you find your search string.

Basically, your workflow would be:

1. Instantiate a counter to 0
2. Read the text
3. Iterate over the text, looking for the search string
4. When the search string is found, increment the counter
5. When you finish iterating over the text, print the counter

If you have a very long text, you could do this line-by-line or otherwise batch your reads.
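
A line-by-line variant could be sketched like this. It assumes the search string never spans a line break, and it uses a StringReader in place of a file so that it runs on its own; LineByLineCounter and its methods are illustrative names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class LineByLineCounter {

    // Counts non-overlapping occurrences of searchString within a single line.
    static int countInLine(String line, String searchString) {
        if (searchString.isEmpty()) {
            return 0; // avoid looping forever on an empty search string
        }
        int count = 0;
        int index = line.indexOf(searchString);
        while (index != -1) {
            count++;
            index = line.indexOf(searchString, index + searchString.length());
        }
        return count;
    }

    // Reads the input lazily, one line at a time, and sums the per-line counts.
    static int countOccurrences(BufferedReader reader, String searchString) {
        return reader.lines().mapToInt(line -> countInLine(line, searchString)).sum();
    }

    public static void main(String[] args) throws IOException {
        // For a real file, wrap a FileReader instead of a StringReader.
        try (BufferedReader reader = new BufferedReader(new StringReader("dog cat\nhotdog\nbird dog dog"))) {
            System.out.println(countOccurrences(reader, "dog"));
        }
    }
}
```

Only one line is held in memory at a time, which is the point of batching the reads for large inputs.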

Here is a simple example. Let's say I have a file and I'm looking for the word "dog".

// 1. instantiate counter to 0
int count = 0;

// 2. read text
Path path = ...; // path to my input file
String text = Files.readString(path, StandardCharsets.US_ASCII);

// 3-4. find instances of the string in the text
String searchString = "dog";

int lastIndex = 0;
while (lastIndex != -1) {
  lastIndex = text.indexOf(searchString, lastIndex); // returns -1 if the searchString is not found
  if (lastIndex != -1) {
    count++; // increment counter
    lastIndex += searchString.length(); // increment index by length of search term
  }
}

// 5. print result of counter
System.out.println("Found " + count + " instances of " + searchString);

In your specific example, you would read the contents of the a.java class, and then find the number of instances of 'String' followed by the number of instances of 'name'. You can sum them together at your leisure. So you'd repeat steps 3 and 4 for each word you're searching for, and then sum up all of your counts at the end.

The easiest way, of course, would be to wrap steps 3 and 4 in a method that returns the count.

int countOccurrences(String searchString, String text) {
  int count = 0;
  int lastIndex = 0;
  while (lastIndex != -1) {
    lastIndex = text.indexOf(searchString, lastIndex);
    if (lastIndex != -1) {
      count++;
      lastIndex += searchString.length();
    }
  }
  return count;
}

// Call:
int nameCount = countOccurrences("name", text);
int stringCount = countOccurrences("String", text);

System.out.println("Counted " + nameCount + " instances of 'name' and " + stringCount + " instances of 'String', for a total of " + (nameCount + stringCount));

(Whether you do a toLowerCase() on the text depends on whether you need case-sensitive matches or not.)

Of course, if you only want 'name' and not 'lastName', then you'll need to start considering things like word boundaries (the regex word-boundary anchor \b comes in useful here). For parsing printed text, you'll also need to consider words broken across line endings with a hyphen. But it sounds like your use case is simply counting instances of individual words that happen to have been provided to you in a space-delimited string.
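
A whole-word count along those lines could use java.util.regex. The following is a minimal sketch, assuming case-sensitive matching and the default regex definition of a word boundary; WholeWordCounter and countWholeWord are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordCounter {

    // Counts occurrences of word that are not part of a longer word.
    static int countWholeWord(String word, String text) {
        // Pattern.quote treats the word literally; \b anchors both ends at word boundaries.
        Pattern pattern = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "String name; String username; name = username;";
        // 'username' contains 'name' but is skipped by the boundary check.
        System.out.println(countWholeWord("name", text));
    }
}
```

A plain indexOf loop over the same text would also count the 'name' inside 'username'.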

If you actually just want instances of String name as a single phrase like that, just use the first workflow.

score 0 · Answer 3 · answered Aug 16 '20 at 16:39

You could use a map with the words as the key and the count as the value:

  public static void main(String[] args) {
    String corpus =
        "Wikipedia is a free online encyclopedia, created and edited by volunteers around the world";
    String query = "edited Wikipedia volunteers";

    Map<String, Integer> word2count = new HashMap<>();
    for (String word : corpus.split(" ")) {
      if (!word2count.containsKey(word))
        word2count.put(word, 0);
      word2count.put(word, word2count.get(word) + 1);
    }

    for (String q : query.split(" "))
      System.out.println(q + ": " + word2count.get(q));
  }

aeberhart

Thanks for your feedback. I have one more question: if I have multiple files in a folder, how can I find out how many times the query occurs in each file and store that for further use? Can I use a map? Please don't mind if this is a silly question. — Sanzida Sultana, Aug 16 '20 at 18:22
You could use a Map<String, Map<String, Integer>> file2count where the outer key is the filename. So file2count.get("f.txt").get("word") would give you the count of "word" in file "f.txt". — aeberhart, Aug 17 '20 at 08:00

Arvind Kumar Avinash · Accepted Answer · 2020-08-16T18:12:51.740

If I have a file that contains a line "Wikipedia is a free online encyclopedia, created and edited by volunteers around the world".I want to search a query "edited Wikipedia volunteers ".then my program first count the frequency edited from the text file, then count Wikipedia frequency and then volunteers frequency, and at last it sum up all the frequency. can I solve it by using hashmap?

You can do it as follows:

import java.util.HashMap;
import java.util.Map;

public class Main {
    public static void main(String[] args) {
        // The given string
        String str = "Wikipedia is a free online encyclopedia, created and edited by volunteers around the world.";

        // The query string
        String query = "edited Wikipedia volunteers";

        // Split the given string and the query string on space
        String[] strArr = str.split("\\s+");
        String[] queryArr = query.split("\\s+");

        // Map to hold the frequency of each word of query in the string
        Map<String, Integer> map = new HashMap<>();

        for (String q : queryArr) {
            for (String s : strArr) {
                if (q.equals(s)) {
                    map.put(q, map.getOrDefault(q, 0) + 1);
                }
            }
        }

        // Display the map
        System.out.println(map);

        // Get the sum of all frequencies
        int sumFrequencies = map.values().stream().mapToInt(Integer::intValue).sum();

        System.out.println("Sum of frequencies: " + sumFrequencies);
    }
}

Output:

{edited=1, Wikipedia=1, volunteers=1}
Sum of frequencies: 3

Check the documentation of Map#getOrDefault to learn more about it.

Update

In the original answer, I've used the Java Stream API to get the sum of values. Given below is an alternative way of doing it:

// Get the sum of all frequencies
int sumFrequencies = 0;
for (int value : map.values()) {
    sumFrequencies += value;
}

Your other question is:

if I have multiple files in a folder then how can i know of how many times is this query os occurring in which file

You can create a Map<String, Map<String, Integer>> in which the key will be the name of the file and the value (i.e. Map<String, Integer>) will be the frequency map for the file. I've already shown above the algorithm to create this frequency map. All you will have to do is to loop through the list of files and populate this map (Map<String, Map<String, Integer>>).
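
The nested map might be populated roughly as follows. To keep the sketch runnable without touching the disk, it assumes the text of each file has already been read into a String; the file names, contents, and the names PerFileFrequency, frequencyMap, and frequenciesPerFile are all hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class PerFileFrequency {

    // Frequency of each query word in one text, using the same nested-loop idea as above.
    static Map<String, Integer> frequencyMap(String text, String query) {
        Map<String, Integer> map = new HashMap<>();
        for (String q : query.trim().split("\\s+")) {
            for (String s : text.split("\\s+")) {
                if (q.equals(s)) {
                    map.put(q, map.getOrDefault(q, 0) + 1);
                }
            }
        }
        return map;
    }

    // Outer key: file name; value: the frequency map for that file's text.
    static Map<String, Map<String, Integer>> frequenciesPerFile(Map<String, String> textByFileName, String query) {
        Map<String, Map<String, Integer>> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : textByFileName.entrySet()) {
            result.put(entry.getKey(), frequencyMap(entry.getValue(), query));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical file names and contents; in practice the text would be read from each file.
        Map<String, String> textByFileName = new LinkedHashMap<>();
        textByFileName.put("a.txt", "edited by volunteers and edited again");
        textByFileName.put("b.txt", "Wikipedia is a free online encyclopedia");

        Map<String, Map<String, Integer>> file2count =
                frequenciesPerFile(textByFileName, "edited Wikipedia volunteers");
        for (Map.Entry<String, Map<String, Integer>> entry : file2count.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Query words that never occur in a file are simply absent from that file's inner map, so a lookup for them returns null rather than 0.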

If my query is "edit Wikipedia volunteers", can I count the frequency of both edited and edit? I heard about a topic called stemming, but it seems quite difficult for me at first. Is there any other solution? Please don't mind my question, as I am a beginner here. — Sanzida Sultana, Aug 16 '20 at 18:09
@SanzidaSultana - I just posted an update for your last comment. I hope the update answers your additional question in the comment. For further questions, I suggest you post a new question. — Arvind Kumar Avinash, Aug 16 '20 at 18:14
**int sumFrequencies = map.values().stream().mapToInt(Integer::intValue).sum();** This line of code sums up the occurrences, but I can't understand the methods it calls. Another question: if I have multiple files in a folder, how can I find out how many times the query occurs in each file and store that for further use? Can I use a map? Please don't mind if this is a silly question. — Sanzida Sultana, Aug 16 '20 at 18:18
