
I have a large text file of about 10 MB and want to search it for a specific string, which may occur many times in the file. I need every place where the string is used. I want the search to work like Google: when I type a string, the matching results appear. Your suggestions will be appreciated.

File format:


  1. he is going to school.
  2. we should do best deeds.
  3. we should work hard. . . . .
  4. Always speak truth.


I have a search edit field in my application. The user types "should" into the search field and presses the search button. A list should then open showing each match together with its complete line. For example, the result should be:


  2. we should do best deeds.
  3. we should work hard.
  • You should provide more details.. like is the file always the same (or it changes)? What are the contents of the file (dictionary words, sentences, paragraphs)? What are the performance requirements (how fast you want.. within user response time or can be done as a batch task)? – Amulya Khare Dec 06 '13 at 02:09
  • @Amulya Khare... sorry for the short description. The file is always the same and does not change; it is the translation of an Arabic-language book, with one sentence written per line. I want performance like the Google search engine. That is all the detail. – Mohammad Khan Dec 06 '13 at 17:14
  • @MohammadKhan though your question was closed for it being "unclear" what you're asking, that's not because your description is lacking -- especially your edit makes it better. The problem though is Stack Overflow is not a "here's my requirements, write me a program" site and your question seems to suggest (even if that's not what you intend) that this is what you want. Rather, Stack Overflow expects you to have a more specific problem, generally accompanied by code that you already have, isn't working, and you need specific help with. – mah Dec 06 '13 at 18:05
  • @mah... I am not asking for a project description or how to start. I need help finding the best way to search a large file. My boss gave me a search task; after 15 days he looked at my code, saw that it used "for" loops, rejected the project and said "do it with a proper search technique". That's my story. – Mohammad Khan Dec 06 '13 at 18:45
  • @MohammadKhan a "for" loop is not wrong when you have a large file - you will want to process one line at a time (since you want to return complete lines). See my updated answer. – Floris Dec 06 '13 at 19:43
  • If the file is large and you want "instant" search capability it may be worth making an index of all words - this will make it lightning fast for simple words searches although it won't help with pattern (wildcard) searches. Do update question with your code if you can. – Floris Dec 07 '13 at 13:11

1 Answer


A simple way to search a file and get a match "with context" is to use grep. For example, to match every line with "hello", and print one line before and three lines after, you would do

grep -B1 -A3 'hello' myBigFile.txt

You can use grep -E for extended regular expressions (or grep -P for PCRE syntax, where your grep supports it, as GNU grep does).

Without more detail it would be hard to give you a better answer.

EDIT 2

Now that you have explained your problem more clearly, here is a possible approach:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

InputStream fileIn;
BufferedReader bufRd;
String line, pattern;
pattern = "should";  // example only: get the pattern from the user, do not hard-code it

fileIn = new FileInputStream("myBigTextfile.txt");
bufRd = new BufferedReader(new InputStreamReader(fileIn, Charset.forName("UTF-8")));
while ((line = bufRd.readLine()) != null) {
    if (line.contains(pattern)) {
        System.out.println(line); // echo matching line to output
    }
}

// Done with the file; closing the reader also closes the underlying stream
bufRd.close();

If you need to match with wildcards, then you might replace the line.contains with something that is a little more "hard core regex" - for example

Pattern matchPattern = Pattern.compile("should.+not");  // no /.../ delimiters in Java; needs import java.util.regex.Pattern

(only need to do that once - after getting input, and before opening file) and change the condition to

if (matchPattern.matcher(line).find())
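Putting the two pieces together, a minimal sketch of the regex variant might look like the following. The class name `RegexLineSearch`, the method `findMatchingLines`, and the in-memory sample text are made up for illustration; in the real application the reader would wrap the file stream as shown earlier. If the user's input should be treated as literal text rather than as a regex, wrapping it in `Pattern.quote(...)` is one option.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RegexLineSearch {

    // Hypothetical helper: returns every line in which the pattern is found.
    public static List<String> findMatchingLines(BufferedReader reader, Pattern pattern)
            throws IOException {
        List<String> matches = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            // find() succeeds if the pattern occurs anywhere in the line
            if (pattern.matcher(line).find()) {
                matches.add(line);
            }
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        // Sample text stands in for the big file (assumption for the demo).
        String sample = "he is going to school.\n"
                + "we should do best deeds.\n"
                + "we should not give up.\n";

        // Compile once, after reading the user's input and before scanning.
        Pattern matchPattern = Pattern.compile("should.+not");

        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            for (String hit : findMatchingLines(reader, matchPattern)) {
                System.out.println(hit);
            }
        }
    }
}
```

Compiling the pattern once and reusing it for every line avoids re-parsing the regex on each iteration.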

Note - code adapted from / inspired by https://stackoverflow.com/a/7413900/1967396 but not tested.

Note there are no for loops... maybe the boss will be happy now.

By the way - if you edit your original question with all the information you provided in the comments (both to this answer and to the original question) I think the question can be re-opened.

If you expect the user to do many searches it may be faster to read the entire file into memory once. But that's outside of the scope of your question, I think.
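As a rough sketch of that idea, assuming the file fits comfortably in memory (a 10 MB text file usually does), the lines can be loaded once into a list and each search then scans the list instead of re-reading the file. The class `InMemorySearch` and its `search` method are hypothetical names, and the sample text stands in for the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class InMemorySearch {

    // All lines of the file, loaded once when the object is created.
    private final List<String> lines = new ArrayList<>();

    public InMemorySearch(BufferedReader reader) throws IOException {
        String line;
        while ((line = reader.readLine()) != null) {
            lines.add(line);
        }
    }

    // Returns every cached line that contains the search term.
    public List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (String line : lines) {
            if (line.contains(term)) {
                hits.add(line);
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        // Sample text stands in for the big file (assumption for the demo).
        String sample = "he is going to school.\n"
                + "we should do best deeds.\n"
                + "we should work hard.\n"
                + "Always speak truth.\n";

        InMemorySearch index;
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            index = new InMemorySearch(reader);
        }

        // Repeated searches reuse the cached lines; no file I/O happens here.
        for (String hit : index.search("should")) {
            System.out.println(hit);
        }
    }
}
```

Each search is still a linear scan, but it skips the disk read and UTF-8 decoding that dominate when the file is re-opened for every query.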

Floris
  • Thanks for the suggestion. I have edited my question above. Basically I have a text file written in English; it is the translation of an Arabic book. There are 6666 lines in the text file. My application is search-based. I want the best search for this heavy data. In future the data may grow as more books are added. I want suggestions for the best search, like Google. – Mohammad Khan Dec 06 '13 at 18:25
  • Your edited question is much clearer. You might want to give a bit of detail on the language / environment in which you want to do this. Are you writing an application? If so what language are you using? "Regex" is a tool, but it is used _in an environment_. Without information about the environment ("Android" is not enough) it's hard to be more specific. `grep 'should' myArabicFile.txt` will give **exactly** what you want if your environment has `grep`. You could even get the line numbers printed with `grep -n 'should' myArabicFile.txt` – Floris Dec 06 '13 at 18:32
  • The environment is Android. I have an Arabic book with English, Urdu and Russian translations. My application has a search bar. For example, the user types the word "simple"; my program then looks for this string throughout the large text file, which has 6666 lines. The program should show a list of the lines where the word is found. – Mohammad Khan Dec 06 '13 at 18:55
  • The language I am using is Java. – Mohammad Khan Dec 06 '13 at 19:04
  • ...that is what I want. Exactly, no for loop. I want to implement the Pattern method. I will come back with an implementation. – Mohammad Khan Dec 07 '13 at 02:30
  • If you are using Android, maybe you can use an SQLite virtual table. An RDBMS needs to implement efficient text search algorithms, and you can use those instead of implementing your own approach. Check this out: http://developer.android.com/training/search/search.html No, I haven't tested it yet, so I can't say it is fast enough for your needs, but it doesn't hurt to give it a try... – Renascienza Jul 06 '15 at 05:23