To search for a word in a large text, the Boyer-Moore algorithm is widely used.
Principle (see the link for a live example): the word is aligned against some position (index) in the text and compared starting from its last letter. If the text letter being compared does not occur in the word at all, there is no need to compare the other [wordLength - 1] characters, and the index can move forward by the full word length. If the letter does occur in the word, but at a different position, the window can be shifted just enough to line that occurrence up with it, and so on.
- depending on the word length and how similar it is to the text, the search can be accelerated a lot (in the best case only about textLength / wordLength characters are examined, i.e. roughly naiveSearchTime / wordLength).
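As a minimal sketch of this principle, here is the Boyer-Moore-Horspool simplification, which keeps only the "bad character" shift described above (full Boyer-Moore adds a second, "good suffix" rule). The function names are hypothetical, and the text is assumed to be an in-memory string.

```python
def build_shift_table(word: str) -> dict[str, int]:
    """Bad-character table: for each letter of the word except the last,
    the distance from its rightmost occurrence to the end of the word."""
    m = len(word)
    # Later indices overwrite earlier ones, so the rightmost occurrence wins.
    return {ch: m - 1 - i for i, ch in enumerate(word[:-1])}


def horspool_search(text: str, word: str) -> int:
    """Return the index of the first occurrence of word in text, or -1."""
    m, n = len(word), len(text)
    if m == 0:
        return 0
    shift = build_shift_table(word)
    pos = 0  # start of the current alignment window in text
    while pos <= n - m:
        # Compare right to left, starting with the word's last letter.
        j = m - 1
        while j >= 0 and text[pos + j] == word[j]:
            j -= 1
        if j < 0:
            return pos
        # Shift based on the text letter under the word's last letter;
        # a letter absent from the word allows a full jump of m.
        pos += shift.get(text[pos + m - 1], m)
    return -1


if __name__ == "__main__":
    print(horspool_search("2001 a space odyssey", "space"))  # prints 7
```

On "2001 a space odyssey", the window only stops at positions 0, 5 and 7 before matching, instead of trying every index.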
Edit: since you are searching from the end of the file, the first letter of the word (not the last) is compared first. E.g. when searching for "space" in "2001 a space odyssey", the 's' of the word is first compared with the first 'y' of "odyssey". Since 'y' does not occur in "space", the window jumps back by the full word length, and the next comparison is the same 's' against the 'c' of "space" in the text.
And finally, to find the nth occurrence, a simple counter (initialized to n) is decremented each time the word is found; when it reaches 0, you are done.
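A possible sketch of this mirrored search, assuming the text is already in memory as a string: the window moves from the end of the text towards the start, the word is compared from its first letter, and the shift table stores each letter's leftmost position in the word (ignoring index 0). The function name is hypothetical, and this version counts overlapping occurrences.

```python
def nth_occurrence_from_end(text: str, word: str, n: int) -> int:
    """Return the start index of the nth occurrence of word counting from
    the end of text (n = 1 is the last one), or -1 if there are fewer than n.
    Overlapping occurrences are counted."""
    m, length = len(word), len(text)
    if m == 0 or n < 1:
        return -1
    # Mirrored bad-character table: for each letter of the word except the
    # first, its leftmost position (= how far the window may move left).
    shift: dict[str, int] = {}
    for i in range(m - 1, 0, -1):
        shift[word[i]] = i  # smaller i written last, so leftmost wins
    remaining = n  # counter decremented on each match
    pos = length - m  # start of the window, moving right to left
    while pos >= 0:
        # Compare left to right, starting with the word's first letter.
        j = 0
        while j < m and text[pos + j] == word[j]:
            j += 1
        if j == m:
            remaining -= 1
            if remaining == 0:
                return pos
        # Shift based on the text letter under the word's first letter;
        # a letter absent from word[1:] allows a full jump of m.
        pos -= shift.get(text[pos], m)
    return -1


if __name__ == "__main__":
    print(nth_occurrence_from_end("2001 a space odyssey", "space", 1))  # prints 7
```

On the "2001 a space odyssey" example, the window stops at positions 15 ('y'), 10 ('c') and then 7, where "space" matches.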
The algorithm is easy to understand and to implement. Ideal for interviews.
You may also ask whether the file is to be searched only once or several times. If it is intended to be searched multiple times, you can suggest indexing the words of the file, i.e. building an in-memory structure that quickly tells whether a word is in it, where, how many times, etc. I like the trie, which is also easy to understand and very fast (though it can be quite memory-hungry, depending on the text). A lookup costs O(wordLength).
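A minimal sketch of such an index, assuming words are runs of `\w` characters and that case is ignored (both are arbitrary choices for illustration). `TrieNode`, `WordIndex` and `build_index` are hypothetical names; each word-end node stores the character offsets where that word starts.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict[str, "TrieNode"] = field(default_factory=dict)
    positions: list[int] = field(default_factory=list)  # offsets where a word ends here


class WordIndex:
    """A trie mapping each word of a text to the offsets where it occurs."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, word: str, position: int) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.positions.append(position)

    def lookup(self, word: str) -> list[int]:
        """Return the offsets of word; cost is O(len(word)), not O(text size)."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return []
        return list(node.positions)  # empty if word is only a prefix


def build_index(text: str) -> WordIndex:
    index = WordIndex()
    for match in re.finditer(r"\w+", text):
        index.add(match.group().lower(), match.start())
    return index


if __name__ == "__main__":
    idx = build_index("the cat and the hat")
    print(idx.lookup("the"))  # prints [0, 12]
    print(idx.lookup("ca"))   # prints [] (a prefix, not a word)
```

The number of occurrences is simply the length of the returned list, and the nth last occurrence is its element at index -n.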
--
When the interviewer mentions "very large file", there are many factors to consider, such as:
- search algorithm as above
- can the text fit in memory (for instance, when processing all of it)? Do I have to implement a file-seek algorithm, i.e. keep only part of the file in memory at a time?
- where is the file? In memory (fast), on a hard disk (slower, but at least local), or remote (usually slower, with possible connection issues, access rights, firewalls, network speed, etc.)
- is the file compressed? (It will take even more space once uncompressed.)
- is it a single file or split into several chunks?
- does it contain text or binary data? If text, its language indicates how likely each letter is to appear (e.g. 'y' appears much more frequently in English than in French).
- offer to index the words of the file, if relevant
- offer to create a simpler, smaller file from the big one (e.g. by removing repeated words) so that it can be processed more easily
...
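For the "can the text fit in memory?" point, one common approach is to read the file in fixed-size chunks and carry the last wordLength - 1 bytes over to the next chunk, so that a match straddling a chunk boundary is not missed. The sketch below is an assumption-laden illustration: `find_in_stream` and its `chunk_size` parameter are hypothetical, and it uses Python's built-in `bytes.find` for brevity, though any of the search algorithms discussed could be substituted.

```python
import io
from typing import BinaryIO, Iterator


def find_in_stream(stream: BinaryIO, word: bytes, chunk_size: int = 1 << 20) -> Iterator[int]:
    """Yield the absolute byte offsets of word in stream, holding at most
    chunk_size + len(word) - 1 bytes in memory at a time."""
    if not word:
        return
    overlap = len(word) - 1
    carry = b""
    offset = 0  # absolute offset of buf[0] in the stream
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = carry + chunk
        i = buf.find(word)
        while i != -1:
            yield offset + i
            i = buf.find(word, i + 1)
        # Keep the tail that might begin a match completed by the next chunk.
        # It is too short to hold a full match, so nothing is reported twice.
        keep = min(overlap, len(buf))
        carry = buf[len(buf) - keep:]
        offset += len(buf) - keep


if __name__ == "__main__":
    data = io.BytesIO(b"abcabcabc")
    print(list(find_in_stream(data, b"cab", chunk_size=2)))  # prints [2, 5]
```

With a real file, `open(path, "rb")` would be passed as the stream instead of `io.BytesIO`.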