
I am working through a lab on RegEx which asks me to:

Search the 'countries' file for all the words with nine characters and the letter i.
How many results are found?

I am working at a generic Linux command prompt in an online emulated environment. I am allowed to use grep, awk or sed, though I have a preference for grep.

(I am 100% a noob when it comes to RegEx so please explain it to me like I'm 5)

Per a previous lab I already used something like below which finds me all countries which have 9 characters, however I cannot find a way to make it find all words which have 9 characters AND contain the letter i in any position.

grep -E '\b\w{9}\b' countries

The | operator does not help because it's an OR operator: it finds every line that contains an i as well as every nine-character word, and I need both conditions to hold at the same time. I also tried chaining multiple grep statements, but it seems the emulator may not accept that.

I am also trying to stick to [] character sets as the next question asks for multiple letters within the 9 letter word.

Lee Mac
  • Hello Jen, welcome to Stack Overflow. See if one of these answers your question: [regex AND operator](https://stackoverflow.com/a/470602/7586), [regex for password strength](https://stackoverflow.com/q/19605150/7586) - they are not *exactly* what you need (not `grep` questions, and they usually assume you're matching the whole input, not just a part of it), but they answer a very similar case and explain how to combine different requirements. – Kobi Jun 23 '19 at 10:06
  • Do you know [regex101](https://regex101.com/) ? It also gives you an explanation - which may help you understand why your attempt is not working for you. – Abra Jun 23 '19 at 10:15
  • @Abra online regex helpers are nice, but it also depends on whether they support cli tools, regex101 as far as I know doesn't and BRE/ERE regex features vary significantly compared to those found in programming languages – Sundeep Jun 23 '19 at 12:47
  • Jen, it would help if you clarify if an input line contains only one word or not.. if it is one country per line, `awk` would be better suited, for ex: `awk '/^\w{9}$/ && /i/'` – Sundeep Jun 23 '19 at 12:55
  • If you post sample input and expected output then we can help you. – Ed Morton Jun 23 '19 at 13:44

1 Answer

One way of solving this problem is to use grep twice, and pipe one result to the next.

First, we find all words with length 9, like you did on the previous exercise:

grep -Eo '\b\w{9}\b' countries

I'm using the -o flag, which prints only the matching words, one word per line. Next, we use a Linux pipe (not the regex OR) to feed the output of the first grep to a second grep:

grep -Eo '\b\w{9}\b' countries | grep 'i'

The final output will be all words with nine characters and i.
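
To try this without the lab file, here is a minimal sketch that builds a small stand-in for 'countries' (the sample names and the temporary path are my own assumptions, not the lab's data) and also answers "how many results" by letting the second grep count with -c:

```shell
# Hypothetical sample data standing in for the lab's 'countries' file.
tmpdir=$(mktemp -d)
printf '%s\n' 'Argentina' 'Australia' 'Guatemala' 'Indonesia' \
    'Lithuania' 'Chile' 'United Kingdom' > "$tmpdir/countries"

# Step 1: -o prints each nine-character word on its own line.
# Step 2: the second grep keeps only the words that contain a lowercase i.
matches=$(grep -Eo '\b\w{9}\b' "$tmpdir/countries" | grep 'i')
printf '%s\n' "$matches"

# -c makes the second grep print the number of matching lines instead.
count=$(grep -Eo '\b\w{9}\b' "$tmpdir/countries" | grep -c 'i')
echo "$count"

rm -r "$tmpdir"
```

With this sample, Guatemala is dropped because it has no i, and Chile and United Kingdom are dropped because none of their words is nine characters long. Note that the second grep is case-sensitive, so a word whose only i is a capital I would need grep -i.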

Depending on your requirements, this approach may be considered "cheating" if you're more focused on Regex, but a good solution if you're also learning Linux.


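The fact that you are looking for words (as opposed to whole lines in the file) complicates the regex, but it is still possible to come up with a single regex that matches these words. It relies on a lookahead, which plain grep -E (ERE) does not support, so it needs a PCRE-capable tool such as GNU grep with the -P option.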

 \b(?=\w*i)\w{9}\b

This builds on the \b\w{9}\b you already have. (?=\w*i) is the AND condition: after we find the beginning of the word (\b), we look ahead for \w*i (zero or more word characters, and then our i). We use \w* in the lookahead rather than .* so that we stay inside the same word; (?=.*i) would also accept an i that comes after the nine characters.
After the lookahead has found the i, we go back to the start of the word and make sure it is exactly 9 characters long.

Working example: https://regex101.com/r/G5EVdM/1
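
On the command line, a rough sketch of the same regex, assuming GNU grep built with PCRE support (the -P option is not available in every grep, and the sample file below is made up for illustration):

```shell
# Hypothetical sample data standing in for the lab's 'countries' file.
tmpdir=$(mktemp -d)
printf '%s\n' 'Argentina' 'Guatemala' 'Lithuania' 'Chile' > "$tmpdir/countries"

# -P switches GNU grep to Perl-compatible regexes, which understand (?=...).
# -o prints only the matched word rather than the whole line.
pcre_matches=$(grep -Po '\b(?=\w*i)\w{9}\b' "$tmpdir/countries")
printf '%s\n' "$pcre_matches"

rm -r "$tmpdir"
```

Here Guatemala fails the lookahead (no i) and Chile fails the \w{9} part, so only Argentina and Lithuania are printed. Adding -c in place of -o would count matching lines rather than words, which is only the same thing when there is one country per line.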

Kobi
  • Sorry, I missed the statement "I tried multiple grep statements as well and it seems the emulator may not accept that." from the question - I see you already tried that one. – Kobi Jun 23 '19 at 11:06
  • fyi: typically, grep/sed/awk/etc are based on BRE/ERE and do not have support for lookarounds, non-greedy, etc. If `GNU grep` is available with PCRE option, then the lookaround solution would work – Sundeep Jun 23 '19 at 12:49
  • @Sundeep - how typical would you say that is? I thought the -E option is pretty common. – Kobi Jun 23 '19 at 13:03
  • By default, grep uses BRE and -E enables ERE. How they behave depends on implementation. For GNU grep, features are same, only syntax differs - for example `\?` in BRE is `?` in ERE. To use PCRE (perl compatible regex), need to use -P option and as far as I know, only GNU grep supports that – Sundeep Jun 23 '19 at 13:27
  • https://unix.stackexchange.com/questions/119905/why-does-my-regular-expression-work-in-x-but-not-in-y would be a good place for more details – Sundeep Jun 23 '19 at 13:28