Hi, I am working on a shell script. Suppose this is the data my script runs on:

      Ownership
               o Australian Owned
   ?
   Ads for Mining Engineers
   232 results for
mining engineers in All States
   filtered by Mining Engineers [x] category
     * [ ]
                    [34]get directions
       Category:
       [35]Mining Engineers
       [36]Arrow Electrical Services in Wollongong, NSW under Mining
       Engineers logo
            [37]email
            [38]send to mobile
            [39]info
            Compare (0)
     * [ ]
       . [40]Firefly International
       Designers & Manufacturers. Service, Repair & Hire.
       We are the provider of mining engineers in Mt Thorley, NSW.
       25 Thrift Cl, Mt Thorley NSW 2330
       ph: (02) 6574 6660
            [41]http://www.fireflyint.com.au
            [42]get directions
       Category:
       [43]Mining Engineers
       [44]Firefly International in Mt Thorley, NSW under Mining Engineers
       logo
            [45]email
            [46]send to mobile
            [47]info
            Compare (0)
     * [ ]
       [48]Materials Solutions
       Materials Research & Development, Slurry Rheology & Piping Design.
       We are a well established company servicing the mining industry &
       associated manufacturing industries in all areas.
       Thornlie WA 6108
       ph: (08) 6468 4118
            [49]www.materialssolutions.com.au
       Category:
       [50]Mining Engineers
       [51]Materials Solutions in Thornlie, WA under Mining Engineers logo
            [52]email
            [53]send to mobile
            [54]info
            Compare (0)
     * [ ]
       . [55]ATC Williams Pty Ltd
       Our services are available from concept to completion of the works.
       Today, as the rebranded ATC Williams, we continue to expand our
       operations across Australia and in locations around the world.
       Unit 1, 21 Teddington Rd, Burswood WA 6100
       ph: (08) 9355 1383
            [56]www.atcwilliams.com.au
            [57]get directions
       Category:
       [58]Mining Engineers
       [59]ATC Williams Pty Ltd in Burswood, WA under Mining Engineers
       logo
            [60]email
            [61]send to mobile
            [62]info
            Compare (0)

and I need to grab the address blocks that look like this:

 * [ ]
       . [55]ATC Williams Pty Ltd
       Our services are available from concept to completion of the works.
       Today, as the rebranded ATC Williams, we continue to expand our
       operations across Australia and in locations around the world.
       Unit 1, 21 Teddington Rd, Burswood WA 6100
       ph: (08) 9355 1383
            [56]www.atcwilliams.com.au

So what do I do? I've been working on regular expressions like

^*(.?[\w\W?\s?]*)+(.com.au)$

but that's not helping. It matches the address when I give it an input file containing only the block I want, but when given the data in bulk, it doesn't work. Can somebody help me out?

Kiran Vemuri
  • And if I use such a long expression, my grep searches for the literal characters instead of treating them as metacharacters. For example, where I mean \s as any whitespace, it searches for the letter 's' :( – Kiran Vemuri Jun 13 '12 at 07:23

2 Answers

1

I see some issues with your regex:

^*(.?[\w\W?\s?]*)+(.com.au)$
 ^ ^           ^ ^ ^   ^
 1 1           2 2 1   1
  1. Special characters that need escaping.

  2. Greedy quantifiers that match everything up to the last ".com.au". Add a ? after the quantifier to make it ungreedy, so that it matches as little as possible (that is, up to the first ".com.au" found at the end of a row).

    ==> This is your main problem

  3. You nest quantifiers, *)+; you don't need that.

  4. In your example there is whitespace between the "*" and the ".", so either match the whitespace or remove the dot altogether; it will be matched by your character class anyway.

  5. There is also whitespace between the start of the row and the "*".

So, try this

    ^\s*\*([\w\W?\s?]*?)(\.com\.au)$

See it here on Regexr

stema
  • Nope, that isn't working. My latest try is
    \*(\.?[\w\W?\s?]*)+([\w\W\s\d]*)?([\W\w]*\.\a\u)*$
    and this actually matches all of the text, while I only want it to match the address part!
    – Kiran Vemuri Jun 13 '12 at 06:56
  • And if I use such a long expression, my grep searches for the literal characters instead of treating them as metacharacters. For example, where I mean \s as any whitespace, it searches for the letter 's' :( @stema – Kiran Vemuri Jun 13 '12 at 07:23
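
A likely explanation for `\s` being searched as the letter 's', assuming the grep in question follows POSIX rules: inside a bracket expression such as `[\w\W?\s?]` the backslash is an ordinary character, so the class matches only a backslash, `w`, `W`, `?` or `s`. The sketch below illustrates this. `count_matches` is a hypothetical helper, and `[[:space:]]` is the POSIX spelling of a whitespace class. The expected counts in the comments assume GNU grep.

```shell
# count_matches PATTERN TEXT
# Prints how many lines of TEXT match PATTERN under grep -E.
# "|| true" keeps the exit status at 0 when the count is 0.
count_matches() {
    printf '%s\n' "$2" | grep -E -c -- "$1" || true
}

# Inside [...] the backslash is literal, so [\s] means "backslash or s".
count_matches '[\s]' 'a b'          # 0: the line has no backslash and no "s"
count_matches '[\s]' 'sun'          # 1: matched the letter "s"

# The portable way to ask for whitespace inside a bracket expression.
count_matches '[[:space:]]' 'a b'   # 1: matched the space
```

Regex testers usually implement Perl-style syntax, where `[\s]` does mean whitespace, which would explain why the same pattern behaves differently there.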
0

Try this

^\s*\*\s*\[ \][^\*]+?[.]com[.]au$

explanation

^        # Assert position at the beginning of a line (at beginning of the string or after a line break character)
\s       # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   *        # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\*       # Match the character “*” literally
\s       # Match a single character that is a “whitespace character” (spaces, tabs, and line breaks)
   *        # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\[       # Match the character “[” literally
\        # Match the character “ ” literally
\]       # Match the character “]” literally
[^\*]    # Match any character that is NOT a * character
   +?       # Between one and unlimited times, as few times as possible, expanding as needed (lazy)
[.]      # Match the character “.”
com      # Match the characters “com” literally
[.]      # Match the character “.”
au       # Match the characters “au” literally
$        # Assert position at the end of a line (at the end of the string or before a line break character)
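
A caveat when taking a pattern like this to the command line: grep tests each input line on its own, so an expression that has to span from the `* [ ]` line to the `.com.au` line cannot match there, and the lazy `+?` is a Perl-style quantifier that POSIX basic and extended regular expressions do not define. A minimal sketch of a record-based alternative using awk follows, assuming every listing starts with a `* [ ]` line and the wanted part ends at the first line ending in `.com.au`. The function name `extract_listings` and the file names in the usage comment are hypothetical.

```shell
# extract_listings [FILE...]
# Prints each listing from its "* [ ]" line up to and including the first
# line that ends in ".com.au", followed by a blank separator line.
extract_listings() {
    awk '
        # A "* [ ]" line starts a new listing: restart the buffer.
        /^[ \t]*\*[ \t]*\[ \]/ { buf = $0; inrec = 1; next }

        # While inside a listing, collect lines until the website line.
        inrec {
            buf = buf "\n" $0
            if ($0 ~ /\.com\.au[ \t]*$/) {
                print buf
                print ""
                inrec = 0
            }
        }
    ' "$@"
}

# Possible usage (file names are placeholders):
#   extract_listings lynx_dump.txt > addresses.txt
```

Listings that never reach a `.com.au` line are dropped, because the buffer is restarted at the next `* [ ]` line.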
Cylian
  • That actually looks good! But when I try using grep with that expression, $ grep '^\s*\*\s*\[ \][^\*]+?[.]com[.]au$' file gives me no results, although it surely works in a regex tester. Can you tell me what the problem is? @Cylian – Kiran Vemuri Jun 13 '12 at 07:41
  • @KiranVemuri: You forgot to escape one `*` just after `^\s*` in your command ``$ grep '^\s**\s*[ ][^*]+?[.]com[.]au$``. Change it to ``$ grep '^\s*\*\s*[ ][^*]+?[.]com[.]au$`` or ``$ grep '^\s*\*\s*[ ][^*]+?\.com\.au$``. Hope this works. – Cylian Jun 13 '12 at 07:51
  • It's the same as I said: this expression works fine when I try it in a regex tester, but when I grep with it in my Linux terminal, it gives me no result :( – Kiran Vemuri Jun 13 '12 at 08:03
  • And I can't even use the regular expression in LibreOffice's search feature. – Kiran Vemuri Jun 13 '12 at 08:13
  • Hey, can you modify the regular expression to also match the following text? `* [ ] . [55]ATC Williams Pty Ltd Our services are available from concept to completion of the works. Today, as the rebranded ATC Williams, we continue to expand our operations across Australia and in locations around the world. Unit 1, 21 Teddington Rd, Burswood WA 6100 ph: (08) 9355 1383 ` – Kiran Vemuri Jun 13 '12 at 09:18
  • I didn't get you. Please update your question with some sample input and output. – Cylian Jun 13 '12 at 09:24
  • Try syntax ``(REGEX_PATTERN)``. – Cylian Jun 13 '12 at 09:39
  • Okay, let me put it this way! I'm pretty new to Linux and shell scripting. I wrote a shell script that uses "lynx" to dump addresses from a website into a file, and the data I got is full of garbage. I want to extract the addresses from it, so I resorted to regular expressions and reworked the one I got from you into "(^\s*\*\s*\[ \][^\*]+?(\w*\:[^\*]+\d$)|(\[[^\*]+[.]com[.]au$)". When I try this regex on www.regextester.com it works fine, but when I use it with the grep command in Linux, it gives me an empty file. So what do I do? Can you help me out? – Kiran Vemuri Jun 13 '12 at 10:35
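
Because grep works line by line, another option is to select the individual lines of interest rather than one multi-line block. The following is a sketch only, assuming an address line ends in an Australian state abbreviation followed by a four-digit postcode, a phone line starts with `ph:`, and a website line ends in `.com.au`, as in the sample data. `grep_contact_lines` is a hypothetical name.

```shell
# grep_contact_lines [FILE...]
# Keeps only the lines that look like an address, a phone number or a website.
# Uses POSIX extended regular expressions (-E), so no Perl-style escapes.
grep_contact_lines() {
    grep -E \
        -e '(NSW|VIC|QLD|WA|SA|TAS|NT|ACT) [0-9]{4}[[:space:]]*$' \
        -e '^[[:space:]]*ph:' \
        -e '\.com\.au[[:space:]]*$' \
        "$@"
}

# Possible usage (file names are placeholders):
#   grep_contact_lines lynx_dump.txt > contacts.txt
```

This trades the block structure for simplicity: the company name and description lines are not kept, and any unrelated line that happens to end in a state and postcode would also be selected.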