51

I'm a bit new to regex and am looking to search for multiple lines/instances of some wildcard strings such as *8768, *9875, *2353.

I would like to pull all instances of these (within one file) rather than searching them individually.

Any help is greatly appreciated. I've tried things such as *8768,*9875 etc...

gfuller40
  • 1
    I have no problem with regex in general, but here I'm not exactly sure about what you want. Could you please give an example of data input and what's your expected output? – ccjmne Jan 08 '14 at 20:49
  • I'm simply trying to pull all lines of text from a .DAT (or .txt) file that contain a substring of the above #'s. Basically (in SQL terms) I'm trying to do: Select * from table where column in (*8768, *9875) – gfuller40 Jan 08 '14 at 20:52

3 Answers

80

If I understand what you are asking, it is a regular expression like this:

^(8768|9875|2353)

This matches any of the three digit strings, but only at the beginning of a line.
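
As a rough illustration of how the anchored alternation behaves, here is a minimal Python sketch. The sample lines are hypothetical stand-ins for the contents of the .DAT/.txt file, and Python's `re` module is only one possible engine; TextPad's own syntax may differ.

```python
import re

# Hypothetical sample lines standing in for the file's contents.
lines = [
    "8768 first record",
    "1234 unrelated record",
    "9875 second record",
    "x 2353 number not at start",
]

# ^ anchors the alternation to the beginning of each line.
pattern = re.compile(r"^(8768|9875|2353)")

# Keep only the lines that begin with one of the three numbers.
matches = [line for line in lines if pattern.search(line)]
print(matches)  # ['8768 first record', '9875 second record']
```

The last sample line is skipped because 2353 does not appear at the very start of it.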

wallyk
  • I'm testing using a txt file with numbers 1-10 (each on its own line) and this will not work. Is there an edit that will grab the beginning of a line? – gfuller40 Jan 08 '14 at 20:57
  • @gfuller40: What do you mean by "grab the beginning of the line"? – wallyk Jan 08 '14 at 20:59
  • You specified end of line only; is that like a right substring function? These #'s are the very first characters on each line. – gfuller40 Jan 08 '14 at 21:03
  • @gfuller40: I didn't understand what you were asking before. The `*` in your question suggests that the matching strings can be anywhere and the lack of a splat at the end suggested (to me) the fixed strings were at the end of something. I have edited my answer. – wallyk Jan 08 '14 at 21:10
  • I'm thinking textpad does not like these formats. I'm not getting an error, it's just not finding the strings. I've used other regex such as trim after ^.*:[ \t] and these work fine. – gfuller40 Jan 08 '14 at 21:16
  • @gfuller40: I didn't realize until now that you are using TextPad, whose default regex syntax is not Perl-style. Either enable PCRE (Perl-compatible regular expression) mode if you can, or rewrite as `^\(8768\|9875\|2353\)` for the most common variant of regular expression which it [seems to use](http://www.textpad.com/support/faq/retools.html). – wallyk Jan 08 '14 at 21:44
  • BAM, you nailed it. In UltraEdit I could check "Use Perl"; in TextPad, "Use POSIX" was the winner. Thanks guys. Also acdcjunior - your syntax was correct and worked with these settings. I did try ^\(8768\|9875\|2353\) and that did not seem to work without the settings enabled, but this is resolved. THANKS Wallyk! – gfuller40 Jan 09 '14 at 15:55
43

To get the lines that contain the texts 8768, 9875 or 2353, use:

^.*(8768|9875|2353).*$

What it means:

^                      from the beginning of the line
.*                     get any character except \n (0 or more times)
(8768|9875|2353)       if the line contains the string '8768' OR '9875' OR '2353'
.*                     and get any character except \n (0 or more times)
$                      until the end of the line
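
The breakdown above can be tried out with a short Python sketch. The helper name `lines_containing` and the sample text are assumptions made for illustration; `re.MULTILINE` is what lets `^` and `$` apply to every line rather than to the whole text.

```python
import re

def lines_containing(text, numbers):
    """Return every line of text that contains any of the given digit strings."""
    # Build the (8768|9875|2353) alternation from the list of numbers.
    alternation = "|".join(re.escape(n) for n in numbers)
    # MULTILINE makes ^ and $ match at the start and end of each line.
    pattern = re.compile(rf"^.*({alternation}).*$", re.MULTILINE)
    return [m.group(0) for m in pattern.finditer(text)]

# Hypothetical file contents.
sample = "id 8768 ok\nnothing here\n009875\nabc2353xyz\n"
print(lines_containing(sample, ["8768", "9875", "2353"]))
# ['id 8768 ok', '009875', 'abc2353xyz']
```

Because `.` does not match a newline, each match is confined to a single line, which is why whole lines come back rather than just the digits.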

If you do want the literal * char, you'd have to escape it:

^.*(\*8768|\*9875|\*2353).*$
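
To see why the escape matters, here is a small Python sketch (Python's `re` module is an assumption here; other engines may report the problem differently). An unescaped leading `*` is a quantifier with nothing to repeat, while `\*` matches the asterisk character itself.

```python
import re

# An unescaped leading * has nothing to repeat, so Python rejects the pattern.
try:
    re.compile(r"*8768")
    compiled = True
except re.error:
    compiled = False

# With \* the asterisk is treated as a literal character.
literal = re.compile(r"^.*(\*8768|\*9875|\*2353).*$")

print(compiled)                                   # False
print(bool(literal.search("code *8768 here")))    # True
print(bool(literal.search("code 8768 here")))     # False
```
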
acdcjunior
  • This does not seem to work either. I'm testing with a text file with numbers 1-10 (on separate lines) and I'm not getting any occurrences using ^.(*01|04|08).*$ – gfuller40 Jan 08 '14 at 21:02
  • 3
    `^.(*01|04|08).*$` is wrong. Use the one in my answer or change the place of the `*` (pull it out of the `()`s), like this: `^.*(01|04|08).*$`. – acdcjunior Jan 08 '14 at 21:04
  • Ok, I'm using my original example with the actual 4-digit #'s on separate lines. I'm still not getting any occurrences. Thank you very much for the explanation, however; that is very helpful for sure. I just wish I could figure this out. I'm using ^.*(8768|9875|2353).*$ – gfuller40 Jan 08 '14 at 21:08
  • Take a look how this regex matches: http://regexr.com?37u1n -- Maybe you could place the content of your file and test some lines. Also, there may be an issue with the way you are using regexes in your tool. – acdcjunior Jan 08 '14 at 21:12
  • 1
    Do you want to match the literal `*` or just the number? Have you tried: `^.*(\*01|\*04|\*08).*$` ? – acdcjunior Jan 08 '14 at 21:19
  • I have tried both... On the Gskinner site this is working (as you know). It must be a variation in TextPad and how it reads the expressions. – gfuller40 Jan 08 '14 at 21:31
  • What version of TextPad are you using? (You are ticking the regular expression checkbox http://i31.tinypic.com/2bb22q.png right? :) Do other, simpler expressions, like `\d+` (any sequence of digits), work? – acdcjunior Jan 08 '14 at 21:35
  • IF explanation === 'awesome' THEN return $THUMBS_UP ENDIF; – Syed Aqeel Feb 20 '19 at 05:53
0

I suggest a much better solution. The task in my case: add the http://google.com/ path before each record and import multiple fields.

CSV single field value (all images just have filenames, separated by |):
"123.jpg|345.jpg|567.jpg"

Tamper 1st plugin: find and replace by REGEXP: pattern: /([a-zA-Z0-9]*)./ replacement: http://google.com/$1

Tamper 2nd plugin: explode setting: explode by |

In this case you don't need any additional field mappings and can use 1 field in the CSV.
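
Outside of Tamper, the same two steps can be sketched in Python. This is only an approximation of the idea: the pattern `[^|]+` (one run of non-pipe characters per filename) is an assumption swapped in for the pattern quoted above, and the variable names are hypothetical.

```python
import re

# The single CSV field value from the example above.
field = "123.jpg|345.jpg|567.jpg"
base = "http://google.com/"

# Step 1 (find and replace): prefix every filename with the base path.
# \g<0> refers to the whole match, i.e. one filename between pipes.
prefixed = re.sub(r"[^|]+", base + r"\g<0>", field)

# Step 2 (explode): split the field into separate values on |.
urls = prefixed.split("|")
print(urls)
# ['http://google.com/123.jpg', 'http://google.com/345.jpg', 'http://google.com/567.jpg']
```
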

Alex