2

I have an old Access query which I'm trying to convert to Oracle SQL. Part of it looks at a string which can have a whole bunch of text and another part of the string is a series of five characters, e.g.:

NNNNN

What I'm trying to do is find where any of those characters are Y but only when in the specific format of being within 5 characters. For example, the whole string might be:

The quick brown fox jumps over the lazy dog NNNNN

I don't want to return this because the five NNNNN does not contain Y.

The current query does something like this:

SELECT *
FROM foo
WHERE
(
bar LIKE '%Y____%' OR 
bar LIKE '%_Y___%' OR
bar LIKE '%__Y__%' OR 
bar LIKE '%___Y_%' OR
bar LIKE '%____Y%'
)

However, I think this could be better achieved with a single REGEXP_LIKE statement. How could I do this?

WSC
  • 903
  • 10
  • 30
  • I think you need to provide more sample data and a better explanation of the format you are looking for. How do you distinguish "5 characters" from "a whole bunch of text"? – Gordon Linoff Mar 01 '19 at 15:08
  • Yeah, I take your point. I've added some more context and clarified the expected result. – WSC Mar 01 '19 at 15:15
  • EVERYTHING in a string is a "character" - including the "space" character. You need to do a better job of explaining your requirement. Your "clarification" doesn't clarify anything. In your example, 'quick' is also a substring of five letters surrounded by spaces, just like 'NNNNN' - so why do you single out 'NNNNN'? –  Mar 01 '19 at 15:15
  • @mathguy - While there are still questions about the input format, it is not true that OP didn't make the desired result clearer. You're reading way too much into the fact that the posted code snippet is buggy. – Mark Adelsberger Mar 01 '19 at 15:19
  • The solution that would save you much more hassle down the road would be to split that column into five different columns that represent what each of the yes/no values actually mean. – Andy Lester Mar 01 '19 at 15:48
  • @AndyLester I wish that was possible. I wouldn't have designed the system to use random five character strings at all, but legacy stuff... – WSC Mar 01 '19 at 15:54
  • @WSC Understood. Been there, gritted my teeth through that. – Andy Lester Mar 01 '19 at 16:55

4 Answers4

2

Can't you do something like this?

where bar like '%Y%' and length(b) >= 5

This is basically your logic and doesn't require regular expressions.

If you are looking specifically for 5 characters that are all Ns except for 1Y, then I would expect your like solution to be:

where bar like '%YNNNN%' or bar like '%NYNNN%' or . . . 

A simple regular expression version of this is not obvious to me.

One method that comes close is:

where regexp_like(bar, '[YN]{5}') and    -- has a substring with 5 characters, all of which are Y and N
      not regexp_like(bar, 'Y[N]{0-3}Y'  -- has no substring with Y followed by 0-3 Ns and another Y

This could fail on some words in the other text. However, Y followed by 0 or more Ns followed by Y is highly unusual in English words.

Of course, there is the obvious as well:

where regexp_like(bar, 'YNNNN|NYNNN|NNYNN|NNNYN|NNNNY')
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • The value, bar, might some thing like this `YABFHDHGDV NNNNN'`; I want to only return the record when the `NNNNN` contains `Y` rather than just `Y` anywhere in the string. – WSC Mar 01 '19 at 15:10
  • @WSC - I understand that's the logic you're requesting, but I think GordonLinoff's point is, it's *not* the logic of the code snippet you posted. – Mark Adelsberger Mar 01 '19 at 15:14
  • 1
    @WSC - do you understand that the query you show in the post, with the five `LIKE` conditions separated by `OR`, **does not do** what you seem to want? It does **exactly** the same thing as Gordon's solution. Your problem statement is totally unclear - are you looking for a sequence of five consecutive Y letters? Or what else? No one can suggest a good solution if you aren't able to state the problem clearly, in plain English (don't use any code for that). –  Mar 01 '19 at 15:14
  • 2
    @mathguy - Apart from being unnecessarily rude and wordy (notice how my comment conveys much of the same content while being neither of those things), you're wrong. Good problem statements on SO *should* include code when it is useful. – Mark Adelsberger Mar 01 '19 at 15:16
  • 2
    @MarkAdelsberger - I believe you are confusing "good questions" (meaning the entire post) and "good PROBLEM STATEMENTS" - which should always be formulated before the first line of code is written. –  Mar 01 '19 at 15:17
  • @MarkAdelsberger Perhaps I've misunderstood Gordon's answer. Is `len(b)` not the length of string `bar `? If so, it would return for any string with a Y and the whole string length is >= 5, whereas I only want to return when a Y exists within a group of 5 other characters. If not, what is it? I've updated my question with more context anyway. – WSC Mar 01 '19 at 15:19
  • @WSC - Yes, len is the length. And the only thing validated by the like expression you posted is that the total string length is at least 5. – Mark Adelsberger Mar 01 '19 at 15:20
  • 1
    @mathguy - What you believe about my mental state is noted, but until you can start discussing the situation in a civil manner I'm done addressing your comments. – Mark Adelsberger Mar 01 '19 at 15:21
  • 1
    @WSC - that is a mistake in Gordon's answer - I will edit his answer to correct it. The name of the function in Oracle is `length`, not `len`. –  Mar 01 '19 at 15:24
1

As noted elsewhere, the code snippet you posted does not actually zero in on the 5 characters you want to examine. Whether that's because it was transcribed incorrectly, or because the code never worked as intended, or something else, I cannot say. but as written it only says that somewhere in the string is a Y surrounded by other characters, so that there's a total of at least 5 characters.

WHY does this match NNNNN

would meet that criteria, because the 3rd character in the string is a Y surrounded by some other characters, such that the total is at least 5.

If you mean that you're always looking at the last 5 characters - if tre block of Ns and Ys is at the end of the string - then your original code would work if it removed the trailing % from each pattern.

In that case, it might be easier to take the last 5 characters of the string (with a substring function as provided by your current DBMS) and look for any Y in that substring. IN that case, if you really want to use a regex you would just look to match "Y" anywhere in the subdtring, but that's probably overkill.

In general this doesn't seem that well suited to a regex solution IMO

Mark Adelsberger
  • 42,148
  • 4
  • 35
  • 52
  • Thank you for the comprehensive answer. I copied the logic directly from the Access query (which uses `*?Y???*`), and it does seem to work, but I wonder if Access doesn't treat spaces as characters or something. I will play around with your and Gordon's answers and see what I can get working. – WSC Mar 01 '19 at 15:39
  • @WSC - Well, I don't know the explanation for what you're observing, but even if a ' ' in Access didn't match `?` (which I'm 99% sure is not the case anyway), that would still match any string with a 5-letter word containing Y... In any case, good luck – Mark Adelsberger Mar 01 '19 at 16:00
1

Try this WHERE CLAUSE:

where regexp_like(regexp_substr(bar,'[YN]{5}'),'Y')    

The example below shows that it returns only those records from the table "foo" where the string of (Ys or Ns) contains a "Y".

select * from foo;
BAR
--------------------------------------------------
The quick brown fox jumps over the lazy dog YNNNN
The quick brown fox jumps over the lazy dog NYNNN
The quick brown fox jumps over the lazy dog NNYNN
The quick brown fox jumps over the lazy dog NNNYN
The quick brown fox jumps over the lazy dog NNNNY
The quick brown fox jumps over the lazy dog NNNNN
The quick brown fox jumps over the lazy dog NNNNN
The quick brown fox jumps over the lazy dog NNNNN
The quick brown fox jumps over the lazy dog NNNNN
The quick brown fox jumps over the lazy dog NNNNN

10 rows selected.
select * from foo where regexp_like(regexp_substr(bar,'[YN]{5}'),'Y');
BAR
--------------------------------------------------
The quick brown fox jumps over the lazy dog YNNNN
The quick brown fox jumps over the lazy dog NYNNN
The quick brown fox jumps over the lazy dog NNYNN
The quick brown fox jumps over the lazy dog NNNYN
The quick brown fox jumps over the lazy dog NNNNY

5 rows selected.
C Mac
  • 41
  • 4
  • Ah, this is brilliant! I'll need to test with my actual data, but this seems to do exactly what I need. I hadn't thought about using regexp_substr to narrow the regexp_like criteria to the five Y/N section first. – WSC Mar 03 '19 at 12:13
0

the user is asking for continuous 5 occurences of all similar characters with one of them replacing Y and not just N.This is one of the solution:

select * from foo where
regexp_like(bar,'%Y(A{4}|B{4}|C(4)....Z{4}%)') or
regexp_like(bar,'%(A{4}|B{4}|C(4)....Z{4})Y%') or
regexp_like(bar,'%(A{1}|B{1}|C(1)....Z{1})Y(A{3}|B{3}|C{3)....Z{3})%') or
regexp_like(bar,'%(A{2}|B{2}|C(2)....Z{2})Y(A{2}|B{2}|C{2}....Z{2})%') or
regexp_like(bar,'%(A{3}|B{3}|C(3)....Z{3})Y(A{1}|B{1}|C{1}....Z{1})%');
Nikhil S
  • 3,786
  • 4
  • 18
  • 32