As the title suggests, I would like to build a file search with criteria. I want to filter file names by whether they contain a specific sequence of digits or letters.

Example:

Criteria I want to filter the search by:

value 1 = FRGHSD02D5102T, value 2 = 005878

00256_FRGHSD02D5102T0013005878.TXT (I want to find it)

FRGHSD02D5102T00256_0013005878.TXT (I want to find it)

_FRGHSD02D5102T001300587800256.TXT (I want to find it)

00058_GHT52DSF56S03U0014002545.TXT (I do not want to find it)

I tried to do this with the glob() function:

$files = glob(".../...../*.txt");

I then tried to add the criteria, but this finds nothing:

$files = glob(".../...../*.txt/*{002}*{001}*.txt");

thanks a lot

fraielito

1 Answer

The following syntax can be used:

^.*(FRGHSD02D5102T|005878).+$

This matches any name that contains FRGHSD02D5102T or 005878, followed by at least one more character (here, the extension):

00256_FRGHSD02D5102T0013005878.TXT ✅
FRGHSD02D5102T00256_0013005878.TXT ✅
_FRGHSD02D5102T001300587800256.TXT ✅
00256_005878.TXT ✅
FRGHSD02D5102T.TXT ✅
00058_GHT52DSF56S03U0014002545.TXT ❌
00256_005873.TXT ❌
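
As a quick check, the pattern can be tried against a few of the names above with preg_match(). This is only a minimal sketch; the $names and $results variables are illustrative and not part of the original answer.

```php
<?php
// Pattern from the answer: a name containing either sequence,
// followed by at least one more character.
$pattern = '/^.*(FRGHSD02D5102T|005878).+$/';

// Illustrative sample of the file names listed above.
$names = [
    '00256_FRGHSD02D5102T0013005878.TXT',
    'FRGHSD02D5102T.TXT',
    '00058_GHT52DSF56S03U0014002545.TXT',
    '00256_005873.TXT',
];

$results = [];
foreach ($names as $name) {
    // preg_match() returns 1 when the pattern matches and 0 when it does not.
    $results[$name] = preg_match($pattern, $name) === 1;
    echo $name . ' => ' . ($results[$name] ? 'match' : 'no match') . PHP_EOL;
}
```

The first two names are reported as matches and the last two are not, in line with the list above.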

glob() does not accept regular expressions, so instead of glob the pattern can be combined with PHP's recursive directory iterators to search through all folders and subfolders:

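$folder = __DIR__ . '/data';
$pattern = '/^.*(FRGHSD02D5102T|005878).+$/';

// Walk $folder and all of its subfolders, skipping the "." and ".." entries.
$dir = new RecursiveDirectoryIterator($folder, FilesystemIterator::SKIP_DOTS);
$ite = new RecursiveIteratorIterator($dir);
// Keep only the paths that match $pattern; each item is the array of regex matches.
$files = new RegexIterator($ite, $pattern, RegexIterator::GET_MATCH);

foreach ($files as $file) {
    // $file[0] is the full match, which here is the complete path of the file.
    echo 'found matching file: ' . $file[0] . PHP_EOL;
}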

the folder structure:

data
|-- 00256_FRGHSD02D5102T0013005878.TXT
|-- example.TXT
`-- test
    `-- YES256_FRGHSD02D5102T0013005878.TXT

the result:

found matching file: /Users/stackoverflow/dev/data/00256_FRGHSD02D5102T0013005878.TXT
found matching file: /Users/stackoverflow/dev/data/test/YES256_FRGHSD02D5102T0013005878.TXT
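
Note that RegexIterator tests the whole path, so a folder whose name contains one of the sequences would make every file inside it match. If only the file name should count, a variant along the following lines may help. It is a sketch under assumptions: findByFileName() and the temporary demo folder are hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: returns the paths under $folder whose file name
// (not the directory part) matches $pattern.
function findByFileName(string $folder, string $pattern): array
{
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($folder, FilesystemIterator::SKIP_DOTS)
    );

    $found = [];
    foreach ($iterator as $file) {
        // $file is an SplFileInfo; getFilename() drops the directory part.
        if ($file->isFile() && preg_match($pattern, $file->getFilename()) === 1) {
            $found[] = $file->getPathname();
        }
    }
    sort($found); // iteration order depends on the filesystem
    return $found;
}

// Demo in a throwaway folder (assumed layout, for illustration only).
$base = sys_get_temp_dir() . '/search_demo_' . uniqid();
mkdir($base . '/005878_dir', 0777, true);
touch($base . '/00256_FRGHSD02D5102T0013005878.TXT');
touch($base . '/005878_dir/example.TXT'); // only the folder name contains 005878

$matches = findByFileName($base, '/^.*(FRGHSD02D5102T|005878).+$/');
foreach ($matches as $path) {
    echo 'found matching file: ' . $path . PHP_EOL;
}
```

Here example.TXT is skipped even though its parent folder contains 005878, because only the file name is tested.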

When searching for a specific extension, the following patterns can be used:

.pdf

$pattern = '/^.*(FRGHSD02D5102T|005878|001|002).*\.pdf$/';

.TXT (uppercase only, because the match is case-sensitive)

$pattern = '/^.*(FRGHSD02D5102T|005878|001|002).*\.TXT$/';

.pdf, .PDF, .PdF (any casing), containing 001 and 002 in either order

$pattern = '/^.*(FRGHSD02D5102T|005878|001.*002|002.*001).*\.pdf$/i';

matches:

data
|-- 00256_FRGHSD02D5102T0013005878.TXT ❌
|-- example.TXT ❌
|-- hell001hello.pdf ❌
|-- hell001hello002.pdf ✅
|-- hell002hello001.pdf ✅
`-- test
    `-- YES256_FRGHSD02D5102T0013005878.TXT ❌

The /i makes it case-insensitive so it will match any casing of PDF.

The \. escapes the . because we need to match the literal . instead of matching all characters.
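
If the values come from user input, or if all of them must be present in any order, the pattern could also be built programmatically. The sketch below uses lookaheads instead of the 001.*002|002.*001 alternation shown above. buildPattern() is a hypothetical helper, not part of the original answer, and unlike the alternation it also accepts values that overlap each other in the name.

```php
<?php
// Hypothetical helper: builds a case-insensitive pattern that requires
// every value in $values to appear somewhere in the name, and the name
// to end with the given extension.
function buildPattern(array $values, string $extension): string
{
    $lookaheads = '';
    foreach ($values as $value) {
        // preg_quote() escapes regex metacharacters in the supplied text.
        $lookaheads .= '(?=.*' . preg_quote($value, '/') . ')';
    }
    return '/^' . $lookaheads . '.*\.' . preg_quote($extension, '/') . '$/i';
}

$pattern = buildPattern(['001', '002'], 'pdf');
echo $pattern . PHP_EOL; // /^(?=.*001)(?=.*002).*\.pdf$/i

foreach (['hell001hello002.pdf', 'hell002hello001.PDF', 'hell001hello.pdf'] as $name) {
    echo $name . ' => ' . (preg_match($pattern, $name) === 1 ? 'match' : 'no match') . PHP_EOL;
}
```

Each (?=...) group checks for one value without consuming any characters, so the order of the values in the file name does not matter.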

MaartenDev
  • Thanks for the answer. Can I use this syntax in glob()? $files = glob(".../...../.*(FRGHSD02D5102T|005878).+$.txt"); – fraielito Sep 05 '19 at 04:04
  • Why are there so many dots? Do they represent an absolute path or is it navigating up the tree? @fraielito – MaartenDev Sep 05 '19 at 07:15
  • Actually, my goal is to search in a folder that contains other folders (I don't know how many folders there are). – fraielito Sep 05 '19 at 16:05
  • For searching in nested folders the following may help: https://stackoverflow.com/questions/17160696/php-glob-scan-in-subfolders-for-a-file @fraielito – MaartenDev Sep 05 '19 at 17:15
  • I added example code for searching using `glob` @fraielito – MaartenDev Sep 05 '19 at 17:57
  • Thank you so much, you have been really helpful. I will try it as soon as I have a moment. – fraielito Sep 05 '19 at 19:39
  • The search through the subfolders works perfectly, but I have problems matching the files. The example you gave does not include the file extension I need to find. I tried /^.*(001|002).+$.txt/ but it doesn't work. – fraielito Sep 05 '19 at 19:54
  • Could you give an example file name it should match? Should they always end with `.TXT`? @fraielito – MaartenDev Sep 06 '19 at 09:22
  • Exactly, the file always ends with a fixed extension, e.g. hell001hello002.pdf or 002hello001.pdf. – fraielito Sep 06 '19 at 12:29
  • You can add the options to: `$pattern = '/^.*(FRGHSD02D5102T|005878).+$/';` such as: `$pattern = '/^.*(FRGHSD02D5102T|005878|001|002).+$/';` @fraielito – MaartenDev Sep 06 '19 at 12:31
  • Sorry, I didn't express myself well. I wanted to know how to add the file extension to the pattern. In this example, $pattern = '/^.*(FRGHSD02D5102T|005878|001|002).+$/', to search for PDF files do I add it like this: $pattern = '/^.*(FRGHSD02D5102T|005878|001|002).+$.PDF/'; Correct? – fraielito Sep 06 '19 at 17:45
  • Added example for extensions for PDF and text. @fraielito – MaartenDev Sep 06 '19 at 18:34
  • You should replace `…|002).+\.TXT$` by `…|002).*\.TXT$` (`+` replaced by `*`), otherwise you require at least one character between the matched sequence and the extension. – Lienhart Woitok Sep 06 '19 at 18:56
  • Did the provided examples help? @fraielito – MaartenDev Sep 07 '19 at 09:10
  • This syntax works correctly: $pattern = '/^.*(001|002).*\.txt$/'; but it distinguishes between .TXT and .txt, so it only finds the .txt file and not the .TXT one. It also finds both the file containing 001 and 002 and the file that contains only one of the two values (e.g. 001). I would like to find only the files that contain both value 1 (001) and value 2 (002). – fraielito Sep 07 '19 at 13:04
  • This should do what I describe: $pattern = '/^.*(001).*(002).*(\.txt|\.TXT)$/'; Correct? – fraielito Sep 07 '19 at 13:25
  • Does the `/i` fix the casing issue? @fraielito – MaartenDev Sep 08 '19 at 09:56
  • Glad to hear! Could you mark the provided answer as solution so others can find it? @fraielito – MaartenDev Sep 11 '19 at 15:16
  • Tick the checkmark under the up/downvote arrows, guide: https://meta.stackexchange.com/a/5235 @fraielito – MaartenDev Sep 11 '19 at 16:20
  • @MaartenDev Sorry for asking this here. Can you search with glob in a network folder? I have a network share (with spaces in the path): //DATA/FOLDER/FOLDER TWO/ – fraielito Sep 30 '19 at 16:00
  • Why did you un-accept the solution? It solves the original problem you stated. Please mark it as the answer again and ask a new question instead of changing this one. @fraielito – MaartenDev Sep 30 '19 at 18:15
  • Sorry, I clicked it by accident. I can't post a new question with my account right now. – fraielito Sep 30 '19 at 20:44
  • No worries, what part fails when using an external disk? Does it fail matching the file names? It seems to work: https://regexr.com/4m0aa @fraielito – MaartenDev Oct 01 '19 at 07:30
  • Sorry, never mind. I had the path wrong. Thanks again, it works properly: //DATA/FOLDER TWO – fraielito Oct 01 '19 at 18:15