
I'm trying to find files that do not match (at the beginning of the filename) predefined formats contained in a .txt file.

I have the following:

@Echo off
chcp 1254>nul

setlocal DisableDelayedExpansion

    for /f "usebackq tokens=1,2,3* delims=~" %%f in ("%USERPROFILE%\Desktop\xref.txt") do (
    set "DIRNAME=%%f"
    set "DIRNAM2=^%%f"
    set "PATHNAM=%%h"
    set "ALBUMNM=%%g"
    SETLOCAL EnableDelayedExpansion
    IF EXIST !PATHNAM!!DIRNAME! (
    PushD !PATHNAM!!DIRNAME!
    dir /b /a-d "*" | findstr /v /r /c:"!DIRNAM2! -*"
    )
    ENDLOCAL
    )
    pause
    EXIT /b

This works great except with filenames containing bangs (exclamation points).

Here's a sampling of my .txt file (subdirectory~album name~path) which gets generated by a script:

12 Byzantine Rulers. The History of The Byzantine Empire~12 Byzantine Rulers. The History of The Byzantine Empire~g:\test\
17th Century Poetry~17th Century Poetry~g:\test\
1984 (George Orwell)~1984 (George Orwell)~g:\test\
1_2_1~1_2_1~g:\test\
21st Century American Foreign Policy~21st Century American Foreign Policy~g:\test\
99% Invisible~99% Invisible~g:\test\
Communication Matters. That’s Not What I Meant!~Communication Matters. That’s Not What I Meant!~g:\test\

There are hundreds of directories containing hundreds of files (podcasts). I'd like to fix this batch so it can also handle bangs (!).

Thx in advance.


Edit. My test data wasn't robust enough. The findstr command also doesn't work with (at least) the following characters: é’»¿ ... that is to say, PushD gets me to the right directory, but FindStr doesn't do its culling as expected.

RKO
  • Why are you bothering with setting the `FOR` variables to environment variables? You could just use the `FOR` variables directly with all of your code inside the code block. – Squashman Feb 18 '20 at 17:49
  • Or at the very least, define your variables within the `for` loop construct, then do the rest outside of it! _(Then there'll be no need to use delayed expansion either, but you'd still have the variables to work with elsewhere as needed.)_ – Compo Feb 18 '20 at 18:02
  • Don't have a good reason except that's what it took for the batch to work with PushD and findstr ... not expanding didn't work with either of those two commands. – RKO Feb 18 '20 at 18:45
  • Strange. I don't see why it should fail. Or do you have delayed expansion enabled on the command line by default, via the registry? Can you build a minimal example, with one file and one filter? – jeb Feb 18 '20 at 18:55
  • The `IF EXIST` command will not work correctly because there are spaces in your data. It is best practice to ALWAYS put quotes around file names and file paths to protect spaces and special characters. – Squashman Feb 18 '20 at 19:02
  • Thanks all ... but no joy. IF EXIST did indeed need quotes with delayed expansion not enabled ... but quotes plus delayed expansion disabled caused the batch to skip over all the directories. – RKO Feb 18 '20 at 19:22
  • By the way, it just came back to me that I had trouble with FindStr years ago concerning special characters. My memory ain't what it used to be, but maybe this will help others to remember. – RKO Feb 18 '20 at 19:30
  • We cannot see what you are attempting to change. If you don't update your question it will be very difficult to help you any further. – Squashman Feb 18 '20 at 19:40
  • I appreciate your comments so far, but they have not yet solved my key issue (failing with non-ASCII characters). I've tried my best to eliminate delayed expansion following your suggestions, but with no success. – RKO Feb 18 '20 at 20:22
  • Thanks for the response compo but I believe you must be responding to the wrong post. I have no curly quotes and I have nothing but double %. The .txt file does indeed have the issues you mention, but that's not up to me. That's the fault of the producer of the podcast and out of my control (nor will I change them). – RKO Feb 18 '20 at 21:55
  • I don't think that the issue is necessarily code page or encoding related, and less so exclamation marks, _(bangs)_. The major issue I see is that your text file content uses smart quotes, _(curly)_, instead of dumb quotes, _(straight)_. Additionally you have `%` characters which in batch files usually require doubling. For those reasons I would first suggest that you try to replace those characters under delayed expansion, just before your `IF EXIST`. I've deleted, then repeated my comment, because those issues are fixable in your batch file. Hold on, I'll post an example... – Compo Feb 18 '20 at 22:05
  • Those characters that `findstr` chokes on are like `ï»¿`, which is a [BOM](https://en.wikipedia.org/wiki/Byte_order_mark) used for UTF-8 files with a BOM. See [Batch script remove BOM (ï»¿) from file](https://stackoverflow.com/questions/52272705/batch-script-remove-bom-%C3%AF-from-file). – michael_heath Feb 19 '20 at 10:37
  • Thanks Michael but I can't remove these characters. I need to deal with them since I have no control over them. These are in the directory names that the podcasters have chosen (and are not changeable with gPodder, my podcast downloader of choice). I'm checking that my directory structure is consistent with the podcaster's. – RKO Feb 19 '20 at 14:00
  • @RKO, gPodder uses Python3, which includes sqlite3 which can read the sqlite3 database at `...\My Documents\gPodder\Database`. Or you could perhaps use batch-file with [sqlite3.exe](https://www.sqlite.org/download.html). You can get podcast, episodes, titles, filenames etc. information selectively this way, so do not know how you got `xref.txt` and whether it is suitable, as to why perhaps you need to filter it and handle text file BOMs etc. Perhaps something for you to research into more. Example: All download directories in sqlite3 would be `select download_folder from podcast;`. – michael_heath Feb 20 '20 at 13:31
  • Thanks Michael, but that's not an option. I'm quite experienced with gPodder and with handling of BOMs. – RKO Feb 22 '20 at 12:05
  • If I got it right, you are trying to find files whose names do not begin with their parent directory name plus space plus hyphen, right? If so, I would definitely not use `findstr` in regular expression mode (`/R`), since it will interpret certain character sequences not the way you expect; rather, I would remove `/R` and put `/B` instead, then you do not even need the variable `DIRNAM2` with the leading `^`. And something else: have you already tried with `chcp 437`? I mean, I am not sure whether `findstr` treats some extended characters (like the `’` in your last sample line) particularly... – aschipfl Mar 24 '20 at 22:25
  • Thanks aschipfl ... you were on the right track. Given the resounding lack of response on this question I started a new one following a completely different approach to the task. stackoverflow.com/questions/60835890 ... the solution to that question was indeed use of chcp 65001 – RKO Mar 24 '20 at 23:15
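
The last two comments can be combined into a minimal sketch: drop regular expression mode in favour of `/B` with a literal string, and switch the console to code page 65001. This has not been run against the asker's data. It assumes that `xref.txt` is UTF-8 without a BOM, that it uses the subdirectory~album name~path layout shown in the question, and that a "good" filename starts with the subdirectory name followed by a space and a hyphen. Adding `/L` to force literal matching is also an assumption and was not part of the comments.

```batch
@echo off
rem Sketch only, not verified against real podcast directories.
rem Assumes xref.txt is UTF-8 without a BOM, one "subdir~album~path" entry per line.
chcp 65001 >nul
setlocal DisableDelayedExpansion

for /f "usebackq tokens=1-3 delims=~" %%G in ("%USERPROFILE%\Desktop\xref.txt") do (
    rem %%G = subdirectory, %%H = album name (unused here), %%I = parent path
    if exist "%%I%%G\" (
        pushd "%%I%%G"
        rem /B anchors the match at the start of each line, /L treats the
        rem search string literally, /V prints only the lines that do not match.
        dir /b /a-d | findstr /v /b /l /c:"%%G -"
        popd
    )
)
endlocal
```

Because the `for` variables are used directly and delayed expansion stays disabled, exclamation points in the names are never interpreted. A name containing a pair of `%` characters could still be expanded by the child `cmd.exe` that runs the pipe, so that case would need separate handling.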

1 Answer


I don't think that the issue is necessarily code page or encoding related, and even less so exclamation marks (bangs). The major issue I see is that your text file content uses smart (curly) quotes instead of dumb (straight) quotes. Additionally, you have % characters, which in batch files usually require doubling. For those reasons I would first suggest that you try to replace those characters.

For example:

@Echo Off
SetLocal DisableDelayedExpansion
For /F "UseBackQ Tokens=1-3 Delims=~" %%G In ("%USERPROFILE%\Desktop\xref.txt")Do (
    Set "SUBDIRN=%%G"
    Set "ALBUMNM=%%H"
    Set "PATHNAM=%%I"
    SetLocal EnableDelayedExpansion
    Set SUBDIRN=!SUBDIRN:%%=%%%%!
    Set ALBUMNM=!ALBUMNM:%%=%%%%!
    Set PATHNAM=!PATHNAM:%%=%%%%!
    Set SUBDIRN=!SUBDIRN:’='!
    Set ALBUMNM=!ALBUMNM:’='!
    Set PATHNAM=!PATHNAM:’='!
    Set SUBDIRN=!SUBDIRN:“="!
    Set ALBUMNM=!ALBUMNM:“="!
    Set PATHNAM=!PATHNAM:“="!
    Set SUBDIRN=!SUBDIRN:”="!
    Set ALBUMNM=!ALBUMNM:”="!
    Set PATHNAM=!PATHNAM:”="!
    If Exist "!PATHNAM!!SUBDIRN!\" (
        PushD "!PATHNAM!!SUBDIRN!"
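        Rem The caret is doubled because delayed expansion would otherwise consume it as an escape for the exclamation mark.
        Dir /B/A-D|FindStr /IVRC:"^^!SUBDIRN! -"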
    )
    EndLocal
)
Pause
Exit /B

I'm not sure how a copy of this code, within the code box will handle the smart quotes, but I'm sure you'll get the idea.

Compo
  • If Exist and PushD require the original names (as determined by the podcast publisher) ... not possible to replace characters in these names. Only the album name is in my control. – RKO Feb 18 '20 at 23:01
  • So are you telling me that, your filenames actually contain those characters, i.e. in your example above a single smart quote, and the possibility of a name with a single or double smart quote for feet and inches, or I suppose a set of smart doublequotes, within a filename... – Compo Feb 18 '20 at 23:19
  • Perhaps I responded too quickly before. You are right in that curly quotes are causing problems. The filenames in the .txt file are of the format album - title. So I ran a test manually replacing curly quotes. Still no joy. FindStr still doesn't match in cases where the album name is missing the bang (exclamation point) ... But again I have no control over path or subdirectory. – RKO Feb 18 '20 at 23:26
  • Well, for a proper debug on that issue, as already requested [here](https://stackoverflow.com/questions/60286676/batch-variable-exclamation-points-used-in-dir-findstr/60290178?noredirect=1#comment106642461_60286676) and [here](https://stackoverflow.com/questions/60286676/batch-variable-exclamation-points-used-in-dir-findstr/60290178?noredirect=1#comment106641312_60286676), we need you to [edit your question](https://stackoverflow.com/posts/60286676/edit), to provide us with more of your code, and preferably some actual results, together with your expectations. – Compo Feb 18 '20 at 23:58