2

First of all, I know that there are tons of regex threads here on Stack Overflow, and I checked a bunch of them, but I am finding it really hard to get the match right.

What I am currently trying to do is match these characters:
- a-z
- A-Z
- 0-9
- .()~-_[]

Based on this regex, every character that does not match should then be removed (replaced with an empty string).

The file name that I am using as an example is:
- 12345677-fieberthermometer-fuer-schlaefe-und-ohr-digital-mapa-nuk-d0@#$%"&*()!ßöäüÄÜÖ"'][}{<>:;,º.jpg

private static final String FOLDER = "/path/to/my/folder";
private static final String URL_VALID_REGEX = "a-zA-Z0-9\\.\\(\\)\\[\\]\\-~_";

public static void main(String[] args) {
    File imagesDirPath = new File(FOLDER);
    Pattern p = Pattern.compile("[" + URL_VALID_REGEX + "]");

    final String[] listImages = imagesDirPath.list(new FilenameFilter() {
        @Override
        public boolean accept(File dir, String name) {
            Matcher m = p.matcher(name);

            if (!m.matches()){
                File renamedFile = new File(FOLDER + File.separator + name);
                name = name.replaceAll("[^" + URL_VALID_REGEX + "]", "");
                renamedFile.renameTo(new File(FOLDER + File.separator + name));
            }

            System.out.println(name);

            final String extension = FilenameUtils.getExtension(name);
            final boolean isAcceptedExtesion = getAcceptedFileFormatList().contains(extension);
            final long lastModified = new File(dir, name).lastModified();
            return isAcceptedExtesion;
        }
    });
}

As you can see in the code, the replacement uses a negation of the valid-character regex, but I'm not sure that is how it should be done, since the match always returns false.

1st problem: The match is always false even when the file name is valid, so the file gets renamed and its last-modified date changes, and that date needs to stay the same.

2nd problem: The comma and asterisk always remain in the file name, but this is probably also due to the wrong regex.

Example of a valid name: - 12345677-fieberthermometer-fuer-schlaefe-und-ohr-digital-mapa-nuk-d0_~()][.jpg

What am I missing here that I am not able to find?

Banns
  • I think you missed `+`: `Pattern p = Pattern.compile("[" + URL_VALID_REGEX + "]+");`. `m.matches()` requires a full string match. – Wiktor Stribiżew Apr 05 '18 at 09:05
  • Indeed this was what was missing, I can see that now the values are true after the plus sign. Regex drives me crazy because one character can change the whole outcome haha. Thanks for the tip! – Banns Apr 05 '18 at 09:13
  • On second thought, the `*` modifier is better in this scenario. Well, anyway, check what works best for you, `+` or `*`. – Wiktor Stribiżew Apr 05 '18 at 09:13
  • What would be the difference between using + and *? If I remember correctly, + needs at least one character while * can match none, is that right? – Banns Apr 05 '18 at 09:16
  • When you use `+`, an empty string does not match and so will still be processed with `replaceAll`. That is probably not an issue if you check for an empty string beforehand. – Wiktor Stribiżew Apr 05 '18 at 09:18
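
To make the comments above concrete, here is a minimal sketch of how `matches()` behaves with no quantifier, with `+`, and with `*`. The character set is the one from the question; the class name `FileNameValidation` and the helper `matchesWhole` are hypothetical names introduced only for this illustration.

```java
import java.util.regex.Pattern;

class FileNameValidation {
    // Same character set as URL_VALID_REGEX in the question.
    static final String VALID_CHARS = "a-zA-Z0-9\\.\\(\\)\\[\\]\\-~_";

    // No quantifier: matches() can only succeed for a one-character string.
    static final Pattern SINGLE = Pattern.compile("[" + VALID_CHARS + "]");
    // "+": the whole string must be non-empty and contain only allowed characters.
    static final Pattern ONE_OR_MORE = Pattern.compile("[" + VALID_CHARS + "]+");
    // "*": same as "+", but the empty string is accepted as well.
    static final Pattern ZERO_OR_MORE = Pattern.compile("[" + VALID_CHARS + "]*");

    // matches() tests the entire input against the pattern, not a substring.
    static boolean matchesWhole(Pattern p, String name) {
        return p.matcher(name).matches();
    }

    public static void main(String[] args) {
        String valid = "12345677-fieberthermometer-fuer-schlaefe-und-ohr-digital-mapa-nuk-d0_~()][.jpg";
        System.out.println(matchesWhole(SINGLE, valid));           // false
        System.out.println(matchesWhole(ONE_OR_MORE, valid));      // true
        System.out.println(matchesWhole(ZERO_OR_MORE, valid));     // true
        System.out.println(matchesWhole(ONE_OR_MORE, ""));         // false
        System.out.println(matchesWhole(ZERO_OR_MORE, ""));        // true
        System.out.println(matchesWhole(ONE_OR_MORE, "a,b*.jpg")); // false
    }
}
```

The only difference between `+` and `*` here is the empty string, so which one fits depends on whether an empty name should count as already valid.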

1 Answer

-1

I reproduced it in Notepad++, but I will try to do it in Java as well.

There are a few problems. You should match against the problematic characters and, if any are found, replace them. Use find() instead of matches(), because you don't need the whole string to match.

In Notepad++ I just replace [^a-zA-Z0-9.()\[\]\-~_]+ with "" and I get what you want.

In Java:

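import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HelloWorld {

    public static void main(String[] args) {
        String wrong = "12345677-fieberthermometer-fuer-schlaefe-und-ohr-digital-mapa-nuk-d0@#$%\"&*()!ßöäüÄÜÖ\"'][}{<>:;,º.jpg";
        // Negated character class: matches any run of characters that are NOT allowed.
        String pattern = "[^a-zA-Z0-9\\.\\(\\)\\[\\]\\-~_]+";
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(wrong);
        // find() succeeds as soon as at least one disallowed character is present.
        if (m.find()) {
            // Remove every run of disallowed characters.
            String right = wrong.replaceAll(pattern, "");
            System.out.println(right);
        }
    }
}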
Veselin Davidov
  • The answer is not relevant. *Every file matches against that because every file has at least one matching character.* - Wrong, `[a-z]` with `matches` only matches a string that has just 1 char. *So you should match against the problematic characters and if such are found replace them* - OP does exactly that, see *`name = name.replaceAll("[^" + URL_VALID_REGEX + "]", "");`*. *you need to escape the "-"* - OP pattern contains an escaped `-`. – Wiktor Stribiżew Apr 05 '18 at 09:12
  • Sorry maybe I explained myself wrong. I fixed the answer now and added a working java example – Veselin Davidov Apr 05 '18 at 09:19
  • You are just re-inventing OP code. That step is already working in OP code. You do not need `+` in the pattern used with `replaceAll` although it is more logical. But not required. – Wiktor Stribiżew Apr 05 '18 at 09:19
  • It's not required because it will replace each character found, but it makes the regex more readable and I believe (but am not sure) it will have better performance, at least in my head – Veselin Davidov Apr 05 '18 at 09:25
  • Maybe I am reinventing but mine works ;) and OP doesn't... – Veselin Davidov Apr 05 '18 at 09:26
  • Sure it works, but it is not relevant, OP replacing code also works well. Validating one does not. – Wiktor Stribiżew Apr 05 '18 at 09:27
  • what doesn't work in his code is that he uses matches instead of find. And it's not the whole string that matches but he needs to find wrong chars. Anyway I don't think I deserve that down vote since I tried to help and my code actually works ;) but who cares anyway – Veselin Davidov Apr 05 '18 at 11:20
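
One way to sidestep the `matches()` versus `find()` question raised in this thread is to skip validation altogether: compute the cleaned name and rename only when it differs from the original. The sketch below is one possible take, not the code of either poster. `FileNameSanitizer`, `sanitize`, and `renameIfNeeded` are hypothetical names. Whether a rename changes the last-modified time depends on the platform and file system, so putting the saved timestamp back afterwards is an assumption made here as a precaution. Empty results and name collisions are not handled.

```java
import java.io.File;
import java.util.regex.Pattern;

class FileNameSanitizer {
    // Negation of the valid-character set from the question.
    static final Pattern INVALID = Pattern.compile("[^a-zA-Z0-9\\.\\(\\)\\[\\]\\-~_]+");

    // Removes every run of disallowed characters from the name.
    static String sanitize(String name) {
        return INVALID.matcher(name).replaceAll("");
    }

    // Renames the file only when its name actually changes; returns the resulting name.
    static String renameIfNeeded(File dir, String name) {
        String cleaned = sanitize(name);
        if (cleaned.equals(name)) {
            return name; // already valid: the file is not touched at all
        }
        File source = new File(dir, name);
        File target = new File(dir, cleaned);
        long lastModified = source.lastModified();
        if (source.renameTo(target)) {
            // Assumption: restore the original timestamp in case the rename altered it.
            target.setLastModified(lastModified);
            return cleaned;
        }
        return name; // rename failed, keep reporting the old name
    }

    public static void main(String[] args) {
        String messy = "12345677-fieberthermometer-fuer-schlaefe-und-ohr-digital-mapa-nuk-d0@#$%\"&*()!ßöäüÄÜÖ\"'][}{<>:;,º.jpg";
        System.out.println(sanitize(messy));
    }
}
```

Because a name that is already valid comes back unchanged from `sanitize`, the equality check doubles as the validation step.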