-1

I'm writing code that loops through a list of folder URIs and checks that each folder actually exists on the file system. For folders that don't exist, the code outputs the folder's URI onto a webpage.

The issue I'm having is that some of the folders whose URIs are output onto the webpage actually do exist. For example, if I copy such a URI from the webpage and paste it into File Explorer, it locates the folder.

While investigating, I put a breakpoint in the code, grabbed the suspect URI from a watch variable, and pasted it into File Explorer, which couldn't find the folder. So File Explorer finds the folder when the URI is copied from the webpage, but not when the seemingly identical URI is copied from the watch variable. Here are the two URIs:

URI of folder output onto webpage

\mycpu\go now\Harden, James  Jr. & Allen\2021

Exact same URI but grabbed from watch variable in code

\mycpu\go now\Harden, James Jr. & Allen\2021

They look exactly the same. But then I tried pasting each one into Chrome to see what I'd get, and here's the result:

URI of folder output onto webpage and pasted into Chrome

file://mycpu//go%20now//Harden,%20James%20%20Jr.%20&%20Allen//2021/

Exact same URI but grabbed from watch variable in code and pasted into Chrome

file://mycpu//go%20now//Harden,%20James%C2%A0%20Jr.%20&%20Allen//2021/

So, as you can see above, the URI from the watch variable contains a non-breaking space (%C2%A0) after the name James. What is the best way, in code, to expose this underlying encoding so that the non-breaking spaces can be removed?

  • Related: [How to do URL decoding in Java?](https://stackoverflow.com/questions/6138127/how-to-do-url-decoding-in-java) – andrewJames Apr 23 '22 at 18:44
  • Actually the comma should be URL encoded too (`%2C`) AFAIK, older browsers did not do that. I wonder whether in the original path the non-breaking space actually occurred (`\u00A0`)? That seems likely as names often are treated a bit special, like breaking a full name on the normal space. – Joop Eggen Apr 23 '22 at 19:33
  • I don’t understand the problem. Those URIs are correct. The space character is **not allowed** to appear in a URI. It must be percent-encoded. By definition, a URI is not the same as a file path, even though a URI might represent a file path. – VGR Apr 24 '22 at 06:03
  • The link @andrewJames posted helped... it wasn't the solution tho so I'll post the solution. – user2488184 Apr 24 '22 at 06:31
  • @VGR the problem was that one of the spaces in the URI was a non breaking space (%C2%A0) which was only exposed once the URI had been encoded. This non breaking space was preventing the File exists() method from locating the folder represented by the URI – user2488184 Apr 24 '22 at 07:03
  • Does this answer your question? [How to do URL decoding in Java?](https://stackoverflow.com/questions/6138127/how-to-do-url-decoding-in-java) – Alicia Sykes Apr 24 '22 at 15:13

2 Answers

1

After further examination, the only issue in the URIs was the non-breaking spaces (%C2%A0), so I just needed to remove those. Thanks to @andrewjames for linking to a question that showed how to expose the special characters, which was a critical part of the solution. Here's the solution I came up with.

First, I passed the URI into a cleanString() method to remove the special characters. When passed in, the URI had the following format:

\\mycpu\go now\Harden, James Jr. & Allen\2021

Then I URL-encoded the URI using UTF-8, which exposed the underlying special characters. After encoding, encodedUri had the following format:

%5C%5Cmycpu%5Cgo+now%5CHarden%2C+James%C2%A0+Jr.+%26+Allen%5C2021

Then I ran a String replace on the encoded URI to replace each encoded non-breaking space (%C2%A0) with a plus sign (which is how URLEncoder represents a regular space) and stored the result in the cleanedUri variable, which then had the following format:

%5C%5Cmycpu%5Cgo+now%5CHarden%2C+James++Jr.+%26+Allen%5C2021

Lastly, I decoded cleanedUri back into the familiar, human-readable format. It appears exactly as it did initially, except that the underlying non-breaking space has been replaced with an ordinary space (U+0020):

\\mycpu\go now\Harden, James Jr. & Allen\2021

This did the trick handsomely! Thank you all for your help! Each comment illuminated my path a bit more (see code below).

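import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.Logger;

private String cleanString(String uri) {
    try {
        // URL-encode the path so the non-breaking space shows up as %C2%A0.
        String encodedUri = URLEncoder.encode(uri, StandardCharsets.UTF_8.name());
        // Swap each encoded non-breaking space for "+", the encoded form of a regular space.
        String cleanedUri = encodedUri.replace("%C2%A0", "+");
        return URLDecoder.decode(cleanedUri, StandardCharsets.UTF_8.name());
    } catch (UnsupportedEncodingException ex) {
        Logger.getLogger(ParseCSVFileImpl.class.getName()).log(Level.SEVERE, null, ex);
        return uri; // fall back to the unmodified input
    }
}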
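For comparison, the non-breaking space is a single character (U+00A0, as Joop Eggen's comment points out), so it can probably also be found and replaced directly, without the encode/decode round trip. The following is only a minimal sketch under that assumption: the class name `NbspCleaner` and the helpers `replaceNbsp` and `describeNonAscii` are hypothetical names invented for illustration, and the sample path is the anonymized one from the question.

```java
import java.util.ArrayList;
import java.util.List;

public class NbspCleaner {

    // Hypothetical helper: replace every non-breaking space (U+00A0) with a regular space.
    static String replaceNbsp(String path) {
        return path.replace('\u00A0', ' ');
    }

    // Hypothetical helper: report each non-ASCII character as "index:U+XXXX",
    // so characters that look like plain spaces become visible in a log or debugger.
    static List<String> describeNonAscii(String text) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp > 0x7F) {
                found.add(String.format("%d:U+%04X", i, cp));
            }
            i += Character.charCount(cp);
        }
        return found;
    }

    public static void main(String[] args) {
        // Sample path with a non-breaking space after "James" (assumed input).
        String suspect = "\\\\mycpu\\go now\\Harden, James\u00A0 Jr. & Allen\\2021";
        System.out.println(describeNonAscii(suspect));
        System.out.println(replaceNbsp(suspect));
    }
}
```

The cleaned string could then be handed to the same File exists() check as before. Whether other invisible characters also occur in the data is not known from the question, which is why the sketch reports every non-ASCII character rather than assuming U+00A0 is the only one.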
0

There may be a quicker solution that hasn't occurred to me, but the first thing that comes to mind is creating an array of the special characters and then looping through it to remove each one, like so:

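String[] specialChars = {"!","@","#","$","%","^","&","*","(","}"};
for (String special : specialChars) {
    // replace() treats its argument literally; replaceAll() would parse "$", "^", "*" and "(" as regex.
    // Strings are immutable, so the result has to be assigned back.
    original = original.replace(special, ""); // original is your original string
}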

The reason the array holds strings and not chars is that String.replace(char, char) needs a replacement character, so using chars would force you to replace each special character with something such as a space, which may or may not be what you want.
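To make the difference between the two overloads concrete, here is a small sketch. The class name `ReplaceCharVsString` and the helpers `blankOut` and `remove` are hypothetical, chosen only for this illustration.

```java
public class ReplaceCharVsString {

    // Hypothetical helper: the char overload can only swap one character for another,
    // so here every match becomes a space.
    static String blankOut(String text, char unwanted) {
        return text.replace(unwanted, ' ');
    }

    // Hypothetical helper: the string overload accepts an empty replacement,
    // so every match is removed outright.
    static String remove(String text, String unwanted) {
        return text.replace(unwanted, "");
    }

    public static void main(String[] args) {
        String original = "go!now";
        System.out.println(blankOut(original, '!'));
        System.out.println(remove(original, "!"));
        // replace() matches literally, so regex metacharacters such as "$" need no escaping.
        System.out.println(remove("cost$", "$"));
    }
}
```

Note that none of the characters in the array above is a non-breaking space, so this approach would only help with the question's path if "\u00A0" were added to the list.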

  • Test your code. On a side, is there something wrong with just doing: `original = original.replaceAll("[!@#$%^&*(){}\\[\\]]", "");`? No loop needed. – DevilsHnd - 退職した Apr 23 '22 at 20:38
  • Your solution is indeed probably better, i'm not extremely familiar with the pattern matching syntax. – Andrew Wood Apr 24 '22 at 01:08
  • Thanks @DevilsHnd. That's essentially what I did to solve it except that the only special character in the URI that was causing an issue was the non breaking space (%C2%A0) so I encoded the string, replaced the non breaking space, and then decoded the string back – user2488184 Apr 24 '22 at 07:08