The following query finds all documents that have a <password> entry in the info field (a string in XML format):

db.getCollection('products').find({info:{$regex: /<password>/}});

But passwordRecords has 0 elements. Where did I make a mistake? Is this the right way to update data in Mongo?

Anatoly
  • [Do not use regex delimiters in C# regex](http://stackoverflow.com/questions/31560080/removing-all-non-word-characters-with-regex), remove `/` from the C# regex pattern. – Wiktor Stribiżew Mar 18 '16 at 11:12
  • `var updatedString = Regex.Replace(r.info, @"/<(.*?password)>([^<]+)<\/(.*?password)", "<$1>$3>"); ` I also need to delete `/` from the beginning? – Anatoly Mar 18 '16 at 11:15
  • If the `/` is a slash that you want to match in the input string, no, it should be there then. What text are you trying to match? Your `asdfghj` can be matched with `(?s).*?` and the updatedString can be hardcoded as `` – Wiktor Stribiżew Mar 18 '16 at 11:27
  • Text where `info` has tags whose names CONTAIN password (like `adminPassword`, `password`, `PasswordOfUser`). I then want to wipe the values of all of these tags. – Anatoly Mar 18 '16 at 11:31
  • @WiktorStribiżew Can you please give the answer to this question with final version (not in the comment)? – Anatoly Mar 18 '16 at 11:32

1 Answer

There are some things to consider here.

  • Do not use regex delimiters in a C# regex (remove the outer `/.../`)
  • If the `/` is a slash that you want to match in the input string, it should stay in the pattern
  • Your strings can be matched with the `(?si)<([^\s<]*password[^\s<]*)>.*?</\1>` pattern and replaced with `<$1></$1>`: `Regex.Replace(r.info, @"(?si)<([^\s<]*password[^\s<]*)>.*?</\1>", "<$1></$1>");`

The pattern I suggest has four parts of interest:

  • `(?si)` - inline options: `s` (Singleline, also known as DOTALL) makes `.` match a newline too, and `i` enables case-insensitive matching
  • `([^\s<]*password[^\s<]*)` - captures a node name containing `password` (where the node has only a name, no attributes)
  • `.*?` - matches any 0+ characters, as few as possible, up to the next required subpattern
  • `</\1>` - matches the closest closing tag whose name equals the Group 1 capture. Because the match stops at the first such closing tag, nested tags with the same name are not handled.
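
To make the pieces above concrete, here is a minimal sketch of the replacement wrapped in a helper. The class name `PasswordScrubber`, the method name `Scrub`, and the sample XML are assumptions made for illustration; only the pattern and the replacement string come from the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative, not from the question.
public static class PasswordScrubber
{
    // (?si): Singleline (dot matches newline) plus case-insensitive matching.
    // Group 1 captures an attribute-free tag name that contains "password".
    private static readonly Regex PasswordTag = new Regex(
        @"(?si)<([^\s<]*password[^\s<]*)>.*?</\1>");

    // Empties the content of every matching element while keeping
    // the original opening and closing tag names.
    public static string Scrub(string xml)
    {
        return PasswordTag.Replace(xml, "<$1></$1>");
    }

    public static void Main()
    {
        // Assumed sample input with two password-like tags.
        string info = "<user><name>bob</name><adminPassword>s3cret</adminPassword>"
                    + "<Password>line1\nline2</Password></user>";

        // Prints: <user><name>bob</name><adminPassword></adminPassword><Password></Password></user>
        Console.WriteLine(Scrub(info));
    }
}
```

The tag names survive with their original casing because the replacement reuses Group 1 rather than a hardcoded name.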

Wiktor Stribiżew
  • Why not "hardcode" it as ``? Why do you need to use `Regex.Replace` if you are not accessing/using the text in-between (the innerHTML)? – Wiktor Stribiżew Mar 18 '16 at 11:56
  • As I said before, `r.Info` is a big string (in XML format) and I need to replace (or change) the value of tags like `adminPassword`, `Password`, `passwordDB` to `` or `` – Anatoly Mar 18 '16 at 12:23
  • Ah, that's a different thing. Then you'd need to use [`Regex.Replace(r.info, @"(?s)<password>(.*?)</password>", "<adminPassword>$1</adminPassword>");`](http://regexstorm.net/tester?p=(%3fs)%3cpassword%3e(.*%3f)%3c%2fpassword%3e&i=%3cpassword%3etest%3c%2fpassword%3e&r=%3cadminPassword%3e%241%3c%2fadminPassword%3e) (see *Context* tab below). – Wiktor Stribiżew Mar 18 '16 at 12:26
  • Why do you hardcode the last parameter `$1`? I don't want to change tag `asdfghj` to `$1` – Anatoly Mar 18 '16 at 12:29
  • Because I thought you had them fixed. If these are not fixed tags, and they can just *contain* `password`, and they can be in different case, use [`Regex.Replace(r.info, @"(?si)<([^\s<]*password[^\s<]*)>(.*?)</\1>", "<$1>$2</$1>");`](http://regexstorm.net/tester?p=(%3fsi)%3c(%5b%5e%5cs%3c%5d*password%5b%5e%5cs%3c%5d*)%3e(.*%3f)%3c%2f%5c1%3e&i=%3cpassword%3etest%3c%2fpassword%3e%0d%0a%3cadminPassword%3etest%3c%2fadminPassword%3e&r=%3c%241%3e%242%3c%2f%241%3e) – Wiktor Stribiżew Mar 18 '16 at 12:33
  • Please, look at my updated question. I added one example – Anatoly Mar 18 '16 at 12:35
  • Is it correct regex `var regex = new Regex(@"(?s).*?"); ` to find all info with tags like `adminPassword` or `password` etc ? – Anatoly Mar 18 '16 at 12:47
  • No, because `password` is hard-coded. Use the regex from the answer. `var regex = new Regex(@"(?si)<([^\s<]*password[^\s<]*)>.*?</\1>");` – Wiktor Stribiżew Mar 18 '16 at 12:48
  • The `<([^\s<]*password[^\s<]*)>` pattern cannot match just any tag with `p`. It can only match a tag that only has a name that contains a `password` character sequence. – Wiktor Stribiżew Mar 18 '16 at 13:16
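
As a follow-up to the last two comments, the same regex can be used to pick out only the strings that contain a password-like tag. This is a rough sketch: `PasswordFinder`, `FindWithPasswordTags`, and the sample strings are hypothetical, and the filtering runs in memory on plain strings rather than against MongoDB.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are illustrative, not from the question.
public static class PasswordFinder
{
    // Same pattern as in the answer: a tag whose name contains "password"
    // (any case, no attributes), followed by its closest closing tag.
    private static readonly Regex PasswordTag = new Regex(
        @"(?si)<([^\s<]*password[^\s<]*)>.*?</\1>");

    // Returns only the strings that contain at least one matching element.
    public static List<string> FindWithPasswordTags(IEnumerable<string> infos)
    {
        return infos.Where(info => PasswordTag.IsMatch(info)).ToList();
    }

    public static void Main()
    {
        // Assumed sample data standing in for the info field of several records.
        var infos = new[]
        {
            "<p>text</p>",
            "<passwordDB>x</passwordDB>",
            "<PasswordOfUser>abc</PasswordOfUser>"
        };

        foreach (string hit in FindWithPasswordTags(infos))
        {
            Console.WriteLine(hit);
        }
    }
}
```

Here `<p>text</p>` is skipped, in line with the comment above that the pattern does not match just any tag with `p`.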