1

What I'm trying to accomplish is to auto-generate tags/keywords for a file upload, deriving the keywords from the filename.

I have already accomplished auto-generating titles for each upload, as shown in the screenshot.

I have now moved on to auto-generating keywords. This is similar to the titles, but needs more formatting. First, I run the string through a filter that removes commonly used words from the filename (such as "this", "that", "there", etc.).

I am happy with the result, but I also need to exclude words that contain numbers. I have not found a way to remove a word entirely if it contains a number. The solutions I have found either work only for one specific match or remove just the digits. I would like to remove the entire word if it contains ANY numeric digit.

Mafia

5 Answers

1

Apply a simple regular expression to your current filename strings, replacing every match with the empty string. The regular expression matches "words" that contain at least one digit.

Javascript example:

'asdf 8bit jawesome234 mayhem 234'.replace(/\s*\b\w*\d\w*\b/g, '')

Evaluates to:

"asdf mayhem"

Here the regular expression is /\s*\b\w*\d\w*\b/g. It matches zero or more whitespace characters (\s*), then a word boundary (\b), then zero or more word characters (\w*), then a digit (\d), then zero or more word characters, then another word boundary (\b). Note that \w covers letters, digits, and the underscore. \b matches the empty string at a boundary: between a word character and a non-word character, or between a word character and the start or end of the string. The g after the final / means replace all occurrences, not just the first.

Once the digit-words are removed, you can split the string into keywords however you want (by whitespace, for example).

"asdf mayhem".split(/\s+/);

Evaluates to:

["asdf", "mayhem"]
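Putting the two steps together, a minimal sketch might look like the following. The function name `keywordsFromTitle` is hypothetical, and the `trim()` call and empty-string filter are additions (not part of the answer above) that cover a digit-word at the very start of the string and the case where nothing is left.

```javascript
// Hypothetical helper: drop every word that contains a digit, then split
// what remains into keywords on whitespace.
function keywordsFromTitle(title) {
  return title
    .replace(/\s*\b\w*\d\w*\b/g, '') // remove digit-words and the whitespace before them
    .trim()                          // a digit-word at the start leaves a leading space
    .split(/\s+/)
    .filter(function (word) { return word.length > 0; }); // ''.split() yields ['']
}

console.log(keywordsFromTitle('asdf 8bit jawesome234 mayhem 234')); // ["asdf", "mayhem"]
console.log(keywordsFromTitle('2morrow tomorrow'));                 // ["tomorrow"]
```

Because `\w` includes the underscore, a name such as `foo_1` counts as a single word here and is removed whole.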
Maxy-B
  • Hello @Maxy-B, this is unable to remove the entire word; it only removes the number, and I was hoping to remove entire words (not just the number). Example: `2morrow tomorrow`. I would wish to return just `tomorrow` and remove `2morrow` entirely from the string because it contains a number (otherwise I would have a broken-looking `morrow` keyword). :) – Mafia Jun 30 '12 at 16:27
  • Edited to remove a word if it contains one or more digits (whether the whole word is digits or not). – Maxy-B Jun 30 '12 at 16:38
1

To remove all words which contain a number, use:

string = string.replace(/[a-z]*\d+[a-z]*/gi, '');
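As a quick sketch of how that pattern behaves: the replacement leaves the surrounding whitespace in place, and without the `i` flag a leading capital letter survives. The function name `stripDigitWords` and the whitespace clean-up step are assumptions added for illustration, not part of the answer.

```javascript
// Hypothetical wrapper around the pattern above.
function stripDigitWords(text) {
  return text
    .replace(/[a-z]*\d+[a-z]*/gi, '') // the i flag lets [a-z] also match "A" in "Amazon1"
    .replace(/\s+/g, ' ')             // collapse the gaps left by removed words
    .trim();
}

console.log(stripDigitWords('Amazon1 2morrow tomorrow'));   // "tomorrow"

// Without the i flag, [a-z]* cannot consume the capital letter:
console.log('Amazon1'.replace(/[a-z]*\d+[a-z]*/g, ''));     // "A"
```

Note that the character class only covers ASCII letters, so a word such as `foo_1` would leave `foo_` behind.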
Rob W
  • Thanks Rob, this was a very good answer. It does exactly the job I needed done however when a word containing a number starts in capital case (ex. `Amazon1`) it leaves the capital letter there (returning `A`). However, I *strtolowered* the string before running it with this and now it's perfect! :) bravo ... **Edit** I see you fixed the issue with the caps, so now it's absolutely perfect without my help, thanks so much! Just what I needed – Mafia Jun 30 '12 at 16:41
  • @Love The `i` flag makes the regular expression case-insensitive. See my original answer for an explanation on the regular expression pattern. Here's MDN's guide for Regular expressions: https://developer.mozilla.org/en/JavaScript/Guide/Regular_Expressions – Rob W Jun 30 '12 at 16:42
  • @Engineer Thanks for your time, I was trying out the first solutions provided and 1 worked before I got to your answer. :) – Mafia Jun 30 '12 at 16:46
  • @RobW I believe, that the OP splits the string by '`-_`', only then applies replacement. See the samples in question's pic. – Engineer Jun 30 '12 at 16:51
  • No, there is no pattern used in my filenames, they are images submitted by users. Those are just plain filenames :) nothing done to filenames that should help affect anything – Mafia Jun 30 '12 at 16:56
  • @Love Engineer means that the hyphens are replaced with spaces. Your file name can be turned into tags by using `string.split(/[^a-z]+/i)`. – Rob W Jun 30 '12 at 16:59
  • @Love You probably need to define what you mean by a '`word`'. What counts as a word? Does '`word1-word2`' represent 2 words or 1 word? The same question applies to '`word1_word2`'. – Engineer Jun 30 '12 at 17:00
  • My `words` come from the string in the title field. When I say string, it's those titles. I base the word tags from the title. – Mafia Jun 30 '12 at 17:01
1

Try this expression:

 var regex = /\b[^\s]*\d[^\s]*\b/g;

Example:

 var str = "normal 5digit dig555it digit5 555";
 console.log( str.replace(regex,'') );   // Result -> "normal    " (the spaces between words are left behind)
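Because `[^\s]*` treats everything between whitespace as one word, a hyphenated name such as `normal-5` is removed as a whole. If hyphens and underscores should separate words instead, one option is to split first and then filter. This is a sketch under that assumption; the function name `keywordsFromFilename` is hypothetical.

```javascript
// Hypothetical helper: split on whitespace, underscores and hyphens,
// then keep only the pieces that contain no digit.
function keywordsFromFilename(name) {
  return name
    .split(/[\s_-]+/)
    .filter(function (word) {
      return word.length > 0 && !/\d/.test(word);
    });
}

console.log(keywordsFromFilename('normal-5 dig555it_plain 555')); // ["normal", "plain"]
```

Splitting before filtering makes the definition of a "word" explicit: it is whatever the separator pattern says it is.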
Engineer
  • 1. The word boundaries are unnecessary. 2. In the screenshot (see question), the words seem to be delimited by hyphens. The result for `"normal-5"` is `""`, using your code. – Rob W Jun 30 '12 at 16:38
  • @RobW They are probably unnecessary, but I wrote the regexp for the general case. – Engineer Jun 30 '12 at 16:40
  • @RobW Just to point it out, the OP is already splitting the string to make the title, so they could just run this regex against the title after the split ;) – Trey Jun 30 '12 at 16:55
1
('Apple Cover Photo 23s423 of your 543634 moms').match(/\b([^\d]+)\b/g)

returns

Apple Cover Photo , of your , moms
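
Since `match` returns an array of the digit-free runs, the pieces still need to be joined and the whitespace normalised before they can be used as a single string. A sketch of that step follows; the function name `keepDigitFreeRuns` is hypothetical, and the `|| []` guard is an addition because `match` returns `null` when nothing matches.

```javascript
// Hypothetical helper built on the match() call above.
function keepDigitFreeRuns(text) {
  var runs = text.match(/\b([^\d]+)\b/g) || []; // match() yields null when nothing matches
  return runs.join(' ').replace(/\s+/g, ' ').trim();
}

console.log(keepDigitFreeRuns('Apple Cover Photo 23s423 of your 543634 moms'));
// "Apple Cover Photo of your moms"
```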

http://jsfiddle.net/awBPX/2/

Trey
0

Use this to strip the digits themselves:

string.replace(/[0-9]/g, "");

Hope this helps.

Edited:

To drop the whole word instead, check this:

var str = 'one 2two three3 fo4ur 5 six';
var result = str.match(/(^[\D]+\s|\s[\D]+\s|\s[\D]+$|^[\D]+$)+/g).join('');
Behnam Esmaili