
Let's say that I am creating a small Chrome extension (hence most of my code is in JS), and that I am given a list of strings, e.g.:

Artist - Song Name

Artist, Song Name

Song Name - Artist

Irrelevant info - Song Name - Artist

etc.

I only need to extract the Song Name from the string; however, I can't anticipate all the forms the string could take.

So my question is: what is the best way to extract this info? Is it machine learning? If so, can the code be written in JS, or should an API be used? Or is there a solution other than machine learning?

P.S.

I know that this question doesn't really follow the guidelines for questions on this site, and that it is somewhat open-ended and ambiguous, but I couldn't think of anywhere else to ask it.

Thank you in advance.

Markwin
  • This has nothing to do with machine learning. Why do you have those strings, where do they come from? – juvian May 29 '17 at 17:06

2 Answers


There is a great deal of statistics involved in machine learning. So, to put it in a nutshell: what a "machine" has to learn is the probability that a word or a group of words is a song name or an artist.

That's where the learning part starts: someone (or some other machine) must first "teach" the "machine" as a starting point.
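To make the "teaching" step concrete, here is a minimal sketch in JavaScript of one simple statistical technique, a word-level naive Bayes classifier with add-one smoothing. This is an illustrative choice, not a method the answer prescribes; the function names and the training examples are hypothetical, and a useful model would need far more labeled data than this.

```javascript
// Hypothetical word-frequency (naive Bayes) classifier: it "learns" from
// labeled pieces how often each word appears in titles vs. artist names.
function tokenize(text) {
  // Lowercase and split on non-word characters (ASCII-oriented).
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

// examples: array of { text, label } where label is "title" or "artist".
function train(examples) {
  const model = {
    counts: { title: new Map(), artist: new Map() }, // word -> count per label
    totals: { title: 0, artist: 0 }, // total words seen per label
    docs: { title: 0, artist: 0 }, // number of examples per label
    vocab: new Set(),
  };
  for (const { text, label } of examples) {
    model.docs[label] += 1;
    for (const word of tokenize(text)) {
      const counts = model.counts[label];
      counts.set(word, (counts.get(word) || 0) + 1);
      model.totals[label] += 1;
      model.vocab.add(word);
    }
  }
  return model;
}

// Estimated probability that `text` is a title, using log scores
// with add-one (Laplace) smoothing to avoid zero probabilities.
function probabilityTitle(model, text) {
  const totalDocs = model.docs.title + model.docs.artist;
  const logScore = {};
  for (const label of ["title", "artist"]) {
    let score = Math.log((model.docs[label] + 1) / (totalDocs + 2));
    for (const word of tokenize(text)) {
      const count = model.counts[label].get(word) || 0;
      score += Math.log((count + 1) / (model.totals[label] + model.vocab.size));
    }
    logScore[label] = score;
  }
  // P(title) = e^t / (e^t + e^a) = 1 / (1 + e^(a - t))
  return 1 / (1 + Math.exp(logScore.artist - logScore.title));
}

// Usage with a tiny, made-up training set:
const model = train([
  { text: "Bohemian Rhapsody", label: "title" },
  { text: "Stairway to Heaven", label: "title" },
  { text: "Hey Jude", label: "title" },
  { text: "Queen", label: "artist" },
  { text: "Led Zeppelin", label: "artist" },
  { text: "The Beatles", label: "artist" },
]);
console.log(probabilityTitle(model, "Heaven Knows") > 0.5); // true
```

A word-level model like this only knows words it has seen, which is exactly why context and large amounts of labeled data matter.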

However, even a human being wouldn't know whether "Hurricane" is a song or, e.g., a band name. Contextual information is needed to find the correct classification.

Maybe using an open API that already provides this information would be a better approach. You might want to have a look at this question:

Is there a free database or web service api for music information (albums, artists, tracks)?
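A minimal sketch of the API route, assuming MusicBrainz as the open API (it is a real, free music-metadata service, but choosing it here is an assumption; other services exist). It assumes the `/ws/2/recording` search endpoint with `query`, `fmt=json` and `limit` parameters, and a JSON response with a `recordings` array whose items have a `title`; verify these against the MusicBrainz documentation and its rate-limit rules before calling it for every string. The helper names are hypothetical. Only the URL building and response checking are pure; the `fetch` call needs network access and, inside an extension, possibly a host permission for the API's domain.

```javascript
// Assumed MusicBrainz recording-search endpoint.
const MB_SEARCH = "https://musicbrainz.org/ws/2/recording";

// Hypothetical helper: builds a search URL for recordings titled `piece`.
function buildRecordingSearchUrl(piece) {
  const url = new URL(MB_SEARCH);
  // Search the piece as a quoted phrase; escape embedded quotes and backslashes.
  const escaped = piece.replace(/(["\\])/g, "\\$1");
  url.searchParams.set("query", `recording:"${escaped}"`);
  url.searchParams.set("fmt", "json");
  url.searchParams.set("limit", "5");
  return url.toString();
}

// Hypothetical pure helper: does any recording in a parsed response
// have exactly this title (case-insensitive)?
function matchesRecordingTitle(piece, response) {
  const wanted = piece.trim().toLowerCase();
  return (response.recordings || []).some(
    (rec) => typeof rec.title === "string" && rec.title.toLowerCase() === wanted
  );
}

// Network call: true if the service knows a recording with this title.
async function isKnownSongTitle(piece) {
  const res = await fetch(buildRecordingSearchUrl(piece));
  if (!res.ok) throw new Error(`MusicBrainz request failed: ${res.status}`);
  return matchesRecordingTitle(piece, await res.json());
}
```

Keeping the parsing separate from `fetch` makes it easy to test offline and to cache answers, which matters if the same strings come up repeatedly.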

LongHike

Sketch of something which could work:

  • make a regexp for all possible dividing characters (commas, hyphens, etc.)
  • split your strings into pieces using this regexp
  • for one-piece strings, assume that the piece is the title
  • for two-piece strings, assume that the longer piece is the title and the shorter one is the artist
  • make a list of artists and titles (or use a global database of them for better results)
  • check whether any of your titles is the same as some artist; that could indicate a mistake
  • for strings with three or more pieces, identify the artists based on your list
  • for the remaining pieces, assume that the one with the lower index (closer to the beginning of the string) is the title
  • eventually, you can search through the Google API to check whether the pieces labeled as titles return more results than the other pieces
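The steps above could be sketched in JavaScript roughly as follows. This is a hypothetical implementation under assumptions of its own: hyphens and dashes count as separators only when surrounded by spaces (so a name like "Jay-Z" is not split), commas and pipes always do, `knownArtists` stands in for the artist list from the fifth step, and the optional Google search step is left out.

```javascript
// Assumed separators: " - ", en/em dashes or "|" surrounded by spaces, and commas.
const SEPARATOR = /\s+[-\u2013\u2014|]\s+|\s*,\s*/;

// Hypothetical heuristic extractor; knownArtists is a list you maintain.
function extractTitle(line, knownArtists = []) {
  const artists = new Set(knownArtists.map((a) => a.toLowerCase()));
  const isArtist = (piece) => artists.has(piece.toLowerCase());
  const pieces = line.split(SEPARATOR).map((p) => p.trim()).filter(Boolean);

  let title = null;
  if (pieces.length === 1) {
    // One piece: assume it is the title.
    title = pieces[0];
  } else if (pieces.length === 2) {
    const [a, b] = pieces;
    // Prefer the artist list when exactly one piece is a known artist...
    if (isArtist(a) && !isArtist(b)) title = b;
    else if (isArtist(b) && !isArtist(a)) title = a;
    // ...otherwise assume the longer piece is the title.
    else title = a.length >= b.length ? a : b;
  } else if (pieces.length > 2) {
    // Three or more: drop known artists, then take the earliest remaining piece.
    const rest = pieces.filter((p) => !isArtist(p));
    title = rest.length > 0 ? rest[0] : null;
  }
  // A "title" that is also a known artist may be a mistake.
  const suspicious = title !== null && isArtist(title);
  return { title, suspicious };
}

console.log(extractTitle("Queen - Bohemian Rhapsody", ["Queen"]).title); // Bohemian Rhapsody
```

With the lower-index rule, a string such as "Irrelevant info - Song Name - Artist" (with "Artist" in the list) yields "Irrelevant info", which is the kind of imperfection the answer anticipates; the suggested search-based check is one way to catch it.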

Of course, this won't work perfectly, but I assume you don't expect it to.

Karol Selak