I am trying to implement a machine learning algorithm that will help me with two goals:

1) Classify a given string into one of a set of predetermined categories based on its content.
2) Estimate the confidence that the string belongs in that category.

An example set of strings and their categories is below:

"Damage to right rear fender" -- Problem

"Scratch. Side view mirror" -- Problem

"Next scheduled maintenance on 12/23/2016" -- Appointment

"Customer should return on 1/1/2017" -- Appointment

"Red car, Volkswagon" -- Description

"Car is dark gray with large scratch on the side" -- Description

" Do not fill the car with premium fuel" -- Instruction

"Engine should cool to <100 celcius before driving" -- Instruction

I am brand new to machine learning, so I am trying to figure out the best approach to accomplish this in Python. I have a training set of approximately 1000 strings and a test set of 5000 strings.

My first approach was a One-vs-Rest classifier using scikit-learn (credit to @Cerin and @JMaurer), but the results were not great: on manual review, only 55% of the strings were categorized correctly. I suspect this is because the strings contain symbols and numbers that contribute to their overall categorization.
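
A minimal sketch of what such a setup could look like, assuming scikit-learn is installed. This is not the code that produced the 55% figure. The tokenizer `normalize` and the placeholder tokens `DATETOKEN` and `NUMTOKEN` are hypothetical names chosen for illustration, and the eight example strings above stand in for the real training set. The idea is to keep dates, numbers, and comparison symbols as features instead of discarding them, and to read a per-string confidence from `predict_proba`.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical patterns: dates like 12/23/2016 and plain numbers like 100.
DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def normalize(text):
    """Lowercase the text and map dates and numbers to placeholder tokens."""
    text = text.strip().lower()
    text = DATE_RE.sub(" DATETOKEN ", text)
    text = NUMBER_RE.sub(" NUMTOKEN ", text)
    # Keep comparison symbols as their own tokens instead of dropping them.
    text = re.sub(r"([<>=])", r" \1 ", text)
    return text.split()


train_texts = [
    "Damage to right rear fender",
    "Scratch. Side view mirror",
    "Next scheduled maintenance on 12/23/2016",
    "Customer should return on 1/1/2017",
    "Red car, Volkswagon",
    "Car is dark gray with large scratch on the side",
    " Do not fill the car with premium fuel",
    "Engine should cool to <100 celcius before driving",
]
train_labels = [
    "Problem", "Problem",
    "Appointment", "Appointment",
    "Description", "Description",
    "Instruction", "Instruction",
]

model = make_pipeline(
    # Word unigrams and bigrams built from the custom tokenizer above.
    TfidfVectorizer(tokenizer=normalize, token_pattern=None,
                    lowercase=False, ngram_range=(1, 2)),
    # One binary logistic regression per category.
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(train_texts, train_labels)


def classify(texts):
    """Return a (predicted category, confidence) pair for each input string."""
    probabilities = model.predict_proba(texts)
    classes = model.classes_
    results = []
    for row in probabilities:
        best = row.argmax()
        results.append((str(classes[best]), float(row[best])))
    return results
```

Calling `classify(["Return on 2/2/2017"])` gives one `(category, confidence)` pair per string. With only eight training examples the confidences mean very little; the sketch is only meant to show where the two goals map onto the API.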

Can anybody with a bit more experience comment on whether this is the right approach for the task, or whether there is a better method I could use? I am a bit in the dark and am really looking for some breadcrumbs to point me in the right direction.

Thanks.

Paul

PaulGlass
  • Can you paste the code that supports your question? I am hardly a ML dev but I assume it would be useful to know information about your training set and test set. –  Nov 07 '16 at 20:27
  • Sorry Paul, but this is too broad a question to be on-topic for Stack Overflow. Try asking this question regarding approach at [Cross Validated](http://stats.stackexchange.com/). If you have specific questions about the implementation in python/scikit-learn, feel free to bring those back to Stack Overflow. – juanpa.arrivillaga Nov 07 '16 at 20:30
  • Ah okay...did not even know about Cross Validated. I will do so. Thanks for the advice. @kiran.koduru I will come back with my code after posting. – PaulGlass Nov 07 '16 at 20:37

0 Answers