I am learning Pandas and I came to an interesting question. So I have a Dataframe like this:
COL1 COL2 COL3
a 9/8/2016 2
b 12/4/2016 23
...
n 1/1/2015 21
COL1 is a String, Col2 is a timestamp and Col3 is a number. Now I need to do some analysis on this Dataframe and I want to convert all the non-numeric data to numeric. I tried using DictVectorizer() to convert COL1 and 2 to numeric but first of all I am not sure if this is the best way doing such a thing and second I don't know what to do with the timestamp. When I use DictVectorizer the output would be like:
{u'COL3: {0:2, 1:23 , ...,n:21}, 'COL1': {0: u'a', 1:'b', ... , n:'n'}, 'COL2': {0: u'9/8/2016' , 1: u'12/4/2016' , ... , n:u'1/1/2016'}}
but from what I learned it should be like this or at least I know I need something like this:
{COL1:'a', COL2: '9/8/2016' , COL3: 2 and so on}
so, questions: 1-what is the best way of converting non- numeric (including date) to numeric values to use in sklearn libraries 2- what is the right way of using DictVectorize()
Any help would be appreciated.