I posted a question earlier (Pandas-ipython, how to create new data frames with drill down capabilities) and it was pointed out that it was possibly too broad, so here are some more specific questions that may be easier to answer and will help me get started with graphing data.
I have decided to try creating some visualizations of my data using pandas (or any package accessible through IPython). The first, obvious problem I run into is how to filter on certain conditions. For example, I type the command:
df.Duration.hist(bins=10)
but I get an error due to unrecognized dtypes (some entries aren't in datetime format). How can I exclude these in the original command?
Also, what if I want to create the same histogram but keep only records whose ID (in an account ID field) starts with the integer (or string?) '2'?
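To make the two filters concrete, here is a minimal sketch on made-up data of the kind of thing I'm after. The 'Account ID' column name and the sample values are assumptions for illustration, and boolean masks are just one common pandas idiom, not necessarily the best way to do this.

```python
import pandas as pd

# Made-up sample data; 'Account ID' is a hypothetical column name.
df = pd.DataFrame({
    "Account ID": [2001, 3002, 2450, 1999],
    "Duration": ["01:14:00", "not recorded", "00:30:00", "02:00:00"],
})

# Entries that cannot be parsed as a time span become NaT instead of raising.
duration = pd.to_timedelta(df["Duration"], errors="coerce")

# One boolean mask per condition.
valid = duration.notna()
starts_with_2 = df["Account ID"].astype(str).str.startswith("2")

# Combine the masks with & and convert what is left to plain minutes,
# giving a numeric Series that .hist(bins=10) could then be called on.
minutes = duration[valid & starts_with_2].dt.total_seconds() / 60
print(minutes.tolist())
```

Converting the IDs to strings first sidesteps the integer-or-string question, since str.startswith works the same either way.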
Ultimately, I want to be able to create histograms, line plots, box plots and so on, filtering by certain months, user IDs, or just bad dtypes.
Can anyone help me modify the above command to add filters to it? (I'm decent with Python, new to data.)
Thanks.
Update: a kind user below has been trying to help me with this problem. I have a few developments to add to the question and a more specific problem.
I have columns in my data frame for Start Time and End Time, and I created a 'Duration' column for the time elapsed.
The Start Time/End Time columns have values that look like:
2014/03/30 15:45
and when I apply pd.to_datetime() to these columns, the resulting values look like:
2014-03-30 15:45:00
I converted both columns to datetime and created the new 'Duration' column (time elapsed) in one command:
df['Duration'] = pd.to_datetime(df['End Time'])-pd.to_datetime(df['Start Time'])
The values in the Duration column look like:
01:14:00
that is, hh:mm:ss, indicating the time elapsed (74 minutes in the above example).
The dtype of the Duration column is:
dtype('<m8[ns]')
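For anyone who wants to reproduce this, here is a small self-contained sketch of the steps above. The two sample rows are made up; only the column names and the date layout come from my data.

```python
import pandas as pd

# Made-up rows in the same 'YYYY/MM/DD HH:MM' layout as my data.
df = pd.DataFrame({
    "Start Time": ["2014/03/30 15:45", "2014/03/30 18:00"],
    "End Time": ["2014/03/30 16:59", "2014/03/30 18:30"],
})

# Same one-line computation as above: datetime minus datetime gives a timedelta.
df["Duration"] = pd.to_datetime(df["End Time"]) - pd.to_datetime(df["Start Time"])

print(df["Duration"])
print(df["Duration"].dtype)  # a timedelta64 dtype; the exact unit may depend on the pandas version
```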
The question is: how can I convert these values to plain integers (for example, 74 for the duration shown above)?
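One thing I have seen suggested, and which may or may not be the recommended way, is going through the .dt accessor. The sketch below assumes whole minutes are the unit I want and that there are no missing values; the sample durations are made up.

```python
import pandas as pd

# Made-up durations with a timedelta dtype, like my 'Duration' column.
durations = pd.Series(pd.to_timedelta(["01:14:00", "00:30:00"]))

# total_seconds() returns floats; floor-divide by 60 for whole minutes,
# then cast to int. A NaT entry would become NaN and make the cast fail.
minutes = (durations.dt.total_seconds() // 60).astype(int)
print(minutes.tolist())
```

If fractional minutes matter, dropping the floor division and the cast keeps them as floats.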