3

I have a large data base that looks like this:

id, Start Time, End Time
0, 2017-01-01 00:00:21, 2017-01-01 00:11:41
1, 2017-01-01 00:00:45, 2017-01-01 00:11:46
2, 2017-02-01 00:00:57, 2017-02-01 00:22:08
3, 2017-03-01 00:01:10, 2017-03-01 00:11:42
4, 2017-01-01 00:01:51, 2017-01-01 00:12:57

Using pandas would probably be easier to do this, but I don't have much experience with it. I have researched modules such as arrow and datetime and would like to filter the data based on the user's input. Using that input, the user is returned filtered data. For example:

def get_month('data.csv'):
    month = input('\nWhich month? January, February, March, April, May, or June?\n')
    date = '1 ' + month + ', 2017'
    with open(city_data, 'r') as fin, open('userdata.csv', 'w') as fout:
         writer = csv.writer(fout, delimiter=' ')
         for row in csv.reader(fin, delimiter=' '):
             if row[0] == arrow.get(date,'D MMMM, YYYY').format('YYYY-MM-DD'):
                 return writer.writerow(row)

Am I approaching this correctly? I think I may be going in the wrong direction in the date = '1 ' + month + ', 2017' part. Is there a way I could filter the data with just an input like January?

jpp
  • 159,742
  • 34
  • 281
  • 339

1 Answers1

3

For structured data, pandas provides an efficient solution:

from datetime import datetime
import pandas as pd

# read data from file
df = pd.read_csv('data.csv')

# this creates a dataframe as below:
#    id           Start Time             End Time
# 0   0  2017-01-01 00:00:21  2017-01-01 00:11:41
# 1   1  2017-01-01 00:00:45  2017-01-01 00:11:46
# 2   2  2017-02-01 00:00:57  2017-02-01 00:22:08
# 3   3  2017-03-01 00:01:10  2017-03-01 00:11:42
# 4   4  2017-01-01 00:01:51  2017-01-01 00:12:57

# cast string columns to datetime
df['Start Time'] = pd.to_datetime(df['Start Time'])
df['End Time'] = pd.to_datetime(df['End Time'])

def get_month(df):
    month = input('\nWhich month? January, February, March, April, May, or June?\n')
    return df[df['Start Time'].dt.month == datetime.strptime(month, '%B').month]

get_month(df)
jpp
  • 159,742
  • 34
  • 281
  • 339
  • 1
    Thank you for your answer! Looks like I'm going with pandas. It looks so elegant and easy to understand. I was able to look up a few things in the docs based on your answer and I pretty much understand it now. Thanks again! – louielouielouie Feb 18 '18 at 17:19
  • 1
    if I wanted to do the same thing above but filter by day would the code look like this? `return city_data[city_data['Start Time'].dt.day == datetime.strptime(day, '%A').day]` @jp – louielouielouie Feb 19 '18 at 00:00
  • 1
    If the input is an integer, you could just do `return city_data[city_data['Start Time'].dt.day == day_input`, where `day_input` is the integer input by the user. But remember to use `day_input = int(input('What day?'))`, as by default inputs are strings [so convert to int]. – jpp Feb 19 '18 at 00:04
  • Awesome, that worked. Thank you. Out of curiosity, would the method above work for a response like `Monday` ? – louielouielouie Feb 19 '18 at 00:07
  • 1
    The answer to your question is already "out there", see [here](https://stackoverflow.com/a/29519293/9209546). In general, I'm happy to help, but please try and do a simple search first.. it makes sure we respond when you really get stuck! – jpp Feb 19 '18 at 00:10