-1

Suppose I have a .txt file that looks like this:

    0 day0 event_data0
    1 day1 event_data1
    2 day2 event_data2
    3 day3 event_data3
    4 day4 event_data4
    ........
    n dayn event_datan

    #where: 
    #n is the event index
    #dayn is the day when the event happened. year-month-day format
    #event_datan is what happened at the event.

From this file, I need to create a new one with all the events that happened between two specific dates, for example after September 7th, 2003 and before Christmas 2006. Could someone help me with this problem? Much appreciated!

4 Answers

0

Looks like the datetime module is what you'll want. Iterate through the file line by line, parse each line's date, and start keeping lines once the timedelta between the current line's date and your beginning threshold date (Sept 7, 2003 in your example) is positive; stop iterating when you reach Christmas 2006. Stopping early like this assumes the file is sorted by date. Load the kept lines into either a pandas DataFrame or a NumPy array.
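A minimal sketch of that idea, using only the standard library. The function name `filter_events` and the sample lines are made up for illustration, and it assumes the date is the second whitespace-separated field on each line. It compares dates directly instead of computing a timedelta, and it checks every line rather than stopping early, so it does not depend on the file being sorted:

```python
from datetime import date, datetime


def filter_events(lines, start, end):
    """Yield the lines whose date lies strictly between start and end."""
    for line in lines:
        # Assumed layout: "<index> <yyyy-mm-dd> <event data>"
        fields = line.split(maxsplit=2)
        if len(fields) < 2:
            continue  # blank or malformed line
        try:
            day = datetime.strptime(fields[1], "%Y-%m-%d").date()
        except ValueError:
            continue  # second field is not a valid date
        if start < day < end:
            yield line


# Hypothetical sample data standing in for the real file.
sample = [
    "0 2003-09-07 event_data0\n",
    "1 2004-01-15 event_data1\n",
    "2 2006-12-24 event_data2\n",
    "3 2006-12-25 event_data3\n",
]
kept = list(filter_events(sample, date(2003, 9, 7), date(2006, 12, 25)))
print("".join(kept), end="")
```

Because an open file is itself an iterable of lines, the same function should work on a real file: pass the file object opened for reading as `lines` and hand the result to `writelines()` on the output file.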

Judu Le
0

Lucas, you can try this:

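import re
from datetime import datetime as dt


# Bounds are exclusive: only events strictly between these dates are kept.
__date_start__ = dt.strptime('2003-09-07', "%Y-%m-%d").date()
__date_end__ = dt.strptime('2006-12-25', "%Y-%m-%d").date()

# Matches a date written as yyyy-mm-dd.
date_pattern = re.compile(r'\d{4}-\d{2}-\d{2}')

# Mode 'w' truncates any existing events.txt, so no os.remove() is needed.
with open('file.txt', 'r') as src, open('events.txt', 'w') as dst:
    for line in src:  # iterate over lines, not single characters
        match = date_pattern.search(line)
        if match is None:
            continue  # skip lines that contain no date
        date_converted = dt.strptime(match.group(0), '%Y-%m-%d').date()
        if (date_converted > __date_start__) and (date_converted < __date_end__):
            dst.write(line)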

Change the __date_start__ and __date_end__ values to the interval you want. The code then searches each line for a regex match on a date in yyyy-mm-dd format, compares that date against the range (date start and end), and, if it falls inside, writes the line to events.txt.

Abe
0

I assume your file is tab-delimited, so you can use the pandas package to read it. Just add a first row with the column names (index, date, event), separated by tabs, to your .txt file and then read in the data.

import pandas

df = pandas.read_csv('txt_file.txt', sep='\t', index_col=0)
# index_col=0 just sets your first column as the index

After you've done so, follow the steps from this link. That will essentially answer your question on how to select events between two dates by simply using this package. That way you can return a new data frame only with those events you need.
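The linked steps are not reproduced here, but one common way to do this selection in pandas is a boolean mask on a parsed date column. The sketch below is only an illustration under the assumptions of this answer: the file is tab-delimited and has the header row `index`, `date`, `event`. The helper name `select_between` and the in-memory sample are hypothetical.

```python
import io

import pandas as pd


def select_between(source, start, end):
    """Return the rows whose 'date' lies strictly between start and end."""
    # parse_dates turns the 'date' column into real timestamps.
    df = pd.read_csv(source, sep="\t", index_col=0, parse_dates=["date"])
    mask = (df["date"] > pd.Timestamp(start)) & (df["date"] < pd.Timestamp(end))
    return df.loc[mask]


# Hypothetical in-memory stand-in for the tab-delimited .txt file.
sample = io.StringIO(
    "index\tdate\tevent\n"
    "0\t2003-09-07\tevent_data0\n"
    "1\t2004-01-15\tevent_data1\n"
    "2\t2006-12-24\tevent_data2\n"
    "3\t2006-12-25\tevent_data3\n"
)
selected = select_between(sample, "2003-09-07", "2006-12-25")
print(selected)
```

To get the new file the question asks for, the returned frame could then be saved with `selected.to_csv('events.txt', sep='\t')`.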

Vlad Sirbu
0

You haven't said whether you need this specifically for "after September the 7th 2003 and before Christmas 2006", or whether the two dates can vary.

If it is specifically for those two dates, then in my opinion you can get the result with the regex module:

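import re

# Matches from the line dated 2003-09-07 through the line dated 2006-12-25.
# This only works if both dates appear in the file and the lines are in date order.
c = r"[0-9]+\s+2003-09-07.+2006-12-25[^\n]*"
with open("event.txt", "r") as f:
    file_data = f.read()
# re.DOTALL lets ".+" span multiple lines.
regex_search = re.search(c, file_data, re.DOTALL)
if regex_search:
    print(regex_search.group())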

You can also apply conditions to the result of group(), or use the findall() method.