0

I have two CSV files with timestamp data in str format. the first CSV_1 has resampled data from a pandas timeseries, into 15 minute blocks and looks like:

time            ave_speed   
1/13/15 4:30    34.12318398 
1/13/15 4:45    0.83396195  
1/13/15 5:00    1.466816057

CSV_2 has regular times from gps points e.g.

id      time            lat         lng
513620  1/13/15 4:31    -8.15949    118.26005
513667  1/13/15 4:36    -8.15215    118.25847
513668  1/13/15 5:01    -8.15211    118.25847

I'm trying to iterate through both files to find instances where time in CSV_2 is found within the 15 min time group in CSV_1 and then do something. In this case append ave_speed to every entry which this condition is true.

Desired result using the above examples:

id      time            lat         lng           ave_speed
513620  1/13/15 4:31    -8.15949    118.26005     0.83396195
513667  1/13/15 4:36    -8.15215    118.25847     0.83396195
513668  1/13/15 5:01    -8.15211    118.25847     something else

I tried doing it solely in pandas dataframes but ran into some troubles I thought this might be a workaround to achieve what i'm after.

This is the code i've written so far and I feel like it's close but I can't seem to nail the logic to get my for loop returning entries within the 15 min time group.

with open('path/CSV_2.csv', mode="rU") as infile:
with open('path/CSV_1.csv', mode="rU") as newinfile:
    reader = csv.reader(infile)
    nreader = csv.reader(newinfile)
    next(nreader, None)  # skip the headers
    next(reader, None)  # skip the headers

    for row in nreader:
        for dfrow in reader:
            if (datetime.datetime.strptime(dfrow[2],'%Y-%m-%d %H:%M:%S') < datetime.datetime.strptime(row[0],'%Y-%m-%d %H:%M:%S') and
            datetime.datetime.strptime(dfrow[2],'%Y-%m-%d %H:%M:%S') > datetime.datetime.strptime(row[0],'%Y-%m-%d %H:%M:%S') - datetime.timedelta(minutes=15)):
                print dfrow[2]

Link to pandas question I posted with same problem Pandas, check if timestamp value exists in resampled 30 min time bin of datetimeindex

EDIT: Creating two lists of time, i.e. listOne with all the times from CSV_1 and listTwo with all the times in CSV_2 I'm able to find instances in the time groups. So something is weird with using CSV values. Any help would be appreciated.

Community
  • 1
  • 1
hselbie
  • 1,749
  • 9
  • 24
  • 40

1 Answers1

0

I feel like this is pretty close to what I want if anyone is curious on how to do the same thing. It's not massively efficient and the current script takes roughly 1 day to iterate over all the rows multiple times because of the double loop.

If anyone has any thoughts on how to make this easier or quicker i'd be very interested.

#OPEN THE CSV FILES
with open('/GPS_Timepoints.csv', mode="rU") as infile:
with open('/Resampled.csv', mode="rU") as newinfile:
    reader = csv.reader(infile)
    nreader = csv.reader(newinfile)
    next(nreader, None)  # skip the headers
    next(reader, None)  # skip the headers

    #DICT COMPREHENSION TO GET ONLY THE DESIRED DATA FROM CSV              
    checkDates = {row[0] : row[7] for row in nreader }
    x = checkDates.items()

    # READ CSV INTO LIST (SEEMED TO BE EASIER THAN READING DIRECT FROM CSV FILE, I DON'T KNOW IF IT'S FASTER)
    csvDates = []
    for row in reader:
        csvDates.append(row)

    #LOOP 1 TO ITERATE OVER FULL RANGE OF DATES IN RESAMPLED DATA AND A PRINT STATEMENT TO GIVE ME HOPE THE PROGRAM IS RUNNING
    for i in range(0,len(x)):
        print 'checking', i
        #TEST TO SEE IF THE TIME IS IN THE TIME RANGE, THEN IF TRUE INSERT THE DESIRED ATTRIBUTE, IN THIS CASE SPEED TO THE ROW 
        for row in csvDates:
            if row[2] > x[i-1][0] and row[2] < x[i][0]:
                row.insert(9,x[i][1])

    # GET THE RESULT TO CSV TO UPLOAD INTO GIS
    with open('/result.csv', mode="w") as outfile:

        wr = csv.writer(outfile)
        wr.writerow(['id','boat_id','time','state','lat','lng','activity','speed', 'state_reason'])

        for row in csvDates:
            wr.writerow(row)
hselbie
  • 1,749
  • 9
  • 24
  • 40