
Searched myself silly, but couldn't find the answer.

Basically I want to import a number of GPS files with the aim of knowing the location of each GPS at any given time.

I wanted to use pandas' datetime index for this. What I can't seem to figure out is how to align this data.

My result is that each GPS starts a new datetime index; I think I'm overwriting my time data with every import.

I've tried creating a df outside the for loop first, but without great results.

This is my code:

```python
import glob
import pandas as pd

pattern = '*.csv'
csv_files = glob.glob(pattern)
frames = []

for csv in csv_files:
    # Find the header row: skip everything up to and including the 'trkpt' line.
    with open(csv) as fp:
        skip = next(filter(
            lambda x: x[1].startswith('trkpt'),
            enumerate(fp)
        ))[0] + 1
    df = pd.read_csv(csv, usecols=['lat', 'lon', 'ele', 'time'],
                     parse_dates=['time'], skiprows=skip)
    df['DateTime'] = pd.to_datetime(df['time'], format='%Y-%m-%d %H:%M:%S')
    df = df.set_index('DateTime')
    df.rename(columns={'lat': 'lat ' + csv, 'lon': 'lon ' + csv,
                       'ele': 'ele ' + csv}, inplace=True)
    df.drop(['time'], axis=1, inplace=True)
    frames.append(df)

df = pd.concat(frames)

df.to_csv('GPS Export.csv', sep=',')
```

File example

```
trkpt

ID  trksegID    lat             lon         ele     time
1   1           -32.46226206    116.0619373 311.6   2021-01-22T01:54:03Z
2   1           -32.46225444    116.0619245 311.6   2021-01-22T01:54:04Z
3   1           -32.46225762    116.0619227 314.97  2021-01-22T01:54:05Z
4   1           -32.46226215    116.0619119 316.41  2021-01-22T01:54:06Z
5   1           -32.46226123    116.0618896 317.85  2021-01-22T01:54:07Z
6   1           -32.46225611    116.0618791 317.85  2021-01-22T01:54:08Z
7   1           -32.46224949    116.0618693 316.41  2021-01-22T01:54:09Z
8   1           -32.46224086    116.0618602 314.97  2021-01-22T01:54:10Z
9   1           -32.46223943    116.0618525 314.49  2021-01-22T01:54:11Z
10  1           -32.46225385    116.0618722 314.49  2021-01-22T01:54:12Z
```

I've also got a small problem with the date formatting, but I can live with that.

Zoner
2 Answers


The solution is to set the datetime index after concatenating the files. The snippet below assumes that all CSVs are formatted like your sample and skips the first row containing trkpt. It also adds a column with the CSV filename, since it appears you want to do some renaming or post-processing based on the filename.

```python
import glob
import os

import pandas as pd

# Read every CSV, tag each row with its source file, and stack them.
df = pd.concat([
    pd.read_csv(fp, skiprows=1).assign(filename=os.path.basename(fp))
    for fp in glob.glob('*.csv')
])
df['DateTime'] = pd.to_datetime(df['time'], format='%Y-%m-%d %H:%M:%S')
df = df.set_index('DateTime')
df.drop(['time'], axis=1, inplace=True)
```

Output:

| DateTime                  |   ID |   trksegID |      lat |     lon |    ele | filename   |
|:--------------------------|-----:|-----------:|---------:|--------:|-------:|:-----------|
| 2021-01-22 01:54:03+00:00 |    1 |          1 | -32.4623 | 116.062 | 311.6  | 2.csv      |
| 2021-01-22 01:54:04+00:00 |    2 |          1 | -32.4623 | 116.062 | 311.6  | 2.csv      |
| 2021-01-22 01:54:05+00:00 |    3 |          1 | -32.4623 | 116.062 | 314.97 | 2.csv      |
| 2021-01-22 01:54:06+00:00 |    4 |          1 | -32.4623 | 116.062 | 316.41 | 2.csv      |
| 2021-01-22 01:54:07+00:00 |    5 |          1 | -32.4623 | 116.062 | 317.85 | 2.csv      |
| 2021-01-22 01:54:08+00:00 |    6 |          1 | -32.4623 | 116.062 | 317.85 | 2.csv      |
| 2021-01-22 01:54:09+00:00 |    7 |          1 | -32.4622 | 116.062 | 316.41 | 1.csv      |
| 2021-01-22 01:54:10+00:00 |    8 |          1 | -32.4622 | 116.062 | 314.97 | 1.csv      |
| 2021-01-22 01:54:11+00:00 |    9 |          1 | -32.4622 | 116.062 | 314.49 | 1.csv      |
| 2021-01-22 01:54:12+00:00 |   10 |          1 | -32.4623 | 116.062 | 314.49 | 1.csv      |
RJ Adriaansen
  • Thanks RJ, this works, but in a slightly different way than I imagined. I thought the additional/new CSVs would append their results to the same timestamp (hence my column selection and renaming, to keep the df small), so if I called ```df['2021-01-22 02:54:04']``` I would get one line with all my data. Similar to https://sergilehkyi.com/tips-on-working-with-datetime-index-in-pandas/ . Do you know if there is a way to do this? – Zoner Feb 05 '21 at 01:58
  • Regarding "skips the first row with trkpt": the CSVs have big (uneven) headers, hence the search function. – Zoner Feb 05 '21 at 02:00
  • I see that you were able to make your code work, great! In my solution you could [filter](https://stackoverflow.com/questions/22898824/filtering-pandas-dataframes-on-dates) the df by a specific DateTime to retrieve all rows with that timestamp, or you could do a [groupby](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html) on the DateTime. There are multiple ways to process it while keeping the data structure intact. – RJ Adriaansen Feb 05 '21 at 18:12
  • Indeed! It's a brilliant way to correlate CSVs from different data sources. I was a bit surprised I couldn't find/google a solution, since I've used this setup quite a bit throughout my career. I must have been googling it wrong. – Zoner Feb 05 '21 at 23:45
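The filtering and groupby approaches mentioned in the comments can be sketched like this (the frame, timestamps, and filenames below are illustrative stand-ins for the concatenated GPS data, not taken from the actual files):

```python
import pandas as pd

# Toy frame standing in for the concatenated GPS data:
# a DatetimeIndex plus a source-file column.
idx = pd.to_datetime([
    "2021-01-22 01:54:03", "2021-01-22 01:54:04", "2021-01-22 01:54:04",
])
df = pd.DataFrame(
    {"lat": [-32.46226, -32.46225, -32.46230],
     "filename": ["1.csv", "1.csv", "2.csv"]},
    index=pd.DatetimeIndex(idx, name="DateTime"),
)

# All rows (from any file) that share one timestamp:
rows = df.loc["2021-01-22 01:54:04"]
print(len(rows))  # 2 rows, one per source file

# Or group on the index to process every timestamp at once:
counts = df.groupby(level="DateTime").size()
print(counts.tolist())  # [1, 2]
```

This keeps the long (stacked) data structure intact while still letting you pull out everything recorded at a given moment.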

Feel a bit silly: the solution was changing the concat to ```df = pd.concat(frames, axis=1)```.

This exports one timestamp for all CSV files, with each file's columns side by side. I have used RJ's solution to shine up the code; such an elegant import!
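A minimal sketch of how the `axis=1` concat aligns the files on their shared DatetimeIndex (the column names and values below are illustrative, not from the real GPS files):

```python
import pandas as pd

# Two small frames standing in for two GPS files, each indexed by time
# and with per-file column names, as in the question's rename step.
t = pd.to_datetime(["2021-01-22 01:54:03", "2021-01-22 01:54:04"])
gps1 = pd.DataFrame({"lat 1.csv": [-32.46226, -32.46225]}, index=t)
gps2 = pd.DataFrame({"lat 2.csv": [-32.46230, -32.46231]}, index=t)

# axis=1 aligns on the shared index: one row per timestamp,
# with each file's columns placed side by side.
wide = pd.concat([gps1, gps2], axis=1)
print(wide.shape)  # (2, 2)
```

Timestamps present in one file but not another simply come out as NaN in the other file's columns.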

Zoner