
I am trying to import the GeoLife GPS trajectory dataset into my workspace. The data folder contains 182 sub-folders, one per tracked user, and each sub-folder holds that user's trajectory files in .plt format. The number of trajectory files is not fixed; it varies from user to user. The columns are also of mixed type: the first 5 columns are floats and the last 2 are strings (date and time). My objective is to store this data in a cell array of size 182 in which each slot holds one user's trajectories. To do this, I used the getAllFiles function given here. I then visited every file returned by this function and stored the contents as follows:

fileList = getAllFiles('C:\...\MATLAB\HmMDTW\Data');

i = 1;
k = 1;

trj = [];
trjAll = cell(182,1);
pid = '000';

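while i <= length(fileList)   % <= so that the last file is not skipped
    fileDir = cell2mat(fileList(i));
    index = strfind(fileDir, 'Trajectory') + 11;

    if any(index)   % true only when the path contains 'Trajectory'
        fid = fopen(fileDir);
        t = textscan(fid, '%f %f %f %f %f %s %s', 'Delimiter', ',', 'HeaderLines', 6, 'CollectOutput', 1);
        cid = fileDir(45:47);   % user ID; these positions depend on the length of the root path

        if ~strcmp(cid, pid)   % user changed: store the previous user's trajectories
            trjAll{k} = trj;
            trj = [];
            k = k + 1;
        end

        pid = cid;
        trj = [trj; t];
        fclose(fid);
    end

    i = i + 1;
end

trjAll{k} = trj;   % store the last user's trajectories, which the loop never reaches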

Above, I check every file and, if it is a trajectory file, read its data and append it to trj (the trajectory list for the current user). When the user ID (000, ..., 181) changes in the next file, I store trj in trjAll (the array of all users' trajectories) and reset trj. In the end, trjAll contains every trj. However, this takes a long time (about 4-5 minutes). Is there a more efficient way to achieve what I want? I could read the files inside the getAllFiles function itself, but I do not think that would save a significant amount of time. Thank you in advance.
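One possible restructuring, offered as a sketch rather than a measured speed-up: walk the user folders with dir, preallocate one cell row per file, and fill it in place so that no array grows inside the loop. The function name readUserTrajectories is hypothetical, and the layout <root>\<user id>\Trajectory\*.plt is an assumption based on the description above.

```matlab
function trjAll = readUserTrajectories(rootDir)
% Sketch: read every .plt file under rootDir/<user>/Trajectory into a cell
% array with one slot per user folder (assumed layout, see the note above).
users = dir(rootDir);
users = users([users.isdir] & ~ismember({users.name}, {'.', '..'}));
trjAll = cell(numel(users), 1);

for u = 1:numel(users)
    trjDir = fullfile(rootDir, users(u).name, 'Trajectory');
    files = dir(fullfile(trjDir, '*.plt'));
    trj = cell(numel(files), 2);      % preallocate: one row per file
    for f = 1:numel(files)
        fid = fopen(fullfile(trjDir, files(f).name));
        t = textscan(fid, '%f %f %f %f %f %s %s', 'Delimiter', ',', ...
            'HeaderLines', 6, 'CollectOutput', 1);
        fclose(fid);
        trj(f, :) = t;                % {numeric N-by-5, cell N-by-2}
    end
    trjAll{u} = trj;
end
end
```

Calling trjAll = readUserTrajectories('C:\...\MATLAB\HmMDTW\Data') would then replace both getAllFiles and the while loop. Most of the run time is probably spent in textscan on the individual files, so the gain from this change alone may be small.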

Dorukhan Arslan
  • 1
    I don't understand what you're using "index" for. Is it just to check whether the pathname includes the string `'Trajectory'`? If so, you might be better off including a regexp in your getAllFiles. I would assume that this is faster than doing a strfind within the while loop. – craq Dec 28 '15 at 17:19
  • 1
    in general, you can use the `profiler` to find which lines of your program are taking the most time to execute – craq Dec 28 '15 at 17:20
  • 2
    From the dataset description: `This dataset contains 17,621 trajectories with a total distance of about 1.2 million kilometers and a total duration of 48,000+ hours. These trajectories were recorded by different GPS loggers and GPS-phones, and have a variety of sampling rates. 91 percent of the trajectories are logged in a dense representation, e.g. every 1~5 seconds or every 5~10 meters per point.` That's a lot of data. It's going to take time to import. – sco1 Dec 28 '15 at 17:23
  • @craq Yes, there is no need to use "index"; a regexp is a better alternative. Your assumption is reasonable. I also considered this, but there is not much gain. I have to open 17,621 trajectory files one by one in any case, and that seems unavoidable. Thank you, both. – Dorukhan Arslan Dec 28 '15 at 21:02
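
The two suggestions in the comments above can be tried in a few lines. This is a minimal sketch: the two paths in fileList are hypothetical stand-ins for the output of getAllFiles, and the regular expression assumes trajectory files sit directly under a Trajectory folder and end in .plt.

```matlab
% Hypothetical sample of what getAllFiles might return.
fileList = {'C:\Data\000\Trajectory\20081023025304.plt'; ...
            'C:\Data\000\labels.txt'};

% Keep only the paths under a Trajectory folder that end in .plt, so the
% read loop no longer needs a strfind check on every iteration.
isTrj = ~cellfun(@isempty, regexp(fileList, 'Trajectory[\\/].*\.plt$', 'once'));
trjFiles = fileList(isTrj);

% Profile the import to see which lines dominate the run time.
profile on
% ... run the import loop over trjFiles here ...
profile off
stats = profile('info');   % or open the report with: profile viewer
```

If the profiler report shows nearly all the time inside textscan and fopen, the filtering step is unlikely to matter much, which matches the last comment above.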

0 Answers