0

I guess it's not so difficult but I'm really stuck. I have some files, named with a string of 9 numbers, like:

003485295.lst
005847283.lst
092348235.lst
...

Each file identifies a measurement station, and contains a table with two columns, date and value, separated by a space character. For example, 003485295.lst is:

date value //header row here!
2014-01-03-07:00-00 2.2
2014-01-04-07:00-00 3.1
2014-01-05-07:00-00 28.6
2014-01-06-07:00-00 2.5
2014-01-14-07:00-00 5.6
...

Some dates are common among all files (stations), but some are not.

What I'm looking for

I have chosen one particular date, say 2014-01-06. I want to:

  • create an empty result file, say 2014-01-06.txt;
  • cycle through all of my *********.lst files;
  • search if station ******** took some measurement MM.M that day;
  • add a row in my result file, following the pattern stationId-value, like ******** MM.M (could also be M.M, or MMM.M).

So the desired output, 2014-01-06.txt, is something like:

003485295 2.5 //as we read in 003485295.lst
001022903 6.4
001022905 6.6
001022907 10.3
001026202 30.6
...

Stations that had no value at all for that day should be excluded. I'm on Windows and have R and MATLAB in my hands, but could work with other tools as well.

Community
  • 1
  • 1
natario
  • 24,954
  • 17
  • 88
  • 158

2 Answers2

0

Approach using R

I would go about this the following way:

  1. Upload all files to a single Data Table in R using the read.csv function (your files are basically space-delimited CSVs so this should be relatively easy)

  2. Sort the values by dates

  3. Replace the dates with Factors

  4. Split the data.table by the Date Factors like this.

  5. Do a for through the data.tables and dump each separately as a CSV using the write.csv function

Approach using Excel

  1. Upload all files via VBA using the DIR function (loop through files) and reading each file - link here. Use the Split function on spaces to tokenize the strings. Dump all the data into a single Worksheet

  2. Manually sort the data by the date column

  3. Write a short VBA macro to dump the data to separate files using this and a piece of code like this:

    Dim fileDate as String Do Until currDate.Value = "" if fileDate <> currDate then 'Create new file date fileDate = currDate end if 'Dump data to the file Write #fileNo, Cells(currDate.Row, 2) loop

In this whole exercise remember basically that the files are space delimited CSVs this will help in getting the right approach to reading and writing the files.

Community
  • 1
  • 1
AnalystCave.com
  • 4,884
  • 2
  • 22
  • 30
  • Approached a bit differently, but I made it. I just want to point out that `write.csv` is not OK since it won't let you use `append=TRUE`. Used `write.table` instead. – natario May 13 '15 at 11:57
0

Here's a Matlab function which will accomplish this. (It assumes you have at least Matlab R2014b, since it uses Matlab datetimes. If not, you can use datenum instead).

function result = getDateData(dateToLookup)
    % Get a list of all *.lst files
    d = dir('*.lst');

    result = [];
    for i=1:length(d)
        % Read in the data file using readtable.  Delimiter is space, but need
        % to skip the first line because it doesn't follow this pattern
        t = readtable(d(i).name, 'FileType', 'text', 'Delimiter', ' ', ...
            'HeaderLines', 1, 'ReadVariableNames', false);
        t.Properties.VariableNames = {'Date', 'Data'};

        % Convert the first column to datetime
        t.Date = datetime(t.Date, 'Format', 'yyyy-MM-dd-HH:mm-ss');

        % Looks like you want to lookup by date (and not consider
        % hours/minutes/seconds, so shift these to start of day)
        t.Date = dateshift(t.Date, 'start', 'day');

        % Add in the filename to the table (minus the .lst)
        t.FileName = repmat({strrep(d(i).name, '.lst', '')}, height(t), 1);

        % Append table to the result
        result = [result; t]; %#ok<AGROW>
    end

    % This is the date you're asking for
    dt = datetime(dateToLookup);

    % Find all of the matches
    matches = result(result.Date == dt, :);

    % Create a new table as the result and write it out
    result = table(matches.FileName, matches.Data, ...
        'VariableNames', {'FileName', 'Data'});
    writetable(result, 'data.txt', 'Delimiter', ' ');
end
Rob
  • 66
  • 1