I have a very large text file containing mostly numerical data with headers (strings). It has 13 columns and ~49,000 rows. All cells contain a numerical value (no empty cells, no columns with different # of rows). It is data about the solar wind taken from a satellite. It looks like this:
year,day,hr,min, sec, Np, Tp, Vx_gsm, Vy_gsm, Vz_gsm, Bx_gsm.....
YYYYxDDDxHHxMMxSSSSSSSxNNNNNNNxTTTTTTTTTTTTxVVVVVVVVVVxVVVVVVVVxVVVVVVVVVxB
2000 1 0 0 43.0272 3.27400 289450.00000 -674.22809 49.72259 -67.01060
2000 1 0 1 47.0496 3.42400 -9999.90039 -9999.90039 .....
Now, the columns are actually perfectly aligned in the text file but I couldn't include all 13 columns here but that doesn't matter for now. Wherever there is a "-9999.90039", that means the value was corrupted for some reason. What I need to do is replace the "-9999.90039" values with NaN, so they won't be included in any calculations. I need to create arrays of the columns of cell data for Np and Tp (for now), initialize them to zero, and find the minimum and maximum values in each. This is a large file so I figured I needed to do this in blocks. Also, since I only need to access the values in the numerical cells of 1 column (per calculation) then I didn't need to change much and use textscan. This is what I've gotten so far:
N = 10;% block size will have to be bigger but I wanted to test first
solarmax = fopen('ACE_magswe_64sec_2000.txt','r');
formatSpec = '%*d %*d %*d %*d %f %f %f %f %f %f %f %f %f';
% my prof said I need to initialize my variables by setting them to zero so
% I put this line below but I don't think it's right
k = 0;% not sure if this is necessary
while ~feof(solarmax)
k = k+1;% not sure if this is necessary
C = textscan(solarmax,formatSpec,N,'HeaderLines',2,'Delimiter','\t');
function y = changeval(num)
if (num('-9999.90039',num))
y = 'NaN';
end
end
fclose(solarmax);
Np = C{1,6};% not sure what to put here to call all values in that column
min(Np)
max(Np)
Tp = C{1,7};% same problem here
min(Tp)
max(Tp)
So, I put asterisks next to those columns I want ignored in FormatSpec and I used HeaderLines to ignore the first 2 rows. After, this I get confused on how I'm supposed to set this up (my only previous experience in programming was C++ in 2006!) Please help!!
UPDATED
Marcin offered me some great advice but I still have some issues that I need help with. Here is my code, now:
N = 1000;
solarmax = fopen('ACE_magswe_64sec_2000.txt','r+');
formatSpec = '%*d %*d %*d %*d %*f %f %f %f %f %f %f %f %f';
minNp = [];
maxNp = [];
minTp = [];
maxTp = [];
while ~feof(solarmax)
C = textscan(solarmax,formatSpec,N,'HeaderLines',2,'Delimiter','\t');
Np = cell2mat(C(:,1));
Tp = cell2mat(C(:,2));
Np(Np == -9999.90039) = NaN;
Tp(Tp == -9999.90039) = NaN;
minNp(end+1) = nanmin(Np);
maxNp(end+1) = nanmax(Np);
minTp(end+1) = nanmin(Tp);
maxTp(end+1) = nanmax(Tp);
end
fclose(solarmax);
nanmin(Np);
nanmax(Np);
nanmin(Tp);
nanmax(Tp);
When compiled and deleting the semicolons at the very end so I get values for the min and max functions, the values turn out to all be NaNs! I thought that the min and max commands already ignored the NaNs so I looked it up and nanmin/nanmax was suggested. However, this yielded the same results. Is there anything else I'm missing?