I have implemented a data import in MATLAB to load very large *.DBF files into my workspace. Now I'm trying to validate that the data I imported is the same as the original data. My idea is to count the number of characters in the imported cell array and compare it to the number that Notepad++ reports under View -> Summary.
To import the files into MATLAB I used the following code:

fid = fopen(fullFileName,'r','n','UTF-8'); % UTF-8 encoding, because otherwise MATLAB wouldn't recognise German characters like ä, ö, ü
formatSpec=repmat('%s ',1,numberOfColumns); % numberOfColumns is 62 in my case
data = textscan(fid,formatSpec,'Delimiter','|');
fclose(fid);
data=horzcat(data{:});

Now to count the number of characters I used the following code:

numberOfCharacters=sum(sum(cellfun(@length,data)))+size(data,1)+size(data,1)*(size(data,2)-1);

Here the first summand is the total number of characters in all cells. I had to add the second summand because Notepad++ counts line breaks as characters. The third summand is the number of delimiters, which Notepad++ also counts.
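To make the three summands concrete, here is a minimal sketch on a toy 2-by-2 cell array. The cell contents are made-up values, and the comparison text assumes one single-character line break (`\n`) after every row, including the last one. If the file uses Windows line breaks (`\r\n`), each row would contribute one more character than this formula counts.

```matlab
% Toy 2-by-2 cell array standing in for the imported data (hypothetical values).
data = {'ab', 'c'; 'de', 'fgh'};

nCellChars  = sum(sum(cellfun(@length, data)));     % 2 + 1 + 2 + 3 = 8
nLineBreaks = size(data, 1);                        % one per row = 2
nDelimiters = size(data, 1) * (size(data, 2) - 1);  % one per row for 2 columns = 2
numberOfCharacters = nCellChars + nLineBreaks + nDelimiters;  % 12

% The same table written out with '|' delimiters and '\n' line breaks.
asText = sprintf('ab|c\nde|fgh\n');
fprintf('formula: %d, text length: %d\n', numberOfCharacters, numel(asText));
```

Both numbers are 12 for this toy input, so the formula matches the text length only under the one-character-per-line-break assumption.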
The results are:
19,489,252 in Notepad++ and
19,485,889 in MATLAB
As you can see, the difference (3,363 characters) is small compared to the total number of characters. Still, I need to know what causes it.
One thing I already checked is the number of non-ASCII characters in Notepad++ using this answer. Non-ASCII characters are counted correctly.
Unfortunately, I can't provide the data for you to test. So I would be happy about any suggestion as to what could cause the difference in character counts. Another method of proving that the data MATLAB imports is the same as the original data would be welcome, too.
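One way to narrow down such a difference without sharing the data is to look at the raw bytes of the file and count the usual suspects separately: total bytes, decoded characters, carriage returns, line feeds and delimiters. The following is only a sketch under assumptions: `sampleFile` and its contents are hypothetical stand-ins (a tiny UTF-8 file with Windows line breaks), and for real data `sampleFile` would be replaced by `fullFileName`.

```matlab
% Hypothetical sample: the text "ä|b" and "c|d", each line ending in \r\n, UTF-8 encoded.
sampleFile  = [tempname '.txt'];
sampleText  = [char(228) '|b' char([13 10]) 'c|d' char([13 10])];
sampleBytes = unicode2native(sampleText, 'UTF-8');
fid = fopen(sampleFile, 'w');
fwrite(fid, sampleBytes, 'uint8');
fclose(fid);

% Read the file back as raw bytes, without any decoding.
fid = fopen(sampleFile, 'r');
rawBytes = fread(fid, Inf, '*uint8')';
fclose(fid);
delete(sampleFile);

% Decode the bytes to characters to compare byte count and character count.
decoded = native2unicode(rawBytes, 'UTF-8');

nBytes = numel(rawBytes);               % 11 for the sample
nChars = numel(decoded);                % 10 for the sample ("ä" is 2 bytes, 1 character)
nCR    = sum(rawBytes == 13);           % carriage returns: 2
nLF    = sum(rawBytes == 10);           % line feeds: 2
nDelim = sum(rawBytes == uint8('|'));   % delimiters: 2

fprintf('bytes: %d, chars: %d, CR: %d, LF: %d, delimiters: %d\n', ...
        nBytes, nChars, nCR, nLF, nDelim);
```

If `nCR` is nonzero, every line break occupies two characters in the file, while the formula above adds only one per row. Likewise, `nBytes - nChars` shows how many extra bytes come from multi-byte UTF-8 characters, which matters if the editor reports a length in bytes instead of characters.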

Max
  • Npp counts a Windows line break (i.e. `\r\n`) as 2 characters. Some characters with diacritics (like `ü` or `ö`) are also counted as 2 characters. – Toto Mar 13 '17 at 12:53
  • @toto Is there a way to stop Npp from doing that? If not, could you link me to something where I can read how Npp counts characters? – Max Mar 13 '17 at 13:06
  • @toto I would also be interested in an explanation of the difference between 'current document length' and 'Characters (without blanks)'. I would be inclined to think that the latter doesn't count spaces as characters, but it actually does. – Max Mar 13 '17 at 13:12
  • I don't know any way to change this behaviour. `ö` is, in fact, 2 bytes long in UTF-8: its hex representation is `C3B6`, and `ü` is `C3BC`. I have no answer to the second question. You may find one in the official Npp forum: https://notepad-plus-plus.org/community/ – Toto Mar 13 '17 at 13:34
  • @toto Alright, thank you very much so far! :) I have already found out that "Characters (without blanks)" does not count `ü` as 2 characters, so this will be my way to go. – Max Mar 13 '17 at 13:52

0 Answers