I experimented a little and devised several methods by which a file's contents might be read into a list or array of bytes. In each case, the filepath should be a POSIX path to the file being read.
Any snippets using AppleScriptObjC will need appropriate headers inserted at the top of the script, and I have included them at the end, along with the extra block that will be used with JXA scripts.
1. read the file and obtain the ASCII number of each character
The file is read "as is", and each character of the resulting string is converted into its ASCII code value:
to readBytes from filepath as text
    local filepath
    script bytes
        property list : characters of (read the filepath)
    end script
    repeat with char in (a reference to the list of bytes)
        set char's contents to ASCII number char
    end repeat
    return the list of bytes
end readBytes
Here's a similar implementation using AppleScriptObjC:
to readBytes from filepath as text
    local filepath
    set bytes to NSMutableArray's new()
    set hexdump to (NSString's stringWithContentsOfFile:((NSString's ¬
        stringWithString:filepath)'s stringByStandardizingPath()) ¬
        encoding:NSASCIIStringEncoding |error|:nil)
    repeat with i from 0 to (hexdump's |length|()) - 1
        (bytes's addObject:(hexdump's characterAtIndex:i))
    end repeat
    return the bytes as list
end readBytes
2. read the file into a list of short (2-byte) integers and then extract the high- and low-byte values from each
This is the fastest method, and again uses the standard additions read command, this time mapping the contents directly into a list of short integers. If the number of bytes is odd, then the first byte is read singly, whilst the remaining bytes are read as 2-byte pairs, each of which is split into two 1-byte values, and the whole is returned as a list:
to readBytes from filepath as text
    local filepath
    script bytes
        property length : get eof of filepath
        property index : length mod 2 + 1
        property shortInts : read filepath as short ¬
            from index for length - index - 1
        property list : {}
    end script
    if bytes's index = 2 then set the end of the list of bytes ¬
        to ASCII number of (read filepath for 1)
    repeat with shortInt in bytes's shortInts
        set abs to (shortInt + 65536) mod 65536
        set the end of the list of bytes to abs div 256
        set the end of the list of bytes to abs mod 256
    end repeat
    return the list of bytes
end readBytes
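To make the arithmetic in the repeat loop easier to follow, here is a minimal sketch of the same high-/low-byte split in plain JavaScript. It doesn't read a file, and the names shortToBytes and shortsToBytes are hypothetical, chosen only for illustration. It assumes each value is a signed 16-bit integer, as in the AppleScript above.

```javascript
// Split one signed 16-bit integer into [high byte, low byte].
// shortToBytes is a hypothetical helper name, not part of any API.
function shortToBytes(shortInt) {
    // Map the signed range -32768..32767 onto 0..65535
    const abs = (shortInt + 65536) % 65536;
    return [Math.floor(abs / 256), abs % 256];
}

// Flatten a list of short integers into a list of single bytes.
function shortsToBytes(shortInts) {
    const bytes = [];
    for (const shortInt of shortInts) {
        const pair = shortToBytes(shortInt);
        bytes.push(pair[0], pair[1]);
    }
    return bytes;
}

const sampleBytes = shortsToBytes([18533, -1, 256]);
// sampleBytes is [72, 101, 255, 255, 1, 0]
```

Adding 65536 before taking the remainder is what turns a negative short such as -1 into its unsigned equivalent (65535), so both extracted bytes end up in the range 0 to 255.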
3. read the file into a data class object and convert the hexadecimal byte values to their decimal representation
The use of read here pulls in a raw data object that, strictly speaking, we can't do a lot with, as it isn't a class that coerces to any other. However, the additional handler __string__() is a quick and dirty method of getting the hexadecimal byte values, which are then converted to decimal form and returned:
to __string__(object)
    if the object's class = text then return the object
    set tids to my text item delimiters
    try
        -- Force a coercion error, whose message contains the object's text form
        set s to {_:object} as null
    on error e
        set my text item delimiters to "Can’t make {_:"
        set s to text items 2 thru -1 of e as text
        set my text item delimiters to "} into type null."
        set s to text items 1 thru -2 of s as text
        set my text item delimiters to tids
    end try
    s
end __string__
to readBytes from filepath as text
    local filepath
    script bytes
        property data : read filepath as data
        property list : {}
    end script
    script hexdump
        property chars : "0123456789ABCDEF"
        property string : text 11 thru -2 of __string__(bytes's data)
        -- The first hex digit of a pair is the high nibble, the second the low nibble
        property hibyte : a reference to text 1 of my string
        property lobyte : a reference to text 2 of my string
        to decimal()
            set i to (offset of hibyte in chars) - 1
            set j to (offset of lobyte in chars) - 1
            i * 16 + j
        end decimal
    end script
    repeat ((hexdump's string's length) / 2 - 1) times
        set the end of the list of bytes to hexdump's decimal()
        set hexdump's string to hexdump's string's text 3 thru -1
    end repeat
    return the list of bytes
end readBytes
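For comparison, the lookup-table conversion performed by decimal() can be sketched in plain JavaScript as below. This is only an illustration of the technique under the assumption that the input is a clean string of hex digit pairs. The name hexToBytes is hypothetical, and the sample string is made up rather than taken from a real file.

```javascript
// Convert a string of hex digit pairs into a list of byte values
// using a lookup table, in the manner of the decimal() handler.
// hexToBytes is a hypothetical name used for illustration only.
function hexToBytes(hexString) {
    const chars = '0123456789ABCDEF';
    const hex = hexString.toUpperCase();
    const bytes = [];
    // Step through the string two characters (one byte) at a time
    for (let i = 0; i + 1 < hex.length; i += 2) {
        const hi = chars.indexOf(hex[i]);     // high nibble, 0..15
        const lo = chars.indexOf(hex[i + 1]); // low nibble, 0..15
        bytes.push(hi * 16 + lo);
    }
    return bytes;
}

const helloBytes = hexToBytes('48656C6C6F');
// helloBytes is [72, 101, 108, 108, 111]
```

Indexing into the string rather than repeatedly trimming the first two characters avoids building a new string for every byte.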
4. Use AppleScriptObjC to transform an ASCII string into Unicode hex values then convert to decimal using NSScanner
I included this as an alternative way to convert hexadecimal byte strings to decimal integer values using NSScanner, but it's actually slower than my vanilla AppleScript handler decimal(), so this method is more for general interest:
to readBytes from filepath as text
    local filepath
    set hexdump to ((NSString's stringWithContentsOfFile:((NSString's ¬
        stringWithString:filepath)'s stringByStandardizingPath()) ¬
        encoding:NSASCIIStringEncoding |error|:nil)'s ¬
        stringByApplyingTransform:"Any-Hex" |reverse|:no)'s ¬
        componentsSeparatedByString:"\\u00"
    hexdump's removeFirstObject()
    set hexbytes to hexdump's objectEnumerator()
    script bytes
        property list : {}
    end script
    repeat
        set hexbyte to the nextObject() of the hexbytes
        if hexbyte = missing value then exit repeat
        set scanner to NSScanner's scannerWithString:hexbyte
        set [bool, s] to scanner's scanHexInt:_1
        set the end of the list of the bytes to s as integer
    end repeat
    return the list of bytes
end readBytes
5. Use JSObjC (JXA-ObjectiveC) to read the raw data then...
Retrieve an array of C-pointers to the byte values directly
One of the nice things about JXA is the access it has to other data types outwith AppleScriptObjC, which means we can manipulate C data types and access array buffers:
function readBytes(filepath) {
    const bytes = $.NSData.dataWithContentsOfFile(
        $.NSString.stringWithString(filepath)
            .stringByStandardizingPath);
    const bytesPtr = bytes.bytes;
    var bytesArr = [];
    const numBytes = Number(bytes.length);
    for (let i = 0; i < numBytes; i++) {
        bytesArr.push(bytesPtr[i]);
    }
    return bytesArr;
}
The disappointing thing in this particular case is that accessing the values in an array buffer has to be done iteratively in order to manually copy the values over into a JavaScript array object. This isn't slower than the other methods, but it's slower than I feel it would have been were this not the case.
So it can be a little surprising when a more manual implementation that looks like it ought to be slower is, in fact, noticeably faster than using ready-made API methods/functions:
Access the hexadecimal string value and manually decimalise
The NSData class object has a description that contains the hexadecimal string representing the file's contents. It requires a small amount of clean-up, using regular expressions that trim unwanted characters and split the hex string into an array of paired hex bytes. Then JavaScript provides the map() function, which saves iterating manually, allowing each hex byte pair to be sent through the JXA-translated version of my decimal() handler from before:
function readBytes(filepath) {
    const bytes = $.NSData.dataWithContentsOfFile(
        $.NSString.stringWithString(filepath)
            .stringByStandardizingPath);
    var bytesArr = [];
    const bytesStr = bytes.description;
    bytesArr = ObjC.deepUnwrap(bytesStr
        .stringByReplacingOccurrencesOfStringWithStringOptionsRange(
            '(?i)\\<?([A-F0-9]{2})\\>?\\B', '$1 ',
            $.NSRegularExpressionSearch,
            $.NSMakeRange(0, bytesStr.length)
        ).componentsSeparatedByString(' ')
    ).map(hexbyte => {
        if (hexbyte.length != 2) return null;
        const hexchars = ["0", "1", "2", "3", "4", "5", "6", "7",
                          "8", "9", "a", "b", "c", "d", "e", "f"];
        const hex = hexbyte.split('');
        // The first hex digit is the high nibble, the second the low nibble
        const hi = hexchars.indexOf(hex[0]),
              lo = hexchars.indexOf(hex[1]);
        return (hi * 16) + lo;
    });
    bytesArr.pop();
    return bytesArr;
}
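As a rough sketch of the clean-up-and-convert step on its own, the following plain JavaScript strips everything that isn't a hex digit, splits the remainder into pairs, and uses the built-in parseInt with radix 16 in place of a hand-written lookup. It assumes the description string has the angle-bracketed form that the regular expression above expects (for example "<48656c6c 6f20>"), and it would give wrong results for a description in any other format. The name descriptionToBytes and the sample string are hypothetical.

```javascript
// Turn an NSData-style description such as "<48656c6c 6f20>" into bytes.
// ASSUMPTION: the string holds only hex digits, spaces and angle brackets.
// descriptionToBytes is a hypothetical name used for illustration only.
function descriptionToBytes(description) {
    // Drop the brackets and spaces, leaving a run of hex digits
    const hex = description.replace(/[^0-9a-f]/gi, '');
    // Split into two-character pairs; match() returns null on no match
    const pairs = hex.match(/.{2}/g) || [];
    // parseInt with radix 16 converts each pair to its decimal value
    return pairs.map(pair => parseInt(pair, 16));
}

const parsedBytes = descriptionToBytes('<48656c6c 6f20>');
// parsedBytes is [72, 101, 108, 108, 111, 32]
```

Because no empty or partial entries are produced, there is no trailing element to pop() afterwards.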
Headers
If you want to test any of the AppleScriptObjC code for yourself, include these lines at the top of the script:
use framework "Foundation"
use scripting additions
property this : a reference to the current application
property nil : a reference to missing value
property _1 : a reference to reference
property NSArray : a reference to NSArray of this
property NSData : a reference to NSData of this
property NSMutableArray : a reference to NSMutableArray of this
property NSScanner : a reference to NSScanner of this
property NSString : a reference to NSString of this
property NSASCIIStringEncoding : a reference to 1
property NSRegularExpressionSearch : a reference to 1024
property NSUTF16StringEncoding : a reference to 10
This is an exhaustive list that covers all of the various AppleScriptObjC snippets above, so you can delete any properties that aren't used in a specific script if you want to.
The script that ended up being fastest in my testing (which wasn't by any means thorough or even quantified, but it stood out as returning an immediate result) was number (2), which is written in vanilla AppleScript. Therefore, this does not require the above headers, and it's advisable not to include them if they aren't necessary.
For the JSObjC scripts, you will want to insert this auto-run function below the readBytes function declaration:
(() => {
    const filepath = '/Users/CK/Desktop/Pasted on 2019-07-28 at 07h08m.jpg';
    return readBytes(filepath);
})();