
I am trying to read a file's bytes using AppleScript or JXA (I don't know which one is better yet). I have already tried this code:

set theFile to (choose file with prompt "Select a file to read:")
open for access theFile
set fileContents to (read theFile)
close access theFile

However, that code reads the file as a string and stores it in fileContents. I need a byte array instead.

Damian Radinoiu
  • Neither’s good. While `read theFile as data` will get you the raw content, there are no native APIs for working with it. I suggest you look at using `NSData` via the AppleScript-ObjC bridge; that’ll read your file and give you access to individual bytes. – foo Jul 28 '19 at 07:00
  • I have read about Objective-C, but I don't know exactly how to integrate Objective-C into JXA. How can I use this https://stackoverflow.com/questions/47939544/reading-byte-array-from-a-file-in-objective-c?noredirect=1&lq=1 inside my JXA script or AppleScript? – Damian Radinoiu Jul 28 '19 at 10:11
  • Documentation is lousy. Best bet’s probably [this](http://macosxautomation.com/applescript/apps/everyday_book.html). Perhaps you could explain what it is you’re trying to achieve? Depending on what you’re doing, you may be better writing your program in ObjC or Swift, or use another scripting language such as Python/Ruby/Node.js that has decent libraries for working with byte arrays. – foo Jul 28 '19 at 14:33
  • @foo: I thought about NSData, but unfortunately the NSData methods that return bytes all use void* buffers, and they crash AppleScript. Unless you know some trick I don't for making void* work in AppleScript, NSData's a no-go. – Ted Wrigley Jul 28 '19 at 16:05
  • @TedWrigley No, AppleScriptObjC can't access array buffers. But I've added my answer which includes a JSObjC script that _can_ access them. – CJK Jul 28 '19 at 17:27
  • From the information given, CJK’s JSObjC script looks closest to what you describe. If you really want to use a scripting language, Python3 (or 2) would be a better choice than AS/JXA as it already includes a native `bytes` datatype and a `struct` module for converting raw byte sequences to/from native values. Or just do it in ObjC/Swift. As I say, it’ll be more productive if you explain *why* you want to manipulate a file at the raw bytes level. Otherwise you’re just going to get a lot of duct-taped kludges that at best do what you describe, but not necessarily what you need. – foo Jul 28 '19 at 18:44

2 Answers


I knew I'd seen this somewhere before. There's an old post at MacScripter where people dive into this problem fairly deeply. It's well worth a read if you're inclined that way, but the simplest version seems to be this:

set theFile to choose file
set theBytes to getByteValues(theFile)

on getByteValues(thisFile) -- thisFile's an alias or a file specifier.
    script o
        property integerValues : {}
        property byteValues : {}

        on convertBytesToHex()
            repeat with thisItem in byteValues
                set s to ""
                repeat until contents of thisItem = 0
                    tell (thisItem mod 16)
                        if it > 9 then
                            set s to character (it - 9) of "ABCDEF" & s
                        else
                            set s to (it as string) & s
                        end if
                    end tell
                    set contents of thisItem to thisItem div 16
                end repeat
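                -- Pad to two digits so that a zero byte gives "00" rather than an empty string.
                set contents of thisItem to text -2 thru -1 of ("00" & s)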
            end repeat
        end convertBytesToHex
    end script

    set fRef to (open for access thisFile)
    try
        -- The file will be read as a set of 4-byte integers, but does it contain an exact multiple of 4 bytes?
        set oddByteCount to (get eof fRef) mod 4
        set thereAreOddBytes to (oddByteCount > 0)
        -- If the number of bytes isn't a multiple of 4, treat the odd ones as being in the first four, then …
        if (thereAreOddBytes) then set end of o's integerValues to (read fRef from 1 for 4 as unsigned integer)
        -- … read integers from after the odd bytes (if any) to the end of the file.
        set o's integerValues to o's integerValues & (read fRef from (oddByteCount + 1) as unsigned integer)
        close access fRef
    on error errMsg number errNum
        close access fRef
        error errMsg number errNum
    end try

    -- Extract the odd-byte values (if any) from the first integer.
    if (thereAreOddBytes) then
        set n to beginning of o's integerValues
        repeat oddByteCount times
            set end of o's byteValues to n div 16777216
            set n to n mod 16777216 * 256
        end repeat
    end if
    -- Extract the 4 byte values from each of the remaining integers.
    repeat with i from 1 + ((thereAreOddBytes) as integer) to (count o's integerValues)
        set n to item i of o's integerValues
        set end of o's byteValues to n div 16777216
        set end of o's byteValues to n mod 16777216 div 65536
        set end of o's byteValues to n mod 65536 div 256
        set end of o's byteValues to n mod 256 div 1
    end repeat

    o's convertBytesToHex()

    return o's byteValues
end getByteValues

on convertNumberToHex(aNumber)
    set s to ""
    set n to get aNumber
    repeat until n is 0
        tell (n mod 16)
            if it > 9 then
                set s to character (it - 9) of "ABCDEF" & s
            else
                set s to (it as string) & s
            end if
        end tell
        set n to n div 16
    end repeat
    set contents of aNumber to s
end convertNumberToHex

I've added a routine to convert the integer values to hex-value strings; not sure which form you prefer.
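
For anyone who finds the div/mod lines hard to follow, here is a minimal sketch of the same byte-splitting arithmetic in plain JavaScript (which should also run under JXA). It assumes, as the handler does, that the first byte of each 4-byte group is the most significant one. `splitUint32` and `byteToHex` are hypothetical names used only for illustration; they are not part of the MacScripter code.

```javascript
// Split one unsigned 32-bit integer into its four bytes, most significant first.
// Mirrors the div/mod arithmetic used in the AppleScript handler.
function splitUint32(n) {
    return [
        Math.floor(n / 16777216),            // n div 2^24
        Math.floor((n % 16777216) / 65536),  // n mod 2^24 div 2^16
        Math.floor((n % 65536) / 256),       // n mod 2^16 div 2^8
        n % 256                              // n mod 2^8
    ];
}

// Format a byte (0..255) as a two-digit uppercase hex string.
function byteToHex(b) {
    return b.toString(16).toUpperCase().padStart(2, "0");
}
```

For example, `splitUint32(0x89504E47).map(byteToHex)` gives the four hex strings for the first four bytes of a PNG signature.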

Ted Wrigley

I have experimented a little and devised a number of methods by which a file's contents might be read into a list or array of bytes. In each case, filepath should be a POSIX path to the file being read.

Any snippets using AppleScriptObjC will need appropriate headers inserted at the top of the script, and I have included them at the end, along with the extra block that will be used with JXA scripts.

1. Read the file and obtain the ASCII number of each character

The file is read "as is", and each character of the string is converted into an ASCII code value:

to readBytes from filepath as text
    local filepath

    script bytes
        property list : characters of (read the filepath)
    end script

    repeat with char in (a reference to the list of bytes)
        set char's contents to ASCII number char
    end repeat

    return the list of bytes
end readBytes

Here's a similar implementation using AppleScriptObjC:

to readBytes from filepath as text
    local filepath

    set bytes to NSMutableArray's new()

    set hexdump to (NSString's stringWithContentsOfFile:((NSString's ¬
        stringWithString:filepath)'s stringByStandardizingPath()) ¬
        encoding:NSASCIIStringEncoding |error|:nil)

    repeat with i from 0 to (hexdump's |length|()) - 1
        (bytes's addObject:(hexdump's characterAtIndex:i))
    end repeat

    return the bytes as list
end readBytes
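
The character-to-code idea in method 1 can be sketched in plain JavaScript as well. This is only an illustration of the mapping, assuming the text holds one character per byte; `charCodes` is a hypothetical helper, and values above 127 will depend on how the file was decoded into a string in the first place.

```javascript
// Map each character of a string to its numeric code.
// Only meaningful as "bytes" when every character came from a single byte.
function charCodes(text) {
    return Array.from(text, ch => ch.charCodeAt(0));
}
```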

2. Read the file into a list of short (2-byte) integers and then extract the high- and low-byte values from each

This is the fastest method, and again uses the standard additions read command, this time mapping the contents directly into a list of short integers. If the number of bytes is odd, then the first byte is read singly, whilst the remaining bytes are read as 2-byte pairs that are split into 1-byte values and returned as a list:

to readBytes from filepath as text
    local filepath

    script bytes
        property length : get eof of filepath
        property index : length mod 2 + 1
        property shortInts : read filepath as short ¬
            from index for length - index + 1
        property list : {}
    end script

    if bytes's index = 2 then set the end of the list of bytes ¬
        to ASCII number of (read filepath for 1)

    repeat with shortInt in bytes's shortInts
        set abs to (shortInt + 65536) mod 65536
        set the end of the list of bytes to abs div 256
        set the end of the list of bytes to abs mod 256
    end repeat

    return the list of bytes
end readBytes
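
The signed-to-unsigned step in the loop is the part that tends to confuse people, so here is a small sketch of it in plain JavaScript. It assumes each short is a signed 16-bit value with the high byte first, as the handler treats it; `shortToBytes` and `shortsToBytes` are hypothetical names.

```javascript
// Convert one signed 16-bit integer into its [high, low] byte pair.
function shortToBytes(shortInt) {
    // Map -32768..32767 onto 0..65535 (same as the AppleScript "abs" variable).
    const unsigned = (shortInt + 65536) % 65536;
    return [Math.floor(unsigned / 256), unsigned % 256];
}

// Flatten a list of shorts into a single list of bytes.
function shortsToBytes(shortInts) {
    const bytes = [];
    for (const s of shortInts) {
        const pair = shortToBytes(s);
        bytes.push(pair[0], pair[1]);
    }
    return bytes;
}
```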

3. Read the file into a data class object and convert the hexadecimal byte values to their decimal representation

The use of read here pulls a raw data encapsulated object that, strictly speaking, we can't do a lot with as it isn't a type class that coerces to any other. However, the additional handler __string__() is a quick and dirty method of getting the hexadecimal byte values, which are then converted to decimal form and returned:

to __string__(object)
    if the object's class = text then return the object

    set tids to my text item delimiters

    try
        set s to {_:object} as null
    on error e
        set my text item delimiters to "Can’t make {_:"
        set s to text items 2 thru -1 of e as text

        set my text item delimiters to "} into type null."
        set s to text items 1 thru -2 of s as text

        set my text item delimiters to tids
    end try

    s
end __string__

to readBytes from filepath as text
    local filepath

    script bytes
        property data : read filepath as data
        property list : {}
    end script

    script hexdump
        property chars : "0123456789ABCDEF"
        property string : text 11 thru -2 of __string__(bytes's data)
        property hibyte : a reference to text 2 of my string
        property lobyte : a reference to text 1 of my string

        to decimal()
            set i to (offset of hibyte in chars) - 1
            set j to (offset of lobyte in chars) - 1

            i + j * 16
        end decimal
    end script

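    repeat ((hexdump's string's length) div 2) times
        set the end of the list of bytes to hexdump's decimal()
        -- Drop the pair just converted; the final pair has nothing after it.
        if (hexdump's string's length) > 2 then ¬
            set hexdump's string to hexdump's string's text 3 thru -1
    end repeat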

    return the list of bytes
end readBytes
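
The lookup that decimal() performs (find each hex digit's position in "0123456789ABCDEF", then combine the two positions) can be sketched in plain JavaScript as follows. The names `HEX_CHARS`, `hexPairToByte` and `hexStringToBytes` are hypothetical, and the sketch assumes the input contains only valid hex digits.

```javascript
const HEX_CHARS = "0123456789ABCDEF";

// Convert a two-character hex pair into a byte value by digit lookup.
function hexPairToByte(pair) {
    const hi = HEX_CHARS.indexOf(pair[0].toUpperCase()); // first digit: high nibble
    const lo = HEX_CHARS.indexOf(pair[1].toUpperCase()); // second digit: low nibble
    return hi * 16 + lo;
}

// Walk a hex string two characters at a time, including the final pair.
function hexStringToBytes(hex) {
    const bytes = [];
    for (let i = 0; i + 1 < hex.length; i += 2) {
        bytes.push(hexPairToByte(hex.slice(i, i + 2)));
    }
    return bytes;
}
```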

4. Use AppleScriptObjC to transform an ASCII string into Unicode hex values then convert to decimal using NSScanner

I included it as an alternative way to convert hexadecimal byte strings to decimal integer values using NSScanner, but it's actually slower than my vanilla AppleScript handler decimal(), so this method is more for general interest:

to readBytes from filepath as text
    local filepath

    set hexdump to ((NSString's stringWithContentsOfFile:((NSString's ¬
        stringWithString:filepath)'s stringByStandardizingPath()) ¬
        encoding:NSASCIIStringEncoding |error|:nil)'s ¬
        stringByApplyingTransform:"Any-Hex" |reverse|:no)'s ¬
        componentsSeparatedByString:"\\u00"

    hexdump's removeFirstObject()
    set hexbytes to hexdump's objectEnumerator()

    script bytes
        property list : {}
    end script

    repeat
        set hexbyte to the nextObject() of the hexbytes
        if hexbyte = missing value then exit repeat
        set scanner to NSScanner's scannerWithString:hexbyte
        set [bool, s] to scanner's scanHexInt:_1
        set the end of the list of the bytes to s as integer
    end repeat

    return the list of bytes
end readBytes
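
To see why the first component is removed after the split, here is a rough plain-JavaScript sketch of the same steps. It assumes the transform emits Java-style \uXXXX escapes and that every character is below U+0100, which is what the "\\u00" separator in the handler implies; `escapedToBytes` is a hypothetical name.

```javascript
// Turn a string of \u00XX escapes into a list of byte values.
function escapedToBytes(escaped) {
    // Splitting on the literal text \u00 leaves an empty string before the first escape.
    const parts = escaped.split("\\u00");
    parts.shift(); // discard that leading empty component
    return parts.map(part => parseInt(part, 16));
}
```

Here `parseInt(part, 16)` stands in for the NSScanner hex scan.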

5. Use JSObjC (JXA-ObjectiveC) to read the raw data then...

  • Retrieve an array of C-pointers to the bytes values directly

    One of the nice things about JXA is the access it has to other data types outwith AppleScriptObjC, which means we can manipulate C data types and access array buffers:

    function readBytes(filepath) {
        const bytes    = $.NSData.dataWithContentsOfFile(
                            $.NSString.stringWithString(filepath)
                             .stringByStandardizingPath);
        const bytesPtr = bytes.bytes;
        var   bytesArr = [];
        const numBytes = Number(bytes.length);
    
        for (let i = 0; i < numBytes; i++) {
            bytesArr.push(bytesPtr[i]);
        }
    
        return bytesArr;    
    }
    

    The disappointing thing in this particular case is that accessing the values in an array buffer has to be done iteratively in order to manually copy the values over into a JavaScript array object. This isn't slower than the other methods, but it's slower than I feel it would have been were this not the case.

    So it can be a little surprising when a more manual implementation that looks like it ought to be slower is, in fact, noticeably faster than using ready-made API methods/functions:

  • Access the hexadecimal string value and manually decimalise

    The NSData class object has a description that contains the hexadecimal string representing the file's contents. It requires a small amount of clean up, using regular expressions, that trim unwanted characters and split the hex string into an array of paired hex bytes. Then JavaScript provides the map() function that saves iterating manually, allowing each hex byte pair to be sent through the JXA translated version of my decimal() handler from before:

    function readBytes(filepath) {
        const bytes    = $.NSData.dataWithContentsOfFile(
                            $.NSString.stringWithString(filepath)
                             .stringByStandardizingPath);
        var   bytesArr = [];
        const bytesStr = bytes.description;
    
        bytesArr = ObjC.deepUnwrap(bytesStr
        .stringByReplacingOccurrencesOfStringWithStringOptionsRange(
                 '(?i)\\<?([A-F0-9]{2})\\>?\\B', '$1 ',
                 $.NSRegularExpressionSearch,
                 $.NSMakeRange(0, bytesStr.length)
        ).componentsSeparatedByString(' ')
        ).map(hexbyte => {
            if (hexbyte.length != 2) return null;
    
            const hexchars = ["0", "1", "2", "3", "4", "5", "6", "7",
                              "8", "9", "a", "b", "c", "d", "e", "f"];
            const hex = hexbyte.toLowerCase().split('');
            const hi  = hexchars.indexOf(hex[0]),   // first digit: high nibble
                  lo  = hexchars.indexOf(hex[1]);   // second digit: low nibble
    
            return (hi * 16) + lo;
        });
        bytesArr.pop();
        return bytesArr;
    }
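
The clean-up and conversion steps can also be tried without the ObjC bridge. The sketch below works on a plain string and assumes it has the angle-bracket layout the regular expression above expects (something like `<48656c6c 6f>`); that sample input and the name `descriptionToBytes` are illustrative assumptions, not output I have captured from NSData.

```javascript
// Strip everything that is not a hex digit, split into pairs, and parse each pair.
function descriptionToBytes(description) {
    const hexOnly = description.replace(/[^0-9a-f]/gi, "");
    const pairs = hexOnly.match(/../g) || []; // match() returns null for an empty string
    return pairs.map(pair => parseInt(pair, 16));
}
```

Using `parseInt(pair, 16)` here replaces the manual digit lookup, and because no trailing empty element is produced there is nothing to pop off the end.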
    

Headers

If you want to test any of the AppleScriptObjC code for yourself, include these lines at the top of the script:

use framework "Foundation"
use scripting additions

property this : a reference to the current application
property nil : a reference to missing value
property _1 : a reference to reference

property NSArray : a reference to NSArray of this
property NSData : a reference to NSData of this
property NSMutableArray : a reference to NSMutableArray of this
property NSScanner : a reference to NSScanner of this
property NSString : a reference to NSString of this

property NSASCIIStringEncoding : a reference to 1
property NSRegularExpressionSearch : a reference to 1024
property NSUTF16StringEncoding : a reference to 10

This is an exhaustive list that covers all of the various AppleScriptObjC snippets above, so you can delete any properties that aren't used in a specific script if you want to.

The script that ended up being fastest in my testing (which wasn't by any means thorough or even quantified, but it stood out as returning an immediate result) was number (2), which is written in vanilla AppleScript. Therefore, this does not require the above headers, and it's advisable not to include them if they aren't necessary.

For the JSObjC scripts, you will want to insert this auto-run function below the readBytes function declaration:

(() => {
    const filepath = '/Users/CK/Desktop/Pasted on 2019-07-28 at 07h08m.jpg';
    return readBytes(filepath);
})();
CJK