
I need to create a file of 10,000 random integers for testing. I will be using the file in both Python and C, so I don't want the data stored as strings: that would add the overhead of integer conversion in C.

In Python I can use struct.unpack to convert the file's contents to integers, but I can't use the write() method to write the integers to a file for use in C.

Is there any way in Python to write just integers, not integers-as-strings, to a file? I have used print(val, file=f) and f.write(str(val)), but in both cases it writes a string.

Here is where I am now:

file_root = "[ file root ]"

file_name = file_root + "Random_int64"

if os.path.exists(file_name):
    f = open(file_name, "wb")
    f.seek(0)

for _ in range(10000):
    val = random.randint(0, 10000)
    f.write(bytes(val))

f.close()
f = open(file_name, "rb")

wholefile = f.read()
struct.unpack(wholefile, I)

My unpack format string is wrong, so I am working on that now. I'm not that familiar with struct.unpack.

mkrieger1
RTC222
  • 1
    "I can't use the write() method" -- why? It's the natural thing to use when writing raw bytes to a file. Or did you mean, you can use it, but you got the wrong result when you tried it? – harold Aug 26 '23 at 19:24
  • 2
    You can use modules `struct`, `array` or `ctypes` to create a `bytes` object from int data. You can then `write` this object to a file. – Michael Butscher Aug 26 '23 at 19:25
  • 3
    [This answer](https://stackoverflow.com/a/12092564/22390653) explains how to write bytes to file in python. You need to open your files in binary mode. – PatioFurnitureIsCool Aug 26 '23 at 19:30
  • 2
    Plain text is, _by far_, the most portable format when using a file in different environments. You might want to rethink if a slight reduction in overhead is worth all the headaches you will certainly have by wrestling with binary formats. – John Gordon Aug 26 '23 at 19:31
  • Thanks for the comments. I'll try to write as binary with "wb", and I'll try ctypes. @John Gordon -- how will C see the data -- as integer or text? – RTC222 Aug 26 '23 at 19:35
  • 2
    I don't know. Experiment and find out. (This is exactly the kind of headache I meant. If the file were plain text, then I **know** how both environments would see it. But binary? I haven't a clue.) – John Gordon Aug 26 '23 at 19:41
  • I tried opening the file as "wb" and write as f.write(bytes(val)). That works, but when I read the file I use int(item) as I iterate, but that returns zeroes. I will try struct.unpack to see if that does it. – RTC222 Aug 26 '23 at 19:44
  • 1
    It helps to post the non-working code so we know which part needs to change. Maybe it's just "wb" or you are using unpack instead of pack or maybe the format is wrong. – tdelaney Aug 26 '23 at 19:54
  • 1
    What are the bounds for these integers? All positive? Stuff that fits in 64 bits... signed or unsigned? There are an infinite number of integers so getting even one chosen randomly from the full set is still infinite. – tdelaney Aug 26 '23 at 19:55
  • 1
    (1) all positive in range(0,10000). I will update my question above to post the code as it is now. – RTC222 Aug 26 '23 at 19:58
  • 2
    If you use `unpack` to read the file, why do you not use `pack` to write the file, and use the same (whatever) format in both cases? – mkrieger1 Aug 26 '23 at 20:06
  • 2
    What your C program will see in the file is of course exactly what your Python wrote to it, binary files aren't rocket science (but text files are) – harold Aug 26 '23 at 20:09
  • 1
    The [documentation](https://docs.python.org/3/library/struct.html#struct.unpack) says `struct.unpack(format, buffer)`. Why are you passing it `wholefile`. Isn't that the contents of the file? And what is `I`? You don't define it. – Bill Aug 26 '23 at 20:18
  • Text figures are the best exchange format between different programming languages. C and python can easily convert text to integer. – Hermann12 Aug 26 '23 at 20:19
  • According to the struct docs, I is unsigned int 64; @mkrieger, I will switch to struct now. – RTC222 Aug 26 '23 at 20:19
  • @Bill, I am still working out the format string for unpack. I'll update the question above when I'm finished. Thanks. – RTC222 Aug 26 '23 at 20:21
  • Note that the format string should be the first argument of `struct.unpack` not the second. – Bill Aug 26 '23 at 20:23
  • If you've heard of `struct – juanpa.arrivillaga Aug 26 '23 at 20:28

1 Answer


bytes(val), when val is an int, creates a bytes object of that length, filled with zero bytes. If your random number is 12345, you are writing 12345 zero bytes, not the number. The trick is to pack each integer and then write it.
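
A quick check in the interpreter makes the difference visible. This is only a minimal sketch: the value 5 and the variable names are illustrative choices, picked to keep the output short.

```python
import struct

val = 5

# bytes(int) allocates that many zero bytes; it does not encode the value.
as_zeros = bytes(val)

# struct.pack encodes the value itself into a fixed number of bytes.
as_packed = struct.pack("<Q", val)

print(as_zeros)   # b'\x00\x00\x00\x00\x00'
print(as_packed)  # b'\x05\x00\x00\x00\x00\x00\x00\x00'
```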

From the struct module's "Byte Order, Size, and Alignment" section, "<" writes bytes "little endian" (the byte order used by Intel/AMD) with standard sizes. The next character could be "L" to write 4-byte unsigned long integers or "Q" to write 8-byte unsigned integers. 4 bytes is plenty for your range of values and produces a smaller file, but 8 is more "future proof" if you want larger values later.
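
The sizes and byte layout described above can be checked directly. The sketch below assumes nothing beyond the standard library; 10000 (0x2710) is used only because it is the top of the range in the question.

```python
import struct

# With "<" (standard sizes, no padding), "L" is 4 bytes and "Q" is 8 bytes.
size_l = struct.calcsize("<L")
size_q = struct.calcsize("<Q")

# The same value packed both ways; in little endian the low-order byte comes first.
packed_l = struct.pack("<L", 10000)
packed_q = struct.pack("<Q", 10000)

print(size_l, size_q)  # 4 8
print(packed_l.hex())  # 10270000
print(packed_q.hex())  # 1027000000000000
```

On the C side these widths correspond to uint32_t and uint64_t from <stdint.h>, provided the machine reading the file is also little endian.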

If you want no repeats in the random numbers, you can create a list of integers, shuffle them, then write them to the file one by one; the examples below keep random.randint from the question, which allows repeats. Make sure to open the file in binary mode so that no encoding is done.

With a bit more cleanup you get

import random
import struct

file_root = "testfile"
file_name = file_root + "Random_int64"

with open(file_name, "wb") as f:
    for _ in range(10000):
        f.write(struct.pack("<Q", random.randint(0, 10000)))
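
To confirm the file holds what was intended, it can be read back in Python with the same format. The following is a sketch rather than part of the original answer: the temporary directory and the names COUNT, path, written and values are assumptions made so the example runs without touching real data.

```python
import os
import random
import struct
import tempfile

COUNT = 10000  # number of integers, as in the question

# Hypothetical scratch path so the sketch does not overwrite real data.
path = os.path.join(tempfile.mkdtemp(), "Random_int64")

written = [random.randint(0, 10000) for _ in range(COUNT)]
with open(path, "wb") as f:
    for val in written:
        f.write(struct.pack("<Q", val))

with open(path, "rb") as f:
    data = f.read()

# The format string is the first argument, the buffer the second.
# iter_unpack yields one tuple per 8-byte record.
values = [item[0] for item in struct.iter_unpack("<Q", data)]

print(len(data))          # 80000
print(values == written)  # True
```

Using the same format string for writing and reading is what keeps the two sides in agreement; a C reader would need to match it with 8-byte unsigned reads.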

You could also use a bytearray and struct.pack_into to build the buffer first and write once.

import random
import struct

file_root = "testfile"
file_name = file_root + "Random_int64"

buf = bytearray(10000*8)
# Step through the buffer 8 bytes at a time; pack_into takes the format first.
for offset in range(0, 10000*8, 8):
    struct.pack_into("<Q", buf, offset, random.randint(0, 10000))

with open(file_name, "wb") as f:
    f.write(buf)
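
Another way to build the whole buffer before a single write is a repeat count in the format string, which packs every value in one call. This is a variation not shown in the original answer; COUNT, values, fmt and roundtrip are illustrative names.

```python
import random
import struct

COUNT = 10000
values = [random.randint(0, 10000) for _ in range(COUNT)]

# A repeat count before the format character packs COUNT values in one call.
fmt = f"<{COUNT}Q"
buf = struct.pack(fmt, *values)

# The same format string unpacks the buffer back into a tuple of ints.
roundtrip = list(struct.unpack(fmt, buf))

print(len(buf))             # 80000
print(roundtrip == values)  # True
```

The resulting buf is a bytes object and can be passed to f.write(buf) on a file opened with "wb", just like the bytearray version.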

And if you don't mind using packages outside of the standard library, numpy has the classic

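import numpy as np
# dtype fixes each element at 8 bytes (native byte order); the upper bound is exclusive
np.random.randint(10000, size=10000, dtype=np.uint64).tofile("test.bin")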

If we are placing bets on performance, that's where I'd go.

tdelaney
  • 1
    Totally reasonable, but maybe it's worth using `struct.pack_into`, it avoids the intermediate `bytes` object. But probably not a big deal either way – juanpa.arrivillaga Aug 26 '23 at 20:29
  • @bill - thanks, fixed – tdelaney Aug 26 '23 at 20:37
  • Thanks very much @tdelaney. When I put this in PyCharm I see f.write(struct.pack(" – RTC222 Aug 26 '23 at 20:39
  • @RTC222 - that may be the missing paren noted by bill. I've fixed it in the example. – tdelaney Aug 26 '23 at 20:43
  • I have the missing paren and still the exception. – RTC222 Aug 26 '23 at 20:45
  • @juanpa.arrivillaga - good point. It would be hard to judge when writing one int at a time without some profiling. It may be more interesting if you create a `bytearray` of the full size, populate, and only do one write. – tdelaney Aug 26 '23 at 20:46
  • @tdelaney - It looks like that's the way to go. I'll try that now, as a Python list, and write the whole buffer. – RTC222 Aug 26 '23 at 20:47