I want to generate binary labels systematically, and I hope to generate three or four labels later if my first analysis works well.

Here is the code to make the binary label combinations: `lst = sorted([i for i in itertools.product([1, 0], repeat=len(my_data))], key=sum)`

The above code worked well when the number of samples to label was small, like N = 5. However, if N is greater than 20 or so, I get an error, which I assume is a memory error.
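As a rough illustration of why this happens: `itertools.product([1, 0], repeat=n)` yields 2**n tuples, and `sorted` has to hold all of them in memory at once. The sketch below only counts the combinations; `count_label_combinations` and `n_labels` are hypothetical names introduced here, and `n` stands in for `len(my_data)`.

```python
import itertools


def count_label_combinations(n, n_labels=2):
    """Number of tuples itertools.product yields for n samples."""
    return n_labels ** n


# Small case: the formula matches what product actually produces.
small = list(itertools.product([1, 0], repeat=5))
assert len(small) == count_label_combinations(5)

for n in (5, 20, 30):
    print(n, count_label_combinations(n))
# 5 32
# 20 1048576
# 30 1073741824
```

With three or four labels the count grows as 3**n or 4**n, so the same limit would be reached at an even smaller N.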

Thus, I had to change my strategy. Although it is not an ideal method, I found another approach on SO: Generate all binary strings of length n with k bits set

However, I then faced a different problem. The data format I want to save for future use is like 1,1,1,1 or 0,1,0,1, not 1110 or 0101.

I guess I could convert the pseudo labels to comma-separated form with awk after saving, but it would be more convenient to do the conversion in Python before saving.
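For reference, a minimal sketch of the conversion I have in mind, assuming the labels arrive as strings such as `'1110'`. The helper name `to_comma_separated` and the sample list `labels` are made up for illustration.

```python
def to_comma_separated(binary_str):
    """Insert a comma between every character of one binary string."""
    return ",".join(binary_str)


labels = ["1110", "0101"]  # example input, not real data

# Apply the conversion to each string, not to the list as a whole.
rows = [to_comma_separated(s) for s in labels]
print(rows)  # ['1,1,1,0', '0,1,0,1']
```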

Your suggestions for both itertools.product and comma separation are greatly appreciated.

  • You can trivially add commas between all the characters of a string. `comma_separated = ','.join(list(binary_str))`. – Mark Ransom Nov 07 '22 at 02:30
  • This turned '1111', '0101' ... into 1111, 0101 ... (using the output from https://stackoverflow.com/questions/1851134/generate-all-binary-strings-of-length-n-with-k-bits-set), but anyway, thank you for your help. I will keep it in mind for future use. – user9690450 Nov 07 '22 at 23:37
  • That's because you were using my suggestion on a list of strings. It was intended to be used on each individual string in the list - I thought my naming convention of `binary_str` would be clear. Try `[','.join(list(binary_str)) for binary_str in result]`. – Mark Ransom Nov 08 '22 at 01:50

1 Answer

Regarding itertools.product, I figured it out and solved it myself by referring to these pages.

  1. Why do I get a MemoryError with itertools.product?
  2. https://note.nkmk.me/en/python-itertools-product/
  3. Create dataframe from itertools product
  4. Writing to csv file with itertools.product

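In my case, I simply removed the sort, and the code became `lst = itertools.product([1, 0], repeat=len(my_data))`, which worked well. Without `sorted`, `itertools.product` returns a lazy iterator, so the combinations are not all held in memory at once. I was also able to handle the sorting in a later step. Thank you.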
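For anyone with the same problem, here is a minimal sketch of how the unsorted iterator can be written out row by row in the comma-separated format, assuming a CSV-style file is an acceptable target. The function name `write_label_combinations` and its parameters are hypothetical, and an in-memory buffer stands in for a real file so the example is self-contained.

```python
import csv
import io
import itertools


def write_label_combinations(n, out, labels=(1, 0)):
    """Stream every label combination of length n to `out`, one row each.

    Returns the number of rows written. Only one tuple is held in
    memory at a time because the product iterator is never materialized.
    """
    writer = csv.writer(out)
    count = 0
    for combo in itertools.product(labels, repeat=n):
        writer.writerow(combo)  # e.g. (1, 0, 1) is written as 1,0,1
        count += 1
    return count


# Small demonstration with n = 3 written to an in-memory buffer.
buffer = io.StringIO()
n_rows = write_label_combinations(3, buffer)
lines = buffer.getvalue().splitlines()
print(n_rows)     # 8
print(lines[0])   # 1,1,1
print(lines[-1])  # 0,0,0
```

With a real file, open it with `newline=''` as the `csv` module documentation recommends. The output still has 2**n rows, so this avoids the memory error but not the size of the result itself.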