Here's the scenario: I have to run a clustering algorithm over 10,000 data points. I have precomputed the distances between the data points and stored them in a file. Since Python is slow at I/O-intensive tasks, I am writing the clustering algorithm in C++. The main issue is that the clustering algorithm will run several times, and I have to switch between the Python code and the C++ code, something like this:

Read Distances from text_file (C++)
Run Clustering Algorithm (C++)

Use the result of this algorithm in the main Python code

Run clustering algorithm again (C++)

I don't want to read the distance file again and again: it has over 500 million entries and already takes around 17 seconds to load. I am looking for something like pausing the execution of the C++ code and resuming it when needed. How could this be achieved?

khirod

1 Answer

Just an idea:

Could you run the C++ part of your program from within your main Python program? The answers to the question "Calling an external command in Python" show how to do that. You can use the Adapter design pattern to pre-process the output of your C++ program so that it becomes compatible with the data structures used in your main Python program, and vice versa.
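One way this idea could look in practice is sketched below. Launching the C++ program once per clustering run would reload the distance file every time, so the sketch instead starts the child process once and keeps it alive, sending one request per line over stdin and reading one reply per line from stdout. Everything named here is an assumption for illustration: the `ClusteringWorker` class, the one-line-in, one-line-out protocol, and the worker itself, which is a small Python stand-in so the example runs without a compiler. A real C++ worker would load the distances at startup and then loop on `std::getline(std::cin, line)`, flushing `std::cout` after each reply.

```python
import subprocess
import sys

# Stand-in for the compiled C++ worker (hypothetical, for illustration only).
# It "loads" its data once, then answers one request per input line.
WORKER_SOURCE = r"""
import sys
distances = [1.0, 2.0, 3.0]                 # pretend this is the slow one-time load
for line in iter(sys.stdin.readline, ""):   # wait for the next request from the parent
    k = int(line)
    print(sum(distances) * k, flush=True)   # placeholder for a clustering result
"""


class ClusteringWorker:
    """Keeps one child process alive so its data is loaded only once."""

    def __init__(self, command):
        self.proc = subprocess.Popen(
            command,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
            bufsize=1,  # line-buffered pipes on the parent side
        )

    def run(self, k):
        # Send one request line, then block until the worker replies.
        self.proc.stdin.write(f"{k}\n")
        self.proc.stdin.flush()
        return float(self.proc.stdout.readline())

    def close(self):
        # Closing stdin ends the worker's read loop, so it exits cleanly.
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


# With a real binary, the command would be something like ["./cluster"] (hypothetical path).
worker = ClusteringWorker([sys.executable, "-c", WORKER_SOURCE])
first = worker.run(2)    # first clustering run
second = worker.run(3)   # second run reuses the data already in memory
worker.close()
```

Between the two `run` calls the main Python code is free to do its own work; the child simply blocks on its next read, which gives the "pause and resume" behaviour asked about without re-reading the file. The Adapter step then reduces to parsing each reply line into whatever structure the Python side needs.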

ultrajohn