
I have a large collection of JPEG/PNG images. I eventually want to run a neural network on them, but first I have to vectorise the images. Because there is such a large volume of images, I plan to use Python and Spark instead of software.

I'm a beginner programmer. Would anyone know some rough code to do this? Failing this, alternative methods would also be very welcome!

Many many thanks in advance!

  • This is rather broad! Do you have a more specific programming question? What do you mean by "vectorise" the images? – mdurant Mar 02 '15 at 21:34
  • Have a look at my answer here, and feel free to vote for it too :-) http://stackoverflow.com/questions/28748282/black-and-white-png-to-svg/28749734#28749734 – Mark Setchell Mar 02 '15 at 22:10
  • Be careful: Python and Spark are software too! – Daniel Darabos Mar 03 '15 at 10:22
  • You might want to look at the Thunder project, as it does exactly what you are looking for. I recently came across it as part of a cloud computing course. Check out their GitHub repo for code samples: http://thunder-project.org/thunder/docs/ – digidude Nov 23 '15 at 19:50

1 Answer

  1. Before thinking about Spark and distributed computing, implement your approach on your local machine, processing a single image. If you like Python, you can use something like scikit-image (http://scikit-image.org/docs/dev/auto_examples/), but it depends heavily on what you want to achieve.
  2. If the number of images is large, store them in a SequenceFile on HDFS (HDFS handles many small files poorly). This question will help you with the code: "Store images/videos into Hadoop HDFS".
  3. Implement your vectorization approach at scale: read the data from the SequenceFile using SparkContext, put your Python vectorization implementation into a Spark map() function, and apply it to all of your images on the distributed cluster. Then save the results back to HDFS.
  4. Unfortunately, for the neural network itself you would have to run your algorithm locally, as no neural network was implemented in MLlib at the time of writing. Again, something like scikit-learn might be helpful if you like Python: http://scikit-learn.org/stable/modules/neural_networks.html
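A minimal sketch of step 1, vectorising one image on a local machine. It assumes the image has already been decoded into a NumPy array (for example with scikit-image's `skimage.io.imread`), that pixels are 8-bit, and that a 32 x 32 grayscale thumbnail flattened into 1024 numbers is an acceptable vector. The name `image_to_vector`, the simple channel average, and the size are illustrative choices, not something the answer prescribes.

```python
import numpy as np


def image_to_vector(pixels, side=32):
    """Turn one decoded image into a fixed-length vector of floats in [0, 1].

    `pixels` is an H x W (grayscale) or H x W x C (colour) array of 8-bit
    values, e.g. what skimage.io.imread(path) returns for a typical JPEG/PNG.
    """
    img = np.asarray(pixels, dtype=np.float64)
    if img.ndim == 3:
        if img.shape[2] in (2, 4):
            img = img[..., :-1]  # drop the alpha channel (LA or RGBA)
        img = img.mean(axis=2)  # average the colour channels -> grayscale
    h, w = img.shape
    # Nearest-neighbour resample to side x side so every image gives a
    # vector of the same length, whatever its original size.
    rows = np.linspace(0, h - 1, side).round().astype(int)
    cols = np.linspace(0, w - 1, side).round().astype(int)
    small = img[np.ix_(rows, cols)]
    return (small / 255.0).ravel()  # scale 8-bit values and flatten row by row


# Quick check on a synthetic 48 x 64 RGB image instead of a real file.
demo = np.random.randint(0, 256, size=(48, 64, 3), dtype=np.uint8)
vec = image_to_vector(demo)
print(vec.shape)  # (1024,)
```

Resampling to a fixed size matters because a neural network expects every input vector to have the same length, while real photos rarely share one resolution.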
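Steps 2 and 3 could be wired together roughly as below. This is a sketch that has not been run against a cluster: it assumes Pillow and NumPy are installed on every Spark worker, that `sc` is an existing `SparkContext`, and that the SequenceFile stores file names as `Text` keys and raw image bytes as `BytesWritable` values. `bytes_to_vector`, `vectorise_record`, `pack_images`, and `vectorise_all` are hypothetical helper names, and a 32 x 32 grayscale thumbnail is again only an assumed choice of vector.

```python
import io

import numpy as np
from PIL import Image

SIDE = 32  # assumed thumbnail size: each image becomes SIDE * SIDE floats


def bytes_to_vector(raw, side=SIDE):
    """Decode raw JPEG/PNG bytes into a flat grayscale vector in [0, 1]."""
    img = Image.open(io.BytesIO(bytes(raw))).convert("L").resize((side, side))
    return np.asarray(img, dtype=np.float32).ravel() / 255.0


def vectorise_record(record):
    """Spark map() function: (file name, image bytes) -> (file name, vector)."""
    name, raw = record
    return name, bytes_to_vector(raw)


def pack_images(sc, image_dir, seq_path):
    """Step 2: pack many small image files into one SequenceFile.

    binaryFiles yields (path, bytes) pairs; converting the bytes to bytearray
    lets PySpark write them as BytesWritable values with Text keys.
    """
    sc.binaryFiles(image_dir).mapValues(bytearray).saveAsSequenceFile(seq_path)


def vectorise_all(sc, seq_path, out_path):
    """Step 3: read the SequenceFile, vectorise every image in parallel, save."""
    images = sc.sequenceFile(
        seq_path, "org.apache.hadoop.io.Text", "org.apache.hadoop.io.BytesWritable"
    )
    images.map(vectorise_record).saveAsPickleFile(out_path)
```

For example, `pack_images(sc, "hdfs:///images/raw", "hdfs:///images/seq")` followed by `vectorise_all(sc, "hdfs:///images/seq", "hdfs:///images/vectors")` (placeholder paths) would leave pickled `(name, vector)` pairs on HDFS, which `sc.pickleFile(...)` can read back. Keeping `vectorise_record` a plain function means you can test it locally on a few files before handing it to `map()`.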
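For step 4, current scikit-learn releases include `MLPClassifier` in `sklearn.neural_network` (added in version 0.18, after this answer was written). The sketch below assumes the vectors from step 3 have been collected to one machine (e.g. with `sc.pickleFile(out_path).collect()`) and that you have a class label for each image from some other source. `train_local` is a hypothetical helper, and the synthetic dark/bright data only stands in for real images.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier


def train_local(vectors, labels, hidden=64, seed=0):
    """Stack per-image vectors into a matrix and fit a small neural network."""
    X = np.vstack(vectors)  # one row per image
    y = np.asarray(labels)
    clf = MLPClassifier(hidden_layer_sizes=(hidden,), max_iter=500, random_state=seed)
    clf.fit(X, y)
    return clf


# Synthetic stand-in data: 20 "dark" and 20 "bright" 1024-pixel vectors.
rng = np.random.default_rng(0)
dark = [rng.uniform(0.0, 0.3, 1024) for _ in range(20)]
bright = [rng.uniform(0.7, 1.0, 1024) for _ in range(20)]
model = train_local(dark + bright, [0] * 20 + [1] * 20)
print(model.predict(np.vstack([dark[0], bright[0]])))
```

With real images and labels you would evaluate on a held-out set (e.g. via `sklearn.model_selection.train_test_split`) rather than on training examples as this toy check does.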
edited by Community; answered by 0x0FFF