
Given a PDF with Electrical Wiring schematics:

[image]

I have to read through them and configure software to match the wiring for specific motors placed on individual emergency stop circuits. In practice, the circuits always begin on the same page (the printed page number, not the PDF page number).

What I would like are some pointers on how to trace wires and list the components along each wire, so that I can build the circuits. I know this involves character recognition and, I guess, some form of image recognition to follow the lines.

An example of one circuit is at '2012101' (which means control panel 20, page 121, line 01): PBL2012101 -> CR2012103

Then the next circuit: COS800-1 -> COS800-2 -> ... -> COS804 -> CR2012133

Those individual nodes have a standard look. The one exception is a circuit that continues past the end of the page (this is just the left half of the page): it goes to a box indicating where the circuit continues, as with the PushButtonLight at line 01 -> [2014263].

So I know essentially what I need to parse from the PDF, but I am having trouble getting started. I have found plenty of results on reading characters, but the discussion is typically about something like reading pages of a book. Could someone suggest a library (I assume this will involve building some custom tools) or another reference to help me out?
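
The numbering scheme above (two digits of control panel, three of page, two of line, optionally preceded by a device prefix) can be captured in a small parser. This is only a minimal Python sketch under that assumption; the names `SheetRef` and `parse_ref` are hypothetical, and tags that do not follow the scheme (such as `COS800-1`) are simply reported as unparsed.

```python
import re
from typing import NamedTuple, Optional


class SheetRef(NamedTuple):
    """Location encoded in a tag (hypothetical helper type)."""
    prefix: str  # device prefix such as "CR" or "PBL"; empty for a bare reference
    panel: int
    page: int
    line: int


# Compact form: optional letters, then 2 + 3 + 2 digits,
# e.g. "CR2012133" or "[2014263]". The 2/3/2 split is an assumption
# taken from the example "2012101" = panel 20, page 121, line 01.
_COMPACT = re.compile(r"^\[?([A-Z]*)(\d{2})(\d{3})(\d{2})\]?$")
# Dotted form, e.g. "[20.151.00]".
_DOTTED = re.compile(r"^\[?(\d{2})\.(\d{3})\.(\d{2})\]?$")


def parse_ref(tag: str) -> Optional[SheetRef]:
    """Return the panel/page/line encoded in a tag, or None if it does not fit."""
    tag = tag.strip()
    m = _COMPACT.match(tag)
    if m:
        prefix, panel, page, line = m.groups()
        return SheetRef(prefix, int(panel), int(page), int(line))
    m = _DOTTED.match(tag)
    if m:
        panel, page, line = m.groups()
        return SheetRef("", int(panel), int(page), int(line))
    return None  # e.g. "COS800-1" does not carry a sheet location


if __name__ == "__main__":
    for tag in ["2012101", "PBL2012101", "[2014263]", "COS800-1"]:
        print(tag, "->", parse_ref(tag))
```

Returning `None` instead of raising keeps tags like the pullcord names usable as plain node labels while still flagging that they carry no page information.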


Edit 5:02 PM - 3/28/15

Here is a link to some example code written from the circuits drawn in the schematics (though the sample of schematics I have given is for a different control panel on the same system): http://pastebin.com/uhMcVJv8

Typically I just open an Excel sheet and jot down the pullcords (COS...) and pushbuttons (PB...) for each control relay circuit (CR...).

Next I view this page: [image]

It shows which motors are on each circuit. For example, following the last relay on the first line, CR2012133, to page 151 line 00 [20.151.00], I then add these motors to the same Excel sheet (to reference when configuring the struct in the code example): MTR800 MTR802 ... MTR805A

[image]

I am developing on a Windows system with access to Visual Studio 2008 and 2012. I also have the MinGW libraries installed.

Spektre
AChrapko
  • Please provide a sample document and describe your development environment. *Could someone suggest a library* - strictly speaking such recommendations are off topic here. – mkl Mar 28 '15 at 14:08
  • You won't be able to do it without a lot of effort, and I mean months of work. In PDF, just extracting text is a challenge; now imagine understanding the lines to build a schematic. If all your drawings are the same, maybe you can just read the text and infer from the positions what is connected to what. – Paulo Soares Mar 28 '15 at 21:03
  • .. for instance: what if the drawing software created those dashed lines by *drawing individual line segments*? It *may* be possible by restricting yourself to one kind of PDF, created by one single application, so it's (as much as possible) predictable what sort of information is encoded how. – Jongware Mar 28 '15 at 21:25
  • I do understand it would be quite a bit of work to make it robust, but as you mention, the drawings do have a few standards, such as the EStop circuits always beginning on page 121. We already have a process for extracting the text information, which is then entered into a SQL table; however, I have only seen the results of this, not the actual methodology. The table contains `wire_number`, `from_point`, and `to_point`. I have spent a few hours trying to work out ways to see where a circuit starts and stops from that table. I was not able to differentiate Control Relay orders, among other issues. – AChrapko Mar 28 '15 at 21:29
  • @AChrapko As long as you don't supply a representative sample document, you can merely be told that in general your task is horribly complex. The sample *might* show that in your case the PDF contents are easier to interpret and so give rise to hints helping you along; or it might show that indeed you should drop the task for complexity reasons. – mkl Mar 30 '15 at 08:25

1 Answer


Treat this as a comment

  1. This needs a huge effort just to make it do something at all, never mind being robust!

    • robustness is an entirely different level
  2. I would ignore the PDF at first and start from screenshots or printed overview images as input.

  3. Make yourself familiar with line detection algorithms.

  4. Recognize parts via OCR or image similarity.

    • for that you will need a table of usable parts
    • simple OCR approaches can work
    • as can detection of similar images
    • if the circuits all have the same (or a similar) scale, the detection becomes simpler
  5. I would process the text last (from what is left).

  6. Understanding the circuit

    • all the bullets above are about recognition, which is hard
    • but compared to this step it is a piece of cake
    • you need to recreate the circuit so that it matches the source image
    • and then extract the needed information from it
    • so start with closed-loop detection and interconnection lists
    • you will need to create structures and a hierarchy to handle the circuit in memory
    • I would also add some image output
    • and compare it with the source image to check that nothing is too different
    • such as messed-up connections, missed lines, etc.
    • I do not know of any library or paper dealing with this kind of task
    • but that does not mean there is none out there
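
As a starting point for the interconnection lists and in-memory structures from step 6, here is a minimal Python sketch. It assumes the recognition stage (or an existing table of `wire_number`, `from_point`, `to_point`) already yields directed connections. The wire numbers are invented, the component names are borrowed from the question's examples, the pullcords elided in the question are left out rather than guessed, and `build_graph`, `find_starts`, and `trace` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical interconnection list: (wire_number, from_point, to_point).
# Wire numbers are made up; intermediate pullcords that the question
# elides with "..." are deliberately not filled in.
CONNECTIONS = [
    ("W1", "PBL2012101", "CR2012103"),
    ("W2", "COS800-1", "COS800-2"),
    ("W3", "COS800-2", "COS804"),
    ("W4", "COS804", "CR2012133"),
]


def build_graph(connections):
    """Map each component to the components it feeds (assumes directed wires)."""
    graph = defaultdict(list)
    for _wire, src, dst in connections:
        graph[src].append(dst)
    return graph


def find_starts(connections):
    """Components that appear as a source but never as a target."""
    sources = {src for _wire, src, _dst in connections}
    targets = {dst for _wire, _src, dst in connections}
    return sorted(sources - targets)


def trace(graph, start, is_end=lambda name: name.startswith("CR")):
    """Follow the wire from `start` and list components until a control relay.

    Stops early at a dead end or a branch, so that an operator can
    resolve the ambiguity instead of the tool guessing.
    """
    path, seen, node = [start], {start}, start
    while not is_end(node):
        candidates = [n for n in graph.get(node, []) if n not in seen]
        if len(candidates) != 1:
            break  # dead end or branch: needs human review
        node = candidates[0]
        path.append(node)
        seen.add(node)
    return path


if __name__ == "__main__":
    graph = build_graph(CONNECTIONS)
    for start in find_starts(CONNECTIONS):
        print(" -> ".join(trace(graph, start)))
```

Stopping at branches instead of choosing a path fits the operator feedback suggested further down: anything the tool cannot follow unambiguously is surfaced for a human to check against the drawing.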

This task is so huge that it is doubtful it can be done in just a few months

  • circuit diagrams vary a lot (bus usage, styles, ...)
  • text drawn over the circuits can mess everything up
  • there are multiple conventions for part symbols
  • so, as mentioned in the comments, try to constrain the input as much as you can
  • adding some feedback from an operator is also wise
  • as you can see, this topic is too broad for a single answer here, so (+Close)
  • you should get started and ask specific questions as you hit a wall
Community
Spektre