
Recently, I have been trying to use the Google DLP API in Python 3 to classify the content of tables. I started by testing the API on small examples, which all worked perfectly. However, when I tried to send larger tables (1000 rows × 18 columns, i.e. 18,000 cells, which is below the 50,000 quota), the request would crash. After reducing the table to 100 rows, I did manage to make it run; however, a single request of 100 rows takes approximately 10 seconds. Most values are fairly short. Some of the columns are listed below:

  • Address
  • Date of birth
  • Email
  • First Name
  • Gender
  • Job Position
  • Last Name
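
For reference, here is a minimal sketch of how such a table might be split into 100-row requests. It only builds the request `item` dictionaries, assuming the dict form of a DLP `ContentItem` table used by the `google-cloud-dlp` client (`headers` as `{"name": ...}`, each row as `{"values": [{"string_value": ...}]}`). The helper names `to_table_item` and `batch_rows` are hypothetical, and the client call itself is left out.

```python
from typing import Iterator, List, Sequence


# Hypothetical helper: build one ContentItem (as a dict) holding a table.
# Assumed shape: {"table": {"headers": [{"name": ...}],
#                           "rows": [{"values": [{"string_value": ...}]}]}}
def to_table_item(headers: Sequence[str], rows: Sequence[Sequence[str]]) -> dict:
    return {
        "table": {
            "headers": [{"name": h} for h in headers],
            "rows": [
                {"values": [{"string_value": str(cell)} for cell in row]}
                for row in rows
            ],
        }
    }


# Hypothetical helper: yield consecutive slices of at most batch_size rows,
# so each request stays small (100 rows is what worked in the question).
def batch_rows(rows: Sequence[Sequence[str]], batch_size: int = 100) -> Iterator[List[Sequence[str]]]:
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])


if __name__ == "__main__":
    headers = ["First Name", "Last Name", "Email"]
    rows = [[f"Name{i}", f"Surname{i}", f"user{i}@example.com"] for i in range(250)]
    items = [to_table_item(headers, chunk) for chunk in batch_rows(rows)]
    print(len(items), [len(it["table"]["rows"]) for it in items])  # 3 [100, 100, 50]
```

Each item would then be passed as the `item` of an `inspect_content` request; the exact call signature depends on the client library version, so check its documentation.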

Furthermore, after further experimentation, I have noticed that if the same table is provided as a string in a CSV format (columns separated by "," and rows by "\n"), running time is reduced by a factor of 10.
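
A minimal sketch of the CSV-string variant, assuming the dict form of a DLP `ContentItem` with a plain-text `value` field. The helper name `to_csv_item` is hypothetical. It uses the standard `csv` module rather than a bare `",".join(...)`, so that cells which themselves contain commas get quoted.

```python
import csv
import io
from typing import Sequence


# Hypothetical helper: serialize the table as CSV text (columns separated by ",",
# rows by "\n") and wrap it as a plain-text item instead of a structured table.
def to_csv_item(headers: Sequence[str], rows: Sequence[Sequence[str]]) -> dict:
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")  # quotes cells containing commas
    writer.writerow(headers)
    writer.writerows(rows)
    return {"value": buf.getvalue()}
```

One trade-off to keep in mind: in this form the column headers are just part of the text, so mapping findings back to columns would have to be done by hand.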

Is this normal behaviour, or am I using the API poorly, leading to such poor performance?

I hope my question is clear enough. Thanks for taking the time to read this! :)


It's a known issue being worked on. Some detectors (DOB and name detectors) are working slower than desired on structured data.

Jordanna Chord
  • Ah, I see, thanks for the answer. How about running it as a string? It takes about a second for 100 lines (2,962 tokens/words or 21,101 characters), so around 0.3 ms per token. Is this a normal running time, or is it slower than average? Again, sorry to bother you with these questions, and thank you again for your time! – Sofiane Mahiou Jul 31 '18 at 13:39
  • DOB, location and name detectors require complex algorithms that are currently on the slower side – Jordanna Chord Aug 07 '18 at 20:55
  • Got it! Thank you for your time, have a nice day. – Sofiane Mahiou Aug 07 '18 at 23:11
  • Just an update on this: we recently made a change that increases the speed of this by about 5x. The detectors mentioned above will still be slower, but the overall speed of table scans should be much improved. – Jordanna Chord Sep 06 '18 at 20:44
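
The per-token figure in the comments is simply elapsed time divided by token count (1 s / 2962 tokens ≈ 0.34 ms per token). A small sketch for measuring it per request follows. `time_call` and `ms_per_token` are hypothetical helpers, and `call` stands for whatever function sends one inspect request.

```python
import time
from typing import Callable


# Hypothetical helper: wall-clock seconds taken by one call,
# e.g. a function that sends a single inspect request.
def time_call(call: Callable[[], object]) -> float:
    start = time.perf_counter()
    call()
    return time.perf_counter() - start


# Milliseconds per token for a request that took `seconds` over `n_tokens` tokens.
def ms_per_token(seconds: float, n_tokens: int) -> float:
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return seconds * 1000.0 / n_tokens


print(round(ms_per_token(1.0, 2962), 2))  # 0.34
```

Timing several batches and averaging gives a steadier number than a single request, since network latency is included in each measurement.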