Document Management with automatic OCR and translation

One of the problems with living in a foreign country and not speaking the local language very well is that dealing with bills, tax documentation, government correspondence and other post that comes through your door can be a pain. My solution is an automated document management system that translates anything scanned on our network scanner.

  • OCR (Optical Character Recognition) using either the Tesseract engine or the ABBYY API.
  • Translation performed by the Google or Bing APIs.
  • Document language recognition by comparison of original and translated texts.
  • Full-text indexing of all documents.
  • Automatic tagging of documents based on keywords found in content.
  • Feature-rich web interface written in PHP/jQuery for tagging, searching and administering documents.
  • Ability to translate any image, PDF or text document.
  • Email gateway: simply email the service with an attachment and receive an email reply with the translations.
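
To make two of the features above more concrete, here is a minimal Python sketch of how keyword-based tagging and language recognition by comparison might work. It is not the system's actual code (the web interface is written in PHP). The tag rules in TAG_KEYWORDS, the function names auto_tags and guess_language, and the translate callback are all hypothetical. The language guess assumes that translating a text from the wrong source language tends to leave it largely unchanged, so the candidate whose translation differs most from the original is taken as the likely language.

```python
import re
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Set

# Hypothetical keyword-to-tag rules; the real rules are not given in the post.
TAG_KEYWORDS: Dict[str, Set[str]] = {
    "tax": {"tax", "vat", "revenue"},
    "utilities": {"electricity", "gas", "water"},
    "insurance": {"insurance", "policy", "premium"},
}


def auto_tags(text: str, rules: Dict[str, Set[str]] = TAG_KEYWORDS) -> List[str]:
    """Return the sorted tags whose keywords occur as whole words in text."""
    words = set(re.findall(r"\w+", text.lower()))
    return sorted(tag for tag, keywords in rules.items() if words & keywords)


def guess_language(
    text: str,
    candidates: List[str],
    translate: Callable[[str, str], str],
) -> str:
    """Guess the source language by comparing the original with each translation.

    translate(text, source_lang) is a stand-in for a call to a translation
    API. Assumption: a wrong source language leaves the text mostly
    untouched, so the least similar translation marks the likely language.
    """
    def similarity(lang: str) -> float:
        return SequenceMatcher(None, text, translate(text, lang)).ratio()

    return min(candidates, key=similarity)
```

In this sketch, auto_tags would run on the translated text of each scanned document, and guess_language would be given a short list of languages that the post is likely to arrive in. Passing the translation function in as a parameter keeps the comparison logic independent of whichever translation API is in use.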

 
