Discussion:
[Mayan EDMS: 2265] Can OCR be trained, or otherwise improved?
David Reagan
2018-02-15 01:33:04 UTC
While experimenting with Mayan, I've noticed that the OCR is pretty
unreliable.

CHRNGE instead of CHANGE, HOU instead of HOW, CRSHIER instead of CASHIER,
UUU instead of WWW, OOESTIONS instead of QUESTIONS, etc.

Those are all examples from just one receipt, and the preview looks pretty
darn good.

So, is there a way to teach the OCR to get better?

Or some other way to improve OCR results? Maybe a newer version of
Tesseract?
--
---
You received this message because you are subscribed to the Google Groups "Mayan EDMS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mayan-edms+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
l***@gmail.com
2018-03-01 01:21:51 UTC
OCR itself is very prone to errors. I've had good experience using
transformations to lower the color space of images. I wonder why Tesseract
doesn't do this itself.
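One common way to lower the color space is to reduce a grayscale page to
pure black and white. The sketch below does that with Otsu's method, which
picks the threshold from the image's own histogram. It assumes the page is
already available as a list of rows of 8-bit gray values (a hypothetical
input format; in practice an imaging library such as Pillow, e.g. via
`Image.convert("L")`, would produce the grayscale image). It is only an
illustration of the idea, not how Mayan or Tesseract process images.

```python
from typing import List, Optional

# Hypothetical input format: rows of 8-bit grayscale values (0-255).
GrayImage = List[List[int]]


def otsu_threshold(pixels: GrayImage) -> int:
    """Pick the gray level that best separates dark and light pixels (Otsu's method)."""
    # Histogram: how many pixels have each gray level.
    hist = [0] * 256
    for row in pixels:
        for value in row:
            hist[value] += 1

    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    sum_bg = 0     # weighted sum of levels <= t (the "dark" class)
    weight_bg = 0  # number of pixels with level <= t
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance; Otsu's threshold is the t that maximizes it.
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize(pixels: GrayImage, threshold: Optional[int] = None) -> GrayImage:
    """Reduce a grayscale image to two colors: 0 (black) and 255 (white)."""
    if threshold is None:
        threshold = otsu_threshold(pixels)
    return [[255 if value > threshold else 0 for value in row] for row in pixels]
```

For example, `binarize([[10, 12, 200], [11, 205, 210]])` returns
`[[0, 0, 255], [0, 255, 255]]`. A fixed threshold can work just as well for
consistently lit scans; Otsu's method is used here because it adapts to each
page without tuning.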

As for training, as far as I know Tesseract can be trained, but I don't know
the process. I think that the language files for Tesseract (the .traineddata
files) are actually training files.

Some links I found on the topic:

http://pretius.com/how-to-prepare-training-files-for-tesseract-ocr-and-improve-characters-recognition/
https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-3.03–3.05
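Another way to improve results, without touching Tesseract at all, is to
post-correct the OCR text against a word list. The sketch below uses
Python's standard `difflib.get_close_matches` to snap each misread word to
its closest dictionary entry. The vocabulary is a hypothetical stand-in; a
real setup would need a proper word list for the document's language and
domain, and this is not something Mayan does out of the box as far as this
thread shows.

```python
import difflib
import re
from typing import List

# Hypothetical word list; in practice, use a dictionary for the document's
# language plus terms that commonly appear on your documents.
VOCABULARY = ["CASHIER", "CHANGE", "HOW", "QUESTIONS", "TOTAL", "WWW"]


def correct_word(word: str, vocabulary: List[str] = VOCABULARY, cutoff: float = 0.6) -> str:
    """Return the closest vocabulary word, or the word unchanged if nothing is close enough."""
    if word in vocabulary:
        return word
    # get_close_matches ranks candidates by SequenceMatcher similarity ratio.
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def correct_text(text: str, vocabulary: List[str] = VOCABULARY, cutoff: float = 0.6) -> str:
    """Apply correct_word to every run of letters, leaving numbers and punctuation alone."""
    return re.sub(r"[A-Za-z]+", lambda m: correct_word(m.group(0), vocabulary, cutoff), text)
```

With this vocabulary, `correct_text("CHRNGE HOU CRSHIER OOESTIONS")` returns
`"CHANGE HOW CASHIER QUESTIONS"`. Similarity matching cannot fix UUU, though:
it shares no letters with WWW, so it stays as-is. Catching that would need a
confusion-aware substitution (e.g. trying U -> W) before the lookup, and a
small or badly chosen word list can also "correct" words that were right.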