Discussion:
[Mayan EDMS: 2264] I need help with document_analyzer basics
David Reagan
2018-02-15 01:26:38 UTC
Hey all,

I finally have time to experiment with Mayan-EDMS some more. So I'm back at
trying to get https://gitlab.com/startmat/document_analyzer working the way
I want.

Unfortunately, I can't seem to figure it out.

I'm currently testing on a vagrant instance. See:
https://gitlab.com/mayan-edms/mayan-edms-vagrant

I ended up copying the document_analyzer app into the apps directory to get
it loading.

I am using an Albertsons receipt to test with. The first two lines of OCR
look like:

4S Albertsons
It's just better.

I made an analyzer and assigned the 'receipt' document type to it. (That's
the type I added, and it is the type shown on the Albertsons receipt's
properties page.)

Parameter:
first;(?ims)(?P<albertsons>(.*Albertsons.*))


This should cause document_analyzer to add an "albertsons" field to either
the metadata or properties of the document. Am I wrong?

I also made an analyzer based on the document_analyzer's README.

Parameter:
first;(?i)(?P<Creator>Tele2|Apple|Microsoft|Billa|Albertsons)

I just added "Albertsons" to the list of words to look for.


This should cause document_analyzer to add a "Creator" field to either the
metadata or properties of the document. Am I wrong?
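
As a side check, the two patterns can be run locally against the first two OCR lines with Python's re module. This is only a sketch: it assumes the analyzer applies the part after "first;" as a regular expression searched anywhere in the OCR text, which the thread does not confirm. The helper name named_groups is hypothetical.

```python
import re

# First two lines of OCR text, as quoted in the post.
OCR_TEXT = "4S Albertsons\nIt's just better.\n"

# The regex part of each analyzer parameter (everything after "first;").
ALBERTSONS_PATTERN = r"(?ims)(?P<albertsons>(.*Albertsons.*))"
CREATOR_PATTERN = r"(?i)(?P<Creator>Tele2|Apple|Microsoft|Billa|Albertsons)"


def named_groups(pattern, text):
    """Return the named groups of the first match, or {} if nothing matches.

    Hypothetical helper; using re.search here is an assumption about how
    document_analyzer applies the pattern.
    """
    match = re.search(pattern, text)
    return match.groupdict() if match else {}


print(named_groups(CREATOR_PATTERN, OCR_TEXT))
print(named_groups(ALBERTSONS_PATTERN, OCR_TEXT))
```

Both patterns match this text. The second one yields Creator = "Albertsons". The first one captures the entire OCR text in the "albertsons" group, because with the s flag "." also matches newlines, so ".*Albertsons.*" spans every line rather than just the one containing the store name.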


I used the menu item "Submit to analyze"
http://localhost:8080/document_analyzer/analyzer/1/submit/ to run
document_analyzer.


All I can see in the logs is that I clicked that menu item. Nothing is
added to either the metadata or the properties of the document.


If I test:


(?ims).*albertsons.*


on http://www.pyregex.com/ with the first two lines of the document, it
reports a success.


/usr/share/mayan-edms/mayan/settings/local.py looks like:

from __future__ import absolute_import, unicode_literals

from .base import *

SECRET_KEY = '5(kv&ow31r2m9e^#c65v%ppiwiv9epu-hxa*1jsa1#m5bi!g7+'

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mayan_edms',
        'USER': 'mayan',
        'PASSWORD': 'test123',
        'HOST': 'localhost',
        'PORT': '5432',
    }
}

INSTALLED_APPS += (
    'document_analyzer',
)

BROKER_URL = 'redis://127.0.0.1:6379/0'
CELERY_RESULT_BACKEND = 'redis://127.0.0.1:6379/0'

LOGGING = {
    'version': 1,
    'disable_existing_loggers': True,
    'formatters': {
        'verbose': {
            'format': '%(levelname)s %(asctime)s %(name)s %(process)d %(thread)d %(message)s'
        },
        'intermediate': {
            'format': '%(name)s <%(process)d> [%(levelname)s] "%(funcName)s() %(message)s"'
        },
        'simple': {
            'format': '%(levelname)s %(message)s'
        },
    },
    'handlers': {
        'console': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
            'formatter': 'intermediate'
        }
    },
    'loggers': {
        #'documents': {
        #    'handlers': ['console'],
        #    'propagate': True,
        #    'level': 'DEBUG',
        #},
        #'common': {
        #    'handlers': ['console'],
        #    'propagate': True,
        #    'level': 'DEBUG',
        #},
        'document_analyzer': {
            'handlers': ['console'],
            'propagate': True,
            'level': 'DEBUG',
        },
    }
}
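
One way to sanity-check the logging part of this file outside of Mayan is to load a trimmed copy of the LOGGING dict with the standard library's logging.config.dictConfig and confirm that loggers below 'document_analyzer' inherit the DEBUG level. This is a rough sketch: the child logger name "document_analyzer.tasks" and the helper effective_level_name are hypothetical, and it says nothing about whether the app itself actually logs anything.

```python
import logging
import logging.config

# Trimmed copy of the LOGGING setting above: one formatter, one handler,
# and the 'document_analyzer' logger.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": True,
    "formatters": {
        "intermediate": {
            "format": '%(name)s <%(process)d> [%(levelname)s] "%(funcName)s() %(message)s"'
        },
    },
    "handlers": {
        "console": {
            "level": "DEBUG",
            "class": "logging.StreamHandler",
            "formatter": "intermediate",
        },
    },
    "loggers": {
        "document_analyzer": {
            "handlers": ["console"],
            "propagate": True,
            "level": "DEBUG",
        },
    },
}

logging.config.dictConfig(LOGGING)


def effective_level_name(logger_name):
    """Return the effective level of a logger as a string (hypothetical helper)."""
    logger = logging.getLogger(logger_name)
    return logging.getLevelName(logger.getEffectiveLevel())


# "document_analyzer.tasks" is a made-up module name used for illustration;
# any logger named "document_analyzer.<something>" inherits the same level.
child = logging.getLogger("document_analyzer.tasks")
child.debug("test message from a child logger")
print(effective_level_name("document_analyzer.tasks"))
```

If the child logger reports DEBUG and the test message appears on the console, the configuration itself is fine, and the absence of log output points to the app not emitting messages for that action rather than to the logging setup.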


Does anyone have any tips? Am I missing a step somewhere?
--
---
You received this message because you are subscribed to the Google Groups "Mayan EDMS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mayan-edms+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Matthias Löblich
2018-02-17 12:17:29 UTC
Hi David,
You can view the document_analyzer result by opening the document's
version page and selecting "Analyzer result" from the "Actions" menu of
the relevant document version.

The analyzer result is not stored as metadata; it uses its own structure.
You can build Mayan indexes based on the analyzer result.

For your example, you can build an index with a template like this:
{{ document.analyzer_value_of.Creator }}

br
Matthias
David Reagan
2018-02-17 16:24:29 UTC
Thanks Matthias.

Now I know where to look.

When I read the docs the other day, I thought indexes seemed similar to
a folder structure. Is that an ok way to think of them?

Is there a way to use document_analyzer to add tags, metadata, or
properties?

For example, if I upload a receipt from Amazon.com, I'd like to add it
to the "2018->Amazon" index, tag it with something pulled from the Items
Ordered section, and add metadata that includes: total, billed date,
ordered date, Amazon.com order number, and what card I used.
--
- David Reagan
Alan
2018-04-26 01:18:23 UTC
Bump