David Reagan
2018-02-15 01:26:38 UTC
Hey all,
I finally have time to experiment with Mayan-EDMS some more. So I'm back at
trying to get https://gitlab.com/startmat/document_analyzer working the way
I want.
Unfortunately, I can't seem to figure it out.
I'm currently testing on a vagrant instance. See:
https://gitlab.com/mayan-edms/mayan-edms-vagrant
I ended up copying the document_analyzer app into the apps directory to get
it loading.
I am using an Albertsons receipt to test with. The first two lines of OCR
look like:
4S Albertsons
It's just better.
I made an analyzer and assigned the 'receipt' document type to it. (That's
the type I added and that the Albertsons receipt's properties page says it
is.)
Parameter:
first;(?ims)(?P<albertsons>(.*Albertsons.*))
This should cause document_analyzer to add an "albertsons" field to either
the metadata or properties of the document. Am I wrong?
I also made an analyzer based on the document_analyzer's README.
Parameter:
first;(?i)(?P<Creator>Tele2|Apple|Microsoft|Billa|Albertsons)
I just added "Albertsons" to the list of words to look for.
This should cause document_analyzer to add a "Creator" field to either the
metadata or properties of the document. Am I wrong?
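To make my expectation concrete, here is a minimal sketch that runs both patterns through Python's `re` module directly. It assumes the part after `first;` is handed to `re` unchanged and that the first match's named groups become the fields; that is my guess, not something I have confirmed in the document_analyzer source. `ocr_text` and `first_match_fields` are names made up for this test.

```python
import re

# First two OCR lines of the receipt, as quoted above.
ocr_text = "4S Albertsons\nIt's just better."

# Patterns copied from the two analyzer parameters, minus the "first;" prefix.
pattern_albertsons = r"(?ims)(?P<albertsons>(.*Albertsons.*))"
pattern_creator = r"(?i)(?P<Creator>Tele2|Apple|Microsoft|Billa|Albertsons)"


def first_match_fields(pattern, text):
    """Return the named groups of the first match, or an empty dict.

    Hypothetical helper: this is how I assume "first" behaves.
    """
    match = re.search(pattern, text)
    return match.groupdict() if match else {}


albertsons_fields = first_match_fields(pattern_albertsons, ocr_text)
creator_fields = first_match_fields(pattern_creator, ocr_text)

print(albertsons_fields)
print(creator_fields)
```

If the plugin does work this way, both patterns match the OCR text, so I would expect one field per named group to show up on the document.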
I used the menu item "Submit to analyze"
http://localhost:8080/document_analyzer/analyzer/1/submit/ to run
document_analyzer.
All I can see in the logs is that I clicked that menu item. Nothing is
added to either the metadata or the properties of the document.
If I test:
(?ims).*albertsons.*
on http://www.pyregex.com/ with the first two lines of the document, it
reports a success.
/usr/share/mayan-edms/mayan/settings/local.py looks like:
from __future__ import absolute_import, unicode_literals

from .base import *

SECRET_KEY = '5(kv&ow31r2m9e^#c65v%ppiwiv9epu-hxa*1jsa1#m5bi!g7+'

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mayan_edms',
        'USER': 'mayan',
        'PASSWORD': 'test123',
        'HOST': 'localhost',
        'PORT': '5432',
    }
}

INSTALLED_APPS += (
    'document_analyzer',
)

BROKER_URL = 'redis://127.0.0.1:6379/0'
CELERY_RESULT_BACKEND = 'redis://127.0.0.1:6379/0'

LOGGING = {
    'version': 1,
    'disable_existing_loggers': True,
    'formatters': {
        'verbose': {
            'format': '%(levelname)s %(asctime)s %(name)s %(process)d %(thread)d %(message)s'
        },
        'intermediate': {
            'format': '%(name)s <%(process)d> [%(levelname)s] "%(funcName)s() %(message)s"'
        },
        'simple': {
            'format': '%(levelname)s %(message)s'
        },
    },
    'handlers': {
        'console': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
            'formatter': 'intermediate'
        }
    },
    'loggers': {
        #'documents': {
        #    'handlers': ['console'],
        #    'propagate': True,
        #    'level': 'DEBUG',
        #},
        #'common': {
        #    'handlers': ['console'],
        #    'propagate': True,
        #    'level': 'DEBUG',
        #},
        'document_analyzer': {
            'handlers': ['console'],
            'propagate': True,
            'level': 'DEBUG',
        },
    }
}
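To rule out the logging configuration itself, here is a standalone sanity check using only the standard library, outside of Mayan. It reuses the 'intermediate' formatter and the 'document_analyzer' logger entry from above. The child logger name `document_analyzer.tasks`, the message text, and the in-memory `log_buffer` stream are assumptions added purely for the test; I do not know which logger names the app actually uses.

```python
import io
import logging
import logging.config

# In-memory stream so the output can be inspected (test-only assumption).
log_buffer = io.StringIO()

LOGGING = {
    'version': 1,
    'disable_existing_loggers': True,
    'formatters': {
        'intermediate': {
            'format': '%(name)s <%(process)d> [%(levelname)s] "%(funcName)s() %(message)s"'
        },
    },
    'handlers': {
        'console': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
            'formatter': 'intermediate',
            'stream': log_buffer,
        }
    },
    'loggers': {
        'document_analyzer': {
            'handlers': ['console'],
            'propagate': True,
            'level': 'DEBUG',
        },
    },
}

logging.config.dictConfig(LOGGING)

# Hypothetical child logger; it inherits level and handler from 'document_analyzer'.
child_logger = logging.getLogger('document_analyzer.tasks')
child_logger.debug('analyzer task started')

captured = log_buffer.getvalue()
print(captured)
```

In this isolated test the DEBUG message does reach the handler, so the 'loggers' entry by itself looks fine. It says nothing about which process (web server or Celery worker) would actually emit the app's messages.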
Does anyone have any tips? Am I missing a step somewhere?
--
---
You received this message because you are subscribed to the Google Groups "Mayan EDMS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mayan-edms+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.