Discussion:
[Mayan EDMS: 1744] OCR and document names
Douglas Van Es
2017-05-27 22:00:50 UTC
Hello. I just installed Mayan via Docker; it's up and running, and it looks
like it is going to work great.

I've read through the documentation, but I do have one question before I
continue setting up users, document types, etc. and roll this out to
our users.

Will I be able to use OCR to grab an invoice number from a scanned or
emailed document and have Mayan name the document based on the results of
the OCR?

Would that be set up as a transformation, or some other way?

I am basically looking to minimize the workload on our clerks who
will be scanning the invoices into Mayan.

Thank you all for your time, and the project looks amazing, by the way!

doug van es
--
---
You received this message because you are subscribed to the Google Groups "Mayan EDMS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mayan-edms+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Douglas Van Es
2017-06-06 17:15:15 UTC
I've now set up a couple of users, a group, a role, some metadata,
and watched and staging folders.

Test document uploads are working great.

After reading through the docs and website, I still can't figure out how
to set up OCR to capture an invoice number and rename the document based
on the result.

Can anyone tell me if this is possible with Mayan? Any hints on how to
implement it?

Thanks in advance!
David Kornahrens
2017-06-07 00:13:08 UTC
I'm currently trying to walk myself through the program as well. We really
see the potential here, but help doesn't come quickly. I'm interested in
getting a support plan, but not if the support speed doesn't increase.

Roberto has answered a few questions, but it's more of a waiting game
really. I posted some issues in the GitLab repository, but nothing on that
yet either. Let me know if you figure it out; we are looking into the
same thing.
Douglas Van Es
2017-06-08 20:53:52 UTC
If I crack this or hear from anyone at Mayan, I'll be sure to let you know.

I'm in the same boat: if I can be sure Mayan is going to work for us, a
support plan is in our future as well.
Matthias Löblich
2017-06-09 08:42:16 UTC
Hi,
I wrote an extension for Mayan called document_analyzer:

https://gitlab.com/mayan-edms/document_analyzer

The idea behind it is to analyze a document and store the result in a generic
way (similar to the metadata structure). At the moment there are two
"analyzers" implemented: one that reads the EXIF data and one where you
can configure regular expressions that are used to parse the OCR result of
a document.
If you are able to write a regular expression to parse the invoice number
(be aware that the OCR quality is very important!), you can use the
extension to store the invoice number in a metadata-like structure. You can
also configure a Mayan index on it.

br
Matthias
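Before pasting a pattern into the regex analyzer, it can help to try it against some OCR output in plain Python first. The sketch below assumes a hypothetical invoice layout ("Invoice No: INV-2017-0042", "Invoice #A-77", and so on) and assumes the analyzer accepts Python-style regular expressions, which is plausible for a Django app but not confirmed here. The pattern and the helper names `extract_invoice_number` and `invoice_label` are illustrative only and are not part of document_analyzer.

```python
import re

# Hypothetical invoice layout (an assumption; adjust to your own documents):
# the word "invoice", then "No", "Number" or "#", an optional ":" or "-",
# then an identifier of at least three letters/digits/dashes containing a digit.
INVOICE_RE = re.compile(
    r"invoice\s*(?:no\b\.?|number\b|#)\s*[:\-]?\s*"
    r"((?=[A-Z0-9-]*\d)[A-Z0-9][A-Z0-9-]{2,})",
    re.IGNORECASE,
)


def extract_invoice_number(ocr_text):
    """Return the first invoice number found in OCR text (upper-cased), or None."""
    match = INVOICE_RE.search(ocr_text)
    return match.group(1).upper() if match else None


def invoice_label(ocr_text, fallback):
    """Build a document label from the invoice number, or keep fallback if none is found."""
    number = extract_invoice_number(ocr_text)
    return "Invoice %s" % number if number else fallback
```

The `\b` after "no" keeps words such as "Notes" from being mistaken for the label, and the lookahead rejects identifiers without any digit. No pattern can recover from OCR misreads such as "O" for "0", which is why the OCR quality matters so much here.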
Douglas Van Es
2017-06-15 17:18:23 UTC
Wow, thank you Matthias, this looks like it may work for me.

I have a couple of questions based on the docs at the GitLab site, and am
wondering if you could help me out with them. What would my Mayan root
folder be on an install using Docker? I've looked around /var/lib/docker
and can't quite figure out the correct place to create a link to
document_analyzer...

Would it be something like /var/lib/docker/aufs/mnt/HASHEDNAME/usr/local/bin/?
I don't have an apps folder in there.

I've found local.py in /var/lib/docker/volumes/mayan_settings/_data/, so
I will be able to edit that file to include document_analyzer in the
list of installed apps, but I can't find a /mymayanroot/apps folder.

Will the migration step shown on the Git page be the same for a Docker
install, e.g. mayan-edms.py migrate? I suppose I would execute that from
/var/lib/docker/aufs/mnt/HASHEDNAME/usr/local/bin/, right?

Thank you for the help so far!
Matthias Löblich
2017-06-19 11:25:34 UTC
Hi Douglas,
I have not done anything with document_analyzer on Docker, but looking at
the Mayan Dockerfile:

https://gitlab.com/mayan-edms/mayan-edms-docker/blob/master/Dockerfile

it uses the ubuntu:16.04 image and installs Mayan with "RUN pip install
mayan-edms==2.3". So I guess Mayan will be installed in site-packages.

How to find the site-packages folder:

My laptop is running:
~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.2 LTS
Release: 16.04
Codename: xenial

Start Python:
~$ python
Python 2.7.12 (default, Nov 19 2016, 06:48:10)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import site; site.getsitepackages()
['/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages']

But this might be a good question for Roberto: how to integrate an
extension into the Mayan Docker image.


br
Matthias
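Following the same idea from inside the container, the interpreter can report where a package lives, which would answer the "yourmayanroot/apps" question without digging through /var/lib/docker on the host. A minimal sketch, assuming the image's `python` can `import mayan` and that Mayan keeps its apps in an `apps` subfolder of the package, as the extension's installation instructions (`cd /yourmayanroot/apps`) imply. The helper names are hypothetical, and the code is written to run on both Python 2.7 (the image's interpreter) and Python 3.

```python
import importlib
import os


def package_dir(package_name):
    """Return the directory an importable package was loaded from."""
    module = importlib.import_module(package_name)
    # For a package, __file__ points at its __init__.py (or .pyc),
    # so the parent directory is the package folder itself.
    return os.path.dirname(os.path.abspath(module.__file__))


def apps_dir(package_name="mayan"):
    """Return <package>/apps if that folder exists, otherwise None.

    The "apps" subfolder name is an assumption taken from the
    document_analyzer installation instructions.
    """
    candidate = os.path.join(package_dir(package_name), "apps")
    return candidate if os.path.isdir(candidate) else None
```

Run inside the container (for example through `docker exec`), `apps_dir()` would be expected to return a path under /usr/local/lib/python2.7/dist-packages if Mayan was pip-installed as described above; that is an expectation, not a verified result.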
Douglas Van Es
2017-06-21 19:42:14 UTC
Roberto, any tips on installing an extension into a Mayan Docker container?

Thanks again, Matthias! I really appreciate the help.

doug
Roberto Rosario
2017-06-21 19:52:24 UTC
Check the section "Customizing the image"
here: https://hub.docker.com/r/mayanedms/mayanedms/

It is not the easiest thing to do, but it is the way Docker images are
officially customized.

However, after the next version, I plan to work on finding ways to
customize the image without having to rebuild a new image. One idea I want
to try is providing an environment variable called MAYAN_PIP_PACKAGES or
similar that contains a comma-delimited list of packages to download and
install from the web. The disadvantage of this approach is that the
installed packages are not persistent and need to be downloaded and
installed every time the container starts.

I'm also planning on trying something like MAYAN_APT_PACKAGES to allow
installing Ubuntu packages, such as extra OCR language packs, at runtime.

Docker provides a command called "commit", which could be the answer to the
non-persistence issue.

These are all untested ideas at the moment; for now, the only official
way to customize an image is the one described at the link above.
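The MAYAN_PIP_PACKAGES idea is untested and was not an existing feature at the time of this message, so the following is only a hypothetical sketch of how a container entrypoint might turn such a comma-delimited variable into a pip command. The function names are illustrative; the same parser would serve a MAYAN_APT_PACKAGES variable with apt-get in place of pip.

```python
import os


def parse_package_list(value):
    """Split a comma-delimited package list, dropping whitespace and empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def pip_install_command(environ=None, variable="MAYAN_PIP_PACKAGES"):
    """Return the pip command an entrypoint could run at startup, or None if nothing is requested."""
    if environ is None:
        environ = os.environ
    packages = parse_package_list(environ.get(variable, ""))
    if not packages:
        return None
    return ["pip", "install"] + packages
```

An entrypoint would then run the returned list with `subprocess.check_call`. Because this happens on every start, it also illustrates the persistence cost described above: the packages are downloaded and installed again each time the container starts.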
Douglas Van Es
2017-06-23 13:37:09 UTC
Thank you for the response, Roberto. Great work on Mayan; it looks like an
amazing tool. I think it will fill my organization's EDM requirements
rather well, if I can pull an invoice name out of the scanned documents
using OCR and then name the document using the invoice name, or at a
minimum populate a metadata field.

It seems like Matthias has created an extension that will fit the bill
for my OCR needs, but I am having a little difficulty finding my way
around the Docker container's environment.

Do I need to customize the image in this case, or can I just install
Matthias' document_analyzer extension by placing it in Mayan's root
folder?

I just need to know what paths to use in the installation instructions:

    Installation
    Clone the sources from GitLab to your local env.
    cd /yourmayanroot/apps
    ln -s /yourgitroot/document_analyzer/document_analyzer/ .
    In your settings/local.py file, add document_analyzer to your INSTALLED_APPS:
    INSTALLED_APPS += (
        'document_analyzer',
    )
    mayan-edms.py migrate

I'm pretty sure local.py sits in /var/lib/docker/volumes/mayan_settings/_data/
and that I can make the mentioned changes there.

It's figuring out what to substitute for "yourmayanroot" that has me
stumped. I don't have an apps folder in /var/lib/docker/volumes/mayan_settings/_data/.

The problem is that there are duplicates of these files and folders sprinkled
around the image: in hashed folders at /var/lib/docker/aufs/mnt and so on.

Thanks again for your time!

doug van es
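One detail in the `INSTALLED_APPS += (...)` step is easy to get wrong when editing local.py by hand: the trailing comma. The standalone sketch below uses a short stand-in tuple in place of Mayan's real INSTALLED_APPS (hypothetical contents), assuming, as the `+= (` form in the instructions suggests, that the setting is a tuple.

```python
# Stand-in for the INSTALLED_APPS tuple defined by Mayan's base settings
# (hypothetical contents; the real tuple is much longer).
INSTALLED_APPS = ("django.contrib.admin", "documents", "ocr")

# The trailing comma makes ("document_analyzer",) a one-element tuple,
# so tuple += tuple works.
INSTALLED_APPS += ("document_analyzer",)


def add_app_without_comma(apps):
    """Show the common mistake: ("document_analyzer") is only a str in parentheses."""
    try:
        return apps + ("document_analyzer")
    except TypeError:
        # tuple + str is not allowed; in a real local.py this error would
        # stop the settings module from loading at all.
        return None
```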
Morgan Boyd
2018-01-06 18:30:53 UTC
Hey Doug, I managed to get this configured with my Docker container.

Here's how I did it.

1: Created a Docker volume for "dms_apps".
2: Mounted the volume to the Mayan container at a local path of /etc/mayan (my
version is configured with the MySQL option, so this dir was empty).
3: Created the symlink to /usr/local/lib/python2.7/dist-packages/mayan/apps
4: Ran mayan-edms.py migrate.
5: Restarted the container (it may take a bit longer to initialize).

After the restart, the Analyzers option is available within Setup.
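Step 3 of that recipe can be scripted so it is safe to re-run. A minimal sketch, assuming a hypothetical layout where the extension is cloned into the mounted volume at /etc/mayan/document_analyzer (so the package folder is /etc/mayan/document_analyzer/document_analyzer, following the `ln -s` line in the installation instructions) and that Mayan's apps folder is the dist-packages path given in step 3. `link_app` is an illustrative name, not part of Mayan.

```python
import os


def link_app(source_dir, apps_dir):
    """Symlink source_dir into apps_dir under its own folder name and return the link path.

    Re-running is safe: a link that already points at source_dir is left alone,
    while anything else at that path is reported instead of being overwritten.
    """
    source_dir = os.path.abspath(source_dir)  # abspath also drops a trailing slash
    link_path = os.path.join(apps_dir, os.path.basename(source_dir))
    if os.path.islink(link_path):
        if os.path.realpath(link_path) == os.path.realpath(source_dir):
            return link_path
        raise RuntimeError("%s already points somewhere else" % link_path)
    if os.path.exists(link_path):
        raise RuntimeError("%s exists and is not a symlink" % link_path)
    os.symlink(source_dir, link_path)
    return link_path
```

Under those assumptions the call would be `link_app("/etc/mayan/document_analyzer/document_analyzer", "/usr/local/lib/python2.7/dist-packages/mayan/apps")`, followed by `mayan-edms.py migrate`. Note that the volume keeps the extension's source, but the symlink itself lives in the container's writable layer: it survives a restart, while a container recreated from the image would need the link again, which is where an idempotent script helps.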

Noong Biyernes, Hunyo 23, 2017 ng 8:37:25 AM UTC-5, si Douglas Van Es ay
Post by Douglas Van Es
thank you for the response roberto. great work on mayan, it looks like an
amazing tool. i think it will fill my organization's requirements for edm
rather well, if i can pull an invoice name out of the scanned documents
using OCR and then name the document using the invoice name, or at a
minimum populate a metadata field.
it seems like matthias has created an extension that will fit the bill
for my OCR needs, but i am having a little difficulty finding my way
around the docker container's environment.
do i need to customize the image in this case? or can i just install
matthias' document_analyzer extension by placing it in mayan's root
folder?
i just need to know what paths to use in the instructions on the
Installation
clone the sources from gitlab to you local env.
cd /yourmayanroot/apps
ln -s /yourgitroot/document_analyzer/document_analyzer/ .
In your settings/local.py file add document_analyzer to your
INSTALLED_APPS += (
'document_analyzer',
)
mayan-edms.py migrate
i'm pretty sure local.py sits in /var/lib/docker/volumes/mayan_settings/
_data/ and that i can make the mentioned changes there.
it's figuring out what to substitute for "yourmayanroot" that has me
stumped. i don't have an apps folder in /var/lib/docker/volumes/
mayan_settings/_data/
problem is there are duplicates of these files and folders sprinkled
around the image: in hashed folders at /var/lib/docker/aufs/mnt and so on.
thanks again for your time!
doug van es
Check the section "Customizing the image"
here: https://hub.docker.com/r/mayanedms/mayanedms/
It is not the easiest thing to do but it is the way Docker images are
officially customized.
However, after the next version, I plan to work on finding ways to
customize the image without having to rebuild a new image.
One idea I want to try is providing an environment variable called
MAYAN_PIP_PACKAGES or similar that contains a comma delimited list of
packages to download and install from the web. The disadvantage of this
approach is that the installed packages are not persistent and need to
be downloaded and installed every time the image starts.
Also planning on trying something like MAYAN_APT_PACKAGES too to allow
installing Ubuntu packages like extra OCR language packs at runtime.
Docker provides a command called "commit" which could be the answer to
the non persistent issue.
These are all untested ideas at the moment and for now the only official
way to customize an image is the one provided in the link above.
Post by Douglas Van Es
yes any tips on installing an extension into a mayan docker container
roberto?
thanks again matthias! i really appreciate the help
doug
Post by Matthias Löblich
Hi Douglas,
I have not done any stuff on Docker with the document_analyzer, but
https://gitlab.com/mayan-edms/mayan-edms-docker/blob/master/
Dockerfile
Post by Douglas Van Es
Post by Matthias Löblich
It is using ubuntu:16.04 image and installing mayan by "RUN pip
install mayan-edms==2.3". So I guess mayan will be installed in
sitepackages.
    ~$ lsb_release -a
    No LSB modules are available.
    16.04
    Codename: xenial
    ~$ python
    Python 2.7.12 (default, Nov 19 2016, 06:48:10)
    [GCC 5.4.0 20160609] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import site; site.getsitepackages()
    ['/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages']
But this might be a good question for Roberto: how to integrate an
extension into the Mayan Docker image.
br Matthias
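Rather than guessing among the site-packages directories listed above, one can ask the interpreter inside the container where a package is actually imported from. A minimal Python 3 sketch using importlib.util.find_spec; on the Python 2.7 image in this thread, `python -c "import mayan, os; print(os.path.dirname(mayan.__file__))"` run inside the container gives the same answer. That an apps folder sits under the package directory is taken from Morgan's later message, not verified here:

```python
import importlib.util
import os


def package_dir(name):
    """Return the directory of an importable package, or None if it is not installed."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    # For a regular package, origin points at its __init__.py; the parent is the package dir.
    return os.path.dirname(spec.origin)


if __name__ == "__main__":
    root = package_dir("mayan")
    if root is None:
        print("mayan is not importable by this interpreter")
    else:
        print("apps folder:", os.path.join(root, "apps"))
```

Run on the host, this will usually report that mayan is not importable, which is itself a hint that the paths under /var/lib/docker are not the right place to work in.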
2017-06-15 19:18 GMT+02:00 Douglas Van Es
Post by Douglas Van Es
wow thank you matthias, this looks like it may work for me.
i have a couple of questions based on the docs at the gitlab site,
and am wondering if you could help me out with them. what would my
mayan root folder be on an install using docker? i've looked around
/var/lib/docker and can't quite figure out the correct place to
create a link to document_analyzer...
/var/lib/docker/aufs/mnt/HASHEDNAME/usr/local/bin/ ? i don't have
an apps folder in there.
i've found local.py in /var/lib/docker/volumes/mayan_settings/_data/
and so will be able to edit that file to include document_analyzer
in the list of installed apps, but can't find a /mymayanroot/apps
folder.
will the migrations step shown on the git page be the same for a
docker install? eg: mayan-edms.py migrate ? i suppose i would
execute that from /var/lib/docker/aufs/mnt/HASHEDNAME/usr/local/bin/ right?
thank you for the help so far!
Post by Matthias Löblich
Hi,
I did an extension for Mayan called document_analyzer:
https://gitlab.com/mayan-edms/document_analyzer
The idea behind it is to analyze a document and store the result in
a generic way (similar to the metadata structure). At the moment
two "analyzers" are implemented: one which reads the EXIF data and
one where you can configure regular expressions which are used to
parse the OCR result of a document.
If you are able to write a regular expression to parse the invoice
number (be aware that the OCR quality is very important!), you can
use the extension to store the invoice number in a metadata-like
structure. You can also configure a Mayan index on it.
br Matthias
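The regex analyzer is configured inside Mayan, and its configuration format is not shown in this thread. As a standalone illustration only, here is a Python sketch of the kind of expression one might test against OCR output before entering it there; the invoice labels it accepts ("Invoice No.", "Invoice Number", "Invoice #") and the ID format are assumptions about a typical invoice layout:

```python
import re

# Hypothetical pattern: a label such as "Invoice No.", "Invoice No:", "Invoice Number"
# or "Invoice #", an optional ':' or '#', then an alphanumeric ID that may contain dashes.
INVOICE_RE = re.compile(
    r"\binvoice\s*(?:number|no\.|no\b|#)\s*[:#]?\s*([A-Z0-9][A-Z0-9-]*)",
    re.IGNORECASE,
)


def extract_invoice_number(ocr_text):
    """Return the first invoice number found in the OCR text, or None."""
    match = INVOICE_RE.search(ocr_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    sample = "ACME Corp\nINVOICE NO: INV-2017-0042\nDate: 2017-06-06"
    print(extract_invoice_number(sample))  # INV-2017-0042
```

Because OCR can misread characters (O for 0, l for 1), it is worth running a pattern like this against the OCR text Mayan produces for a few real invoices before relying on it.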
if i crack this or hear from anyone at mayan i'll be sure to let
you know.
i'm in the same boat. if i can be sure mayan is going to work for
us, a support plan is in our future as well.
Post by David Kornahrens
I'm currently trying to walk myself through the program as
well. We really see the potential here, but help doesn't come
quick. I'm interested in getting a support plan, but not if the
support speed doesn't increase.
Roberto has answered a few questions, but it's more of a
waiting game really. I posted some issues in the GitLab
repository, but nothing on that yet either. Let me know if
you figure it out, we are looking into the same thing.
--
---
You received this message because you are subscribed to a topic in
the Google Groups "Mayan EDMS" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/mayan-edms/6P1AqlvNjWQ/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
mayan-edms+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/
For more options, visit https://groups.google.com/d/optout.
Douglas Van Es
2018-01-16 17:13:33 UTC
Permalink
thanks for the little roadmap morgan!

it'll take a little more reading on my part, but i'll give it a go when i
get a little more free time!
Post by Morgan Boyd
Hey Doug - I managed to get this configured with my Docker container.
Here's how I did it.
1: created a docker volume for "dms_apps"
2: mounted the volume to the Mayan container at a local path of /etc/mayan
(my version is configured with the MySQL option, so this dir was empty)
3: created the symlink to /usr/local/lib/python2.7/dist-packages/mayan/apps
4: ran mayan-edms.py migrate
5: restarted the container (it may take a bit longer to initialize)
After the restart, the Analyzers option is available within Setup.
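Step 3 above leaves the exact link source and target implicit. A minimal Python sketch of that step, assuming the document_analyzer checkout lives on the mounted volume at /etc/mayan/document_analyzer and that the apps folder is the dist-packages path from step 3; both paths are assumptions to verify inside your own container:

```python
import os

# Assumed paths, for illustration only.
APP_SOURCE = "/etc/mayan/document_analyzer/document_analyzer"
MAYAN_APPS = "/usr/local/lib/python2.7/dist-packages/mayan/apps"


def link_app(source_dir, apps_dir):
    """Symlink source_dir into apps_dir under its own name and return the link path."""
    name = os.path.basename(os.path.normpath(source_dir))
    link_path = os.path.join(apps_dir, name)
    if os.path.islink(link_path) or os.path.exists(link_path):
        # Leave an existing link or folder alone so the step can be re-run safely.
        return link_path
    os.symlink(source_dir, link_path)
    return link_path


if __name__ == "__main__":
    if os.path.isdir(MAYAN_APPS):
        print(link_app(APP_SOURCE, MAYAN_APPS))
    else:
        print("apps folder not found:", MAYAN_APPS)
```

This has to run inside the container (for example through docker exec), followed by mayan-edms.py migrate there, as in step 4. Note that a link created in the container's own filesystem, rather than on a volume, is lost if the container is recreated from the image.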
On Friday, June 23, 2017 at 8:37:25 AM UTC-5, Douglas Van Es wrote:
Post by Douglas Van Es
thank you for the response roberto. great work on mayan, it looks like
an amazing tool. i think it will fill my organization's requirements
for edm rather well, if i can pull an invoice name out of the scanned
documents using OCR and then name the document using the invoice name,
or at a minimum populate a metadata field.
it seems like matthias has created an extension that will fit the bill
for my OCR needs, but i am having a little difficulty finding my way
around the docker container's environment.
do i need to customize the image in this case? or can i just install
matthias' document_analyzer extension by placing it in mayan's root
folder?
i just need to know what paths to use in the instructions on the
Installation
clone the sources from gitlab to your local env.
cd /yourmayanroot/apps
ln -s /yourgitroot/document_analyzer/document_analyzer/ .
In your settings/local.py file add document_analyzer to your INSTALLED_APPS:
INSTALLED_APPS += (
    'document_analyzer',
)
mayan-edms.py migrate
i'm pretty sure local.py sits in
/var/lib/docker/volumes/mayan_settings/_data/ and that i can make the
mentioned changes there.
it's figuring out what to substitute for "yourmayanroot" that has me
stumped. i don't have an apps folder in /var/lib/docker/volumes/mayan_settings/_data/
problem is there are duplicates of these files and folders sprinkled
around the image: in hashed folders at /var/lib/docker/aufs/mnt and so on.
thanks again for your time!
doug van es
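The local.py change from the installation notes can be scripted so that it is safe to run more than once. A minimal Python sketch; the settings path is the volume location mentioned in this message and is an assumption, and the already-present check is a simple substring test that would also match the name inside a comment:

```python
import os

# Assumed location from the message above; adjust for your own setup.
LOCAL_SETTINGS = "/var/lib/docker/volumes/mayan_settings/_data/local.py"

SNIPPET = "\nINSTALLED_APPS += (\n    '{app}',\n)\n"


def ensure_installed_app(settings_path, app):
    """Append an INSTALLED_APPS entry for app unless its quoted name already appears."""
    with open(settings_path) as handle:
        content = handle.read()
    if "'{0}'".format(app) in content or '"{0}"'.format(app) in content:
        return False  # already listed, nothing written
    with open(settings_path, "a") as handle:
        handle.write(SNIPPET.format(app=app))
    return True


if __name__ == "__main__":
    if os.path.exists(LOCAL_SETTINGS):
        print(ensure_installed_app(LOCAL_SETTINGS, "document_analyzer"))
```

The appended block mirrors the tuple form shown in the installation notes, so it assumes local.py already has INSTALLED_APPS in scope from the base settings.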