Discussion:
[Mayan EDMS: 2271] Metadata extractor?
RW Shore
2018-02-24 17:47:13 UTC
Does anyone have a transformer that maps the embedded JPEG metadata
(date/time taken, size, ...) into Mayan-EDMS metadata? Is such a
transformation possible?
--
---
You received this message because you are subscribed to the Google Groups "Mayan EDMS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mayan-edms+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
l***@gmail.com
2018-02-28 00:21:33 UTC
Found this searching the web. Seems to be exactly what you are looking for.

https://pypi.python.org/pypi/mayan-exif
RW Shore
2018-02-28 18:08:24 UTC
Thanks. I feel stupid for not finding it myself.
l***@gmail.com
2018-03-01 01:03:53 UTC
Cheers mate, it happened to me too :) There is a veritable wealth of
expansions, plugins and apps written for Mayan, but they are scattered all
over the web. It would be nice to have all that in a single place, like an
app store.
Matthias Löblich
2018-03-02 11:10:58 UTC
You can also use https://gitlab.com/mayan-edms/document_analyzer, which
includes the exif functionality.

br
Matthias
RW Shore
2018-03-02 17:01:15 UTC
Thank you for the suggestion. Unfortunately, I can't get the
document_analyzer app to install. My situation is the following:

* I'm starting with the docker container -- not the NG one (yet), but the
mayanedms/mayanedms:latest one. The image is running as a swarm-based
service.

* Docker file is attached. The only change from the installation
instructions given in the gitlab README is that I copy the document_analyzer
code into the mayan/apps directory. I did this rather than use a symlink only
because I was uncertain whether the symlink was my problem.

* This local.py works fine:
from __future__ import absolute_import, unicode_literals

from .base import *

SECRET_KEY = 'iluml=7+***@ild8e%o*16b2a1=+m!ks9^o%5u54b&=2gh%'

EMAIL_HOST = 'smtp.gmail.com'
EMAIL_PORT = 587
EMAIL_HOST_USER = '***@shore.support'
EMAIL_HOST_PASSWORD = 'RWS.$oct10'
EMAIL_USE_TLS = True
# INSTALLED_APPS += (
# 'document_analyzer',
# )

* By "works fine" I mean I can login as admin, see the document types (only
1), upload a zip file of JPEGs, ...

* As soon as I remove the comments around INSTALLED_APPS, nothing works. If
I cycle the service, it refuses to come up. If I change the local.py file
while the app is running and execute "mayan-edms.py migrate", I get a stack
trace (attached) which ends with the following:
  File "/usr/local/lib/python2.7/dist-packages/django/apps/registry.py", line 237, in get_containing_app_config
    self.check_apps_ready()
  File "/usr/local/lib/python2.7/dist-packages/django/apps/registry.py", line 124, in check_apps_ready
    raise AppRegistryNotReady("Apps aren't loaded yet.")
django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet.

* I've verified that exiftool is installed and appears to run properly,
though the only thing I actually did was "exiftool -ver".

Any suggestions?
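
A note on the exiftool check above: "exiftool -ver" only shows that the binary starts. A slightly stronger check is to run it against a real file and parse the output. What follows is a minimal sketch, not part of document_analyzer or Mayan. It assumes Python 3 (the container in this thread uses Python 2.7, so it would be run separately), and the helper names read_exif and parse_exiftool_json are made up for illustration. The -json option is exiftool's own switch for JSON output.

```python
import json
import shutil
import subprocess


def parse_exiftool_json(text):
    """Return the tag dictionary for the first file in `exiftool -json` output."""
    # exiftool emits a JSON array with one object per input file.
    records = json.loads(text)
    return records[0] if records else {}


def read_exif(path, binary="exiftool"):
    """Run exiftool on one file and return its tags as a dict."""
    # Fail early with a clear message if the binary is not on PATH
    # for the user running this process.
    if shutil.which(binary) is None:
        raise RuntimeError("%s is not on PATH for this user" % binary)
    completed = subprocess.run(
        [binary, "-json", path],
        check=True,  # raise CalledProcessError on a non-zero exit status
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return parse_exiftool_json(completed.stdout.decode("utf-8"))
```

Calling read_exif on one of the uploaded JPEGs, as the same user that runs the Mayan workers, should return a dictionary with keys such as "FileType" if exiftool is usable from that account.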
Matthias Löblich
2018-03-05 16:40:52 UTC
Hi,
please try to use the document_analyzer version from my repository:

https://gitlab.com/startmat/document_analyzer

br
Matthias
RW Shore
2018-03-06 10:44:29 UTC
Thank you for the reply. By using the analyzer from your repository, I was
able to extend the docker container and get the service running. My next
questions involve setup. I assume that I need to create a new analyzer.
When I bring up the "create analyzer" panel, I see "GetExifData" as one of
the analyzers in the drop-down at the bottom of the panel. I suppose that I
just make up names for the label and the slug. However, I can't create a
new analyzer without putting something in the "Parameters" field.

What should the Parameters field contain for an EXIF analyzer? Also, do I
need to pre-define metadata types for the extracted EXIF information, or
are the types created automagically?
Matthias Löblich
2018-03-06 19:07:09 UTC
Hi,
Just put the string None as the parameter for the EXIF analyzer.

The extracted EXIF information is not stored in the metadata. There is a
separate data structure where the values are stored. You can find it there:

[screenshot not preserved]

The result page looks like this:

[screenshot not preserved]

You can set up Mayan indexes based on the analyzer results, similar to the
indexes based on metadata:

Menu path: System/Setup/Indexes -> Create index -> Save -> Tree Template:

Add a Django template expression pointing to the name of the parameter of
the document_analyzer result. For example, this expression creates an index
based on the FileType (see the red square in the screenshot above):

{{ document.analyzer_value_of.FileType }}
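
To illustrate how a dotted expression like the one above gets resolved, here is a rough sketch in plain Python. It is not the document_analyzer implementation: the classes AnalyzerValues and Document and the function resolve are hypothetical stand-ins. The only thing taken from Django is the general rule that a template variable tries a dictionary lookup before an attribute lookup at each dot, and that an unresolvable variable renders as an empty string by default.

```python
class AnalyzerValues(object):
    """Hypothetical stand-in for whatever `document.analyzer_value_of` returns."""

    def __init__(self, results):
        self._results = dict(results)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so map the
        # requested name onto the stored analyzer results.
        results = self.__dict__.get("_results", {})
        try:
            return results[name]
        except KeyError:
            raise AttributeError(name)


class Document(object):
    """Hypothetical document exposing analyzer results as attributes."""

    def __init__(self, analyzer_results):
        self.analyzer_value_of = AnalyzerValues(analyzer_results)


def resolve(context, dotted):
    """Resolve 'a.b.c' against a context: dict key first, then attribute."""
    current = context
    for part in dotted.split("."):
        try:
            current = current[part]  # dictionary-style lookup first
        except (TypeError, KeyError, IndexError):
            try:
                current = getattr(current, part)  # then attribute lookup
            except AttributeError:
                return ""  # mimic the empty string for an invalid variable
    return current
```

With a document whose analyzer result contains FileType, resolving "document.analyzer_value_of.FileType" yields that value, and a name that has no result yields an empty string, which would produce an empty index node.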




Hope that helps.

br
Matthias
RW Shore
2018-03-06 20:33:37 UTC
It helps, but still no joy. I changed the parameter from None to "None" and
submitted a document. I found the analyzer results, but nothing prints.
I've done the following:

a. cd /var/lib/mayan/document_storage && exiftool one-of-the-files -> a
listing of JPEG metadata. I conclude from this that I've got exiftool
installed properly. If it matters, the path to exiftool is
/usr/local/bin/exiftool, and it appears to be on the executable path, at
least when I exec /bin/bash in the container. Also, all the processes in
the docker container are running as root _except_ the nginx workers, which
run as www-data. This is the way the original container was set up; I
haven't messed with the mayan startup command as embedded in the image
mayanedms/mayanedms:latest.

b. MariaDB [mayan]> select * from document_analyzer_analyzer;
+----+---------------+--------------+----------------------------------------------+-----------+
| id | label         | slug         | type                                         | parameter |
+----+---------------+--------------+----------------------------------------------+-----------+
|  1 | Exif Analyzer | exifAnalyzer | document_analyzer.backends.exiftool.EXIFTool | "None"    |
+----+---------------+--------------+----------------------------------------------+-----------+
1 row in set (0.00 sec)

c. MariaDB [mayan]> select * from document_analyzer_analyzer_document_types;
+----+-------------+-----------------+
| id | analyzer_id | documenttype_id |
+----+-------------+-----------------+
|  1 |           1 |               1 |
+----+-------------+-----------------+
1 row in set (0.00 sec)
(document type #1 is Default)

d. MariaDB [mayan]> select * from document_analyzer_result;
Empty set (0.00 sec)
(this is after at least one document got submitted from the GUI)

* I don't see any events for the submitted document that say
document_analyzer started or finished, though I do get the pop-up when I
submit a document that says the document was inserted into the
document_analyzer queue. Is the lack of events expected?

* Are there any error logs that might give me a clue about what's going on?

* Any other thoughts or comments?