Discussion:
[Mayan EDMS: 2470] Watch Folder and Scanner issue
Eddi
2018-05-24 14:44:13 UTC
Hi,

I am using Mayan 2.7.3. I have a watch folder set up that a scanner saves
files to.

When the scanner starts scanning, it will create a file in the folder. When
watching the folder via CLI while the scanner is scanning, I can see the
file grow in size.

The issue arises when the Mayan watch function checks the folder while the
scanner is still scanning. The file is present but incomplete, yet Mayan
EDMS grabs it anyway.
The end result is a corrupt file (I mostly use PDFs). In Mayan it shows up
as a red question mark.

How can I work around this?

Cheers,

Eddi
--
---
You received this message because you are subscribed to the Google Groups "Mayan EDMS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mayan-edms+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Jesaja Everling
2018-05-24 17:19:43 UTC
Hi Eddi,

I'm not familiar with the code that handles the watchfolder monitoring, and
it has been a while since I worked with watchfolders myself. I think you
probably have two options: make sure the watcher acts on the appropriate
event, e.g. IN_CLOSE_WRITE in
http://seb.dbzteam.org/pyinotify/pyinotify-module.html#IN_CLOSE_WRITE, or
use a spin-loop to check whether the file is still growing and only
process it further when it has stopped growing for some time (which sounds
less reliable than relying on pyinotify, watchdog, or a similar solution).
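
To make the second option concrete, here is a minimal sketch of such a
polling check. It is not Mayan code. The function name wait_until_stable and
its parameters (quiet_seconds, poll_seconds, timeout) are made up for
illustration, and it assumes that an unchanged file size for a while means
the scanner is done, which is only a heuristic.

```python
import os
import time


def wait_until_stable(path, quiet_seconds=5.0, poll_seconds=1.0, timeout=300.0):
    """Return True once the size of `path` has not changed for `quiet_seconds`.

    Returns False if the file is missing or keeps changing until `timeout`.
    All names and defaults here are illustrative assumptions.
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    stable_since = time.monotonic()
    while time.monotonic() < deadline:
        try:
            size = os.path.getsize(path)
        except OSError:
            size = -1  # file vanished or is not readable yet
        now = time.monotonic()
        if size != last_size:
            # Size changed since the last poll: restart the quiet period.
            last_size = size
            stable_since = now
        elif size >= 0 and now - stable_since >= quiet_seconds:
            return True
        time.sleep(poll_seconds)
    return False
```

A wrapper script could call this on each file in the scanner's output folder
and only move the file into the real watch folder once it returns True. A
scanner that pauses for longer than quiet_seconds would still fool it.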

Don't know how much that helps, hope it does.

Best Regards,

Jesaja Everling
Jesaja Everling
2018-05-24 17:30:19 UTC
I had a very cursory glance at the source code, and it seems like the
WatchfolderSource just checks if a file exists (using a periodic task in
Celery, if I'm not mistaken) and processes it if it does. So I think that
could be improved.

@Roberto, do you think it would be worth it to have a standalone
watchfolder daemon that can be configured to push files to Mayan, e.g. using
the API? That would mean it could also be run on a local machine and push to
a remote Mayan instance. Or do you prefer to keep it a Celery task? If it's
a Celery task, I don't think you can use a file watcher like pyinotify
(which requires a continuously running process, which is not really what
Celery is intended for), but you could still have the task wait for the
files it encounters to stop growing before processing them further.
Roberto Rosario
2018-05-30 17:43:03 UTC
We've tried pyinotify in the past and, as you've mentioned, it has its own
set of challenges. The proposed idea for the new watchfolder feature is to
make it a two-pass process. A process grabs a lock to scan the watchfolder
in an exclusive manner, scans the folder, writes the file path (or hash, we
are still debating this) and the current size of each file to the database,
and releases the lock. Then, on the next period, the same process or another
process grabs the lock and scans the watchfolder again. If a file in the
folder appears in the database and has the same size, it is assumed to be
complete and is uploaded and deleted. If the size doesn't match, it is
assumed the file is still being created and it is left alone. If the file is
not in the database, it is assumed to be a new file, so it is added to the
database with these properties and left alone until the next pass. This
proposal fixes the problem of not knowing when the file has finished being
created, while being platform agnostic and not depending on third-party
libraries. We wanted to add this for version 3.0, but the changes were too
many and the specification still needs finalizing. It will be added in a
future minor version.
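
For readers following along, the two-pass idea can be sketched roughly as
below. This is only an illustration of the description above, not the planned
Mayan implementation. A plain dict stands in for the database table, a
threading.Lock stands in for the shared lock, and scan_pass is a made-up
name. Uploading and deleting the returned files is left to the caller.

```python
import os
import threading

_scan_lock = threading.Lock()  # stand-in for the real shared lock


def scan_pass(folder, seen_sizes):
    """Run one pass over `folder` and return the files assumed complete.

    `seen_sizes` maps file path -> size recorded on the previous pass (a
    stand-in for the database) and is updated in place.
    """
    ready = []
    with _scan_lock:
        # Snapshot the current size of every regular file in the folder.
        current = {}
        for entry in os.scandir(folder):
            if entry.is_file():
                current[entry.path] = entry.stat().st_size
        for path, size in current.items():
            if seen_sizes.get(path) == size:
                # Same size as on the previous pass: assume complete.
                ready.append(path)
        # New or still-growing files are recorded with their current size;
        # ready files and files that disappeared are forgotten.
        seen_sizes.clear()
        seen_sizes.update(
            {p: s for p, s in current.items() if p not in ready}
        )
    return sorted(ready)
```

Calling scan_pass once per period with the same dict gives the behaviour
described: a file is only returned on the first pass where its size matches
what the previous pass recorded.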