Discussion:
[Mayan EDMS: 1323] Bug when document scanned directly into watch folder
Bruno CAPELETO
2016-04-17 13:19:16 UTC
Dear all,

I found a reproducible and blocking bug regarding the watch folder
functionality.

First, I set up a watch folder that worked fine: when I moved a document
into it, it was handled as expected and entered Mayan EDMS with the correct
document type.

Then, when I scanned a document directly into the watch folder, one time it
entered Mayan as expected, but the next time it entered corrupted.

I found the reason for that: the scanner (a professional Konica Minolta
C220) starts writing into the watch folder as soon as the scanning process
starts, and of course it takes some time to finish (approx. 5 s for 2 pages,
each double-sided).
When Mayan checks the folder after the scan has finished, there is no problem
and the document is handled as it should be. But when Mayan checks the folder
during the scan, it does not wait and takes only the beginning of the
document: this document ends up in Mayan as a corrupted document.

This bug makes it simply impossible to scan documents directly into Mayan
EDMS, because one can never be sure the document arrived completely: Mayan
must wait until the document is complete (for example, until its size no
longer changes).
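Bruno's proposed rule (treat the file as ready once its size stops changing) can be sketched in Python as follows. This is not Mayan EDMS code: the function name `wait_until_stable` and the default `interval`, `checks` and `timeout` values are illustrative assumptions. A size that holds steady is only a heuristic, since a scanner that pauses mid-job for longer than `interval * checks` can still fool it.

```python
import os
import time


def wait_until_stable(path, interval=2.0, checks=3, timeout=60.0):
    """Poll `path` until its size is unchanged for `checks` consecutive polls.

    Returns True once the size has held steady, or False if `timeout`
    seconds pass first. Heuristic only: a long pause by the writer
    can still be mistaken for a finished file.
    """
    deadline = time.monotonic() + timeout
    last_size = None
    unchanged = 0
    while time.monotonic() < deadline:
        size = os.path.getsize(path)
        if size == last_size:
            unchanged += 1
            if unchanged >= checks:
                return True
        else:
            # Size moved (or first poll): restart the count from here.
            last_size = size
            unchanged = 0
        time.sleep(interval)
    return False
```

With the assumed defaults, a file must keep the same size for at least about 6 seconds, a little longer than the roughly 5 s scan Bruno describes.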

Cheers,
Bruno
--
---
You received this message because you are subscribed to the Google Groups "Mayan EDMS" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mayan-edms+***@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Roberto Rosario
2016-04-26 22:00:13 UTC
I would be hesitant to call this a bug, since this is a situation that can
happen with any two programs sharing a file. Try it with a graphics program
like GIMP while scanning and the same thing would happen.
The receiving software has no way of knowing that the file is not yet ready
for processing.
The only approach I can think of is to examine the file's open write handles
and make sure none remain open, but that is far from a perfect solution.
Scanning directly into a watched folder is an ideal use case, so I'll take a
shot at fixing this situation.
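Roberto's idea of looking for open write handles could look roughly like this on Linux. On Linux, `/proc/<pid>/fd` lists each process's open files and `/proc/<pid>/fdinfo/<fd>` records the access mode. The function name `pids_writing` is hypothetical. The function only sees processes on the same machine that it has permission to inspect, so a file written by a remote host (for example over NFS) would look idle.

```python
import os


def pids_writing(path):
    """Return the PIDs of local processes holding `path` open for writing.

    Linux-only sketch: walks /proc/<pid>/fd and reads the access mode
    from /proc/<pid>/fdinfo/<fd>. Processes we may not inspect are skipped.
    """
    target = os.path.realpath(path)
    writers = set()
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = os.path.join("/proc", pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # process exited, or permission denied
            continue
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) != target:
                    continue
                with open(os.path.join("/proc", pid, "fdinfo", fd)) as info:
                    for line in info:
                        if line.startswith("flags:"):
                            # Flags are octal; the low two bits are the access mode.
                            access = int(line.split()[1], 8) & 0o3
                            if access in (os.O_WRONLY, os.O_RDWR):
                                writers.add(int(pid))
                            break
            except OSError:  # fd closed or process exited meanwhile
                continue
    return writers
```

A watcher could skip any file for which `pids_writing` returns a non-empty set and retry on its next pass.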
Mike Rabas
2017-11-08 03:19:14 UTC
I'm running on FreeBSD and have hacked together a solution using a shell
script run by a cron job every minute.

The scanner drops a file into the watch folder. The file is created with
permissions that do not allow Mayan to read it. The script checks for files
in the watch folder and, if it finds any, runs `fuser`, a program that lists
all processes that have a given file open. If another process has the file
open, the scanner is still writing its contents. If fuser says the file is
good to go, the script creates a lock file (mkdir should be good enough).
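The fuser check and mkdir lock described above could be expressed in Python roughly as below. This is a sketch, not Mike's script: the names `in_use`, `acquire_lock`, `release_lock` and `process_if_ready` are hypothetical. The sketch assumes an `fuser` binary on `PATH`; `subprocess.run` raises `FileNotFoundError` otherwise. It also relies on POSIX `fuser` writing PIDs to standard output and everything else to standard error. `os.mkdir` works as a lock because creating a directory either succeeds or fails atomically.

```python
import os
import subprocess


def in_use(path):
    """Ask fuser whether any process still has `path` open.

    POSIX fuser prints PIDs on stdout and everything else on stderr,
    so any non-blank stdout means the file is busy.
    """
    result = subprocess.run(["fuser", path], capture_output=True, text=True)
    return bool(result.stdout.strip())


def acquire_lock(lock_dir):
    """Create `lock_dir` atomically; False means another run holds the lock."""
    try:
        os.mkdir(lock_dir)
        return True
    except FileExistsError:
        return False


def release_lock(lock_dir):
    os.rmdir(lock_dir)


def process_if_ready(path, lock_dir, process):
    """Run `process(path)` only if nobody has the file open and we get the lock."""
    if in_use(path) or not acquire_lock(lock_dir):
        return False
    try:
        process(path)
    finally:
        release_lock(lock_dir)  # always drop the lock, even if processing fails
    return True
```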

I do some processing with img2pdf and ocrmypdf to add an OCR layer to the
file. Finally, the script changes the resulting file's permissions so that
Mayan can read it, and removes the lock file.
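The OCR step and the final permission change might look like this. The helper names `ocr_commands` and `make_searchable` and the `0o644` mode are illustrative. The sketch also assumes, as in Mike's setup, that the output is first created unreadable to Mayan, for example through a restrictive umask. Only the basic `img2pdf ... -o out.pdf` and `ocrmypdf in.pdf out.pdf` invocations are used.

```python
import os
import subprocess


def ocr_commands(images, pdf_path, ocr_path):
    """Command lines for: scanned images -> PDF -> PDF with an OCR text layer."""
    return [
        ["img2pdf", *images, "-o", pdf_path],
        ["ocrmypdf", pdf_path, ocr_path],
    ]


def make_searchable(images, pdf_path, ocr_path, mode=0o644):
    """Run both tools, then open up permissions so Mayan can read the result."""
    for cmd in ocr_commands(images, pdf_path, ocr_path):
        subprocess.run(cmd, check=True)  # stop on the first failing tool
    # Assuming the file was created unreadable to Mayan, it becomes readable only now.
    os.chmod(ocr_path, mode)
    return ocr_path
```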

I would expect Linux to have a similar tool for testing whether another
process has a handle open on a file. But this will unfortunately be OS
specific; I don't think there is any way around that.
Matthias Löblich
2017-11-08 07:59:09 UTC
I have a similar solution, but I scan into a temporary folder (on a
Raspberry Pi) and then run a script that does preprocessing (cropping,
cleaning, ...). After preprocessing, the script moves the file into the watch
folder. This solution works without a problem ... It looks like moving the
file is fast enough to avoid overlapping with the Mayan job.
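A likely reason the move is "fast enough" is that, within one filesystem, a move is a rename. POSIX makes a rename atomic, so the file appears in the watch folder already complete. If the temporary folder and the watch folder are on different filesystems (for example, the Pi's SD card and a network share), `mv` falls back to copying and the race can come back. The sketch below uses a hypothetical `publish` function and a hypothetical `staging_dir`. It does any slow copy outside the watch folder and finishes with a single `os.replace`.

```python
import os
import shutil


def publish(src, watch_dir, staging_dir):
    """Move a finished file into `watch_dir` so it appears there all at once.

    os.replace is an atomic rename only within one filesystem, so the
    hypothetical `staging_dir` must share a filesystem with `watch_dir`;
    any slow cross-filesystem copy happens there instead.
    """
    watch_dev = os.stat(watch_dir).st_dev
    if os.stat(staging_dir).st_dev != watch_dev:
        raise ValueError("staging_dir must be on the same filesystem as watch_dir")
    name = os.path.basename(src)
    final = os.path.join(watch_dir, name)
    if os.stat(src).st_dev != watch_dev:
        staged = os.path.join(staging_dir, name)
        shutil.copy2(src, staged)  # the slow part, invisible to Mayan
        os.remove(src)
        src = staged
    os.replace(src, final)  # single atomic rename into the watch folder
    return final
```

Note that `os.replace` silently overwrites an existing file with the same name in the watch folder.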

br
Matthias
C V
2017-11-11 10:23:58 UTC
(In reply to Matthias Löblich's post of Wednesday, 8 November 2017 08:59:09 UTC+1.)
C V
2017-11-11 10:31:02 UTC
Could you send your script so we can use it in our association? I'm testing
Mayan EDMS, but I'm not very experienced.
Thanks in advance.

Fabio Gavazza

(In reply to Matthias Löblich's post of Wednesday, 8 November 2017 08:59:09 UTC+1.)