8. Folderwatcher
8.1. Introduction
The folderwatcher is a standalone daemon that scans configured directories for structured files (i.e. CSV and XML) and unstructured files (e.g. PNG). When it detects such files and their sizes have not changed between two consecutive checks (so that files still being uploaded are not processed), it notifies DSB.
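The size-based stability check described above can be sketched as follows. This is a minimal illustration, not the daemon's actual code: the function names are hypothetical, and it assumes that comparing file sizes from two directory snapshots is all that is needed to call a file stable.

```python
import os
import time


def snapshot_sizes(directory):
    """Map each regular file in `directory` to its current size in bytes."""
    sizes = {}
    for entry in os.scandir(directory):
        if entry.is_file():
            sizes[entry.path] = entry.stat().st_size
    return sizes


def stable_files(previous, current):
    """Return paths whose size is identical in two consecutive snapshots."""
    # A file missing from `previous` is new and therefore not yet stable.
    return sorted(
        path for path, size in current.items()
        if previous.get(path) == size
    )


def watch_once(directory, time_between_checks_sec=5):
    """Take two snapshots a fixed interval apart and return the stable files."""
    before = snapshot_sizes(directory)
    time.sleep(time_between_checks_sec)
    after = snapshot_sizes(directory)
    return stable_files(before, after)
```

A file that first appears in the second snapshot is deliberately not reported, so it needs at least one further check before it counts as stable.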
After processing the files, DSB replies to the folderwatcher. How success or failure is reported to the user depends on the type of file:
Structured: The folderwatcher places a report file in the same directory as the processed files. The report filename is structured like this:
<filename>_<status>_<date>_<time>.txt
If processing succeeded, the report file may be named
dataset1_success_20181002_080438.txt
If processing failed, it may be named
dataset1_failure_20181002_080438.txt
Unstructured: A file called
datafiles_transfer.log
is placed in the project directory. Each processed file results in a line in this logfile, with the structure
<date> <time> - <status> - <project> - <filename> - <info>
If processing succeeded, the log line may be
20181002 08:13:51 - success - project1 - datafiles/image.png - {"status": "ok"}
If processing failed, it may be
20181002 08:13:51 - failure - project1 - datafiles/img.png - {"status": "...", "error": "..."}
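The two reporting formats can be reproduced with a few lines of Python. This is a sketch under assumptions: the function names are hypothetical, the date and time layouts are inferred from the examples above, and the <info> field is assumed to be a JSON-encoded object because the examples look like JSON.

```python
import json
from datetime import datetime


def report_filename(filename, status, when):
    """Build `<filename>_<status>_<date>_<time>.txt` for a structured file."""
    return f"{filename}_{status}_{when:%Y%m%d}_{when:%H%M%S}.txt"


def log_line(status, project, filename, info, when):
    """Build one `datafiles_transfer.log` line for an unstructured file."""
    # Assumption: <info> is serialized as JSON, as the examples suggest.
    return (
        f"{when:%Y%m%d} {when:%H:%M:%S} - {status} - {project} - "
        f"{filename} - {json.dumps(info)}"
    )


if __name__ == "__main__":
    print(report_filename("dataset1", "success", datetime(2018, 10, 2, 8, 4, 38)))
    print(log_line("success", "project1", "datafiles/image.png",
                   {"status": "ok"}, datetime(2018, 10, 2, 8, 13, 51)))
```

With these inputs the two calls produce the success examples shown above.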
8.2. Configuration
The folderwatcher configuration must be provided in a JSON file. A working example is:
{
    "watch": {
        "users_dirs": ["/tmp/folderwatcher/users"],
        "projects_dirs": ["/tmp/folderwatcher/projects1", "/tmp/folderwatcher/projects2"],
        "project_data_dir": "data",
        "project_datafiles_dir": "datafiles",
        "csv_extension": ".csv",
        "def_extension": ".xml",
        "workers": 4,
        "time_between_checks_sec": 5
    },
    "dsb": {
        "url": "http://127.0.0.1:6543",
        "jwt_token": "eyJhbGciOiJIUzI1NiIs"
    },
    "report": {
        "success_template": "/opt/mgrid/dsb/backend/daemons/folderwatcher/structured_success.template",
        "failure_template": "/opt/mgrid/dsb/backend/daemons/folderwatcher/structured_failure.template"
    }
}
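Before starting the daemon it can help to check that a configuration file parses and contains the expected settings. The following sketch does that with the standard library only. The function names are hypothetical, and treating every key from the example as required is an assumption; the folderwatcher itself may apply defaults.

```python
import json

# Keys copied from the example configuration. Treating all of them as
# required is an assumption of this sketch.
REQUIRED_KEYS = {
    "watch": {
        "users_dirs", "projects_dirs", "project_data_dir",
        "project_datafiles_dir", "csv_extension", "def_extension",
        "workers", "time_between_checks_sec",
    },
    "dsb": {"url", "jwt_token"},
    "report": {"success_template", "failure_template"},
}


def check_config(config):
    """Return a sorted list of dotted setting names missing from `config`."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        present = config.get(section, {})
        missing.extend(f"{section}.{key}" for key in keys if key not in present)
    return sorted(missing)


def load_config(path):
    """Parse the JSON file at `path` and fail loudly on missing settings."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    missing = check_config(config)
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))
    return config
```

Calling load_config on the example above returns the parsed dictionary; a file without the "dsb" section would raise a ValueError naming dsb.jwt_token and dsb.url.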
The settings are:
- watch.users_dirs: Zero, one or more users directories where the folderwatcher should look for files to process. If the value is /users, it expects structured files in e.g. the /users/user1/project1/data directory.
- watch.projects_dirs: Zero, one or more projects directories where the folderwatcher should look for files to process. If the value is /projects, it expects structured files in e.g. the /projects/project1/data directory.
- watch.project_data_dir: The name of the subdirectory in a project directory in which to look for structured files.
- watch.project_datafiles_dir: The name of the subdirectory in a project directory in which to look for unstructured files.
- watch.csv_extension: The extension of CSV files.
- watch.def_extension: The extension of XML files.
- watch.workers: The number of workers. This limits the number of files that can be processed concurrently.
- watch.time_between_checks_sec: The time in seconds between checks for files. If the size of a file has not changed between checks, it is considered stable, i.e. not currently being uploaded by the user.
- dsb.url: The URL of the DSB instance to notify.
- dsb.jwt_token: The JWT token used to authenticate with the DSB instance. This token must contain iss, iat, jti and sub items and must be signed using the HS512 algorithm with the secret that is known to DSB as management.secret.
- report.success_template: Location of a template file that is used to signal success to the user.
- report.failure_template: Location of a template file that is used to signal failure to the user.
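A token with the items required for dsb.jwt_token can be produced with the Python standard library alone, as the sketch below shows. It is illustrative only: the function names are hypothetical, the values DSB expects for iss and sub are not specified here and are placeholders, and in practice a maintained JWT library such as PyJWT may be preferable to hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json
import time
import uuid


def b64url(data):
    """Base64url-encode bytes without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_token(secret, issuer, subject, now=None, token_id=None):
    """Create an HS512-signed JWT containing iss, iat, jti and sub."""
    header = {"alg": "HS512", "typ": "JWT"}
    claims = {
        "iss": issuer,    # placeholder value, assumption
        "iat": int(time.time() if now is None else now),
        "jti": token_id or uuid.uuid4().hex,
        "sub": subject,   # placeholder value, assumption
    }
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    # HS512 is HMAC with SHA-512 over "<header>.<payload>".
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha512
    ).digest()
    return signing_input + "." + b64url(signature)
```

The secret passed to make_token must be the value that DSB knows as management.secret; otherwise signature verification on the DSB side will fail.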
8.3. Template files
A template file can use variables from folderwatcher to give meaningful feedback to the user. An example success template:
Import of dataset '{dataset}' for project '{project}' has succeeded with result '{result}'.
You can now analyze your data in the workspace.
An example failure template:
Import of dataset '{dataset}' for project '{project}' has failed with result '{result}'.
Please check the error carefully and try again.
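To preview how a template will read, the variables can be filled in with Python's str.format, whose placeholder syntax matches the curly-brace names used above. This is a sketch under assumptions: the function name is hypothetical, and whether the folderwatcher substitutes variables in exactly this way, or offers variables beyond dataset, project and result, is not confirmed here.

```python
def render_report(template_text, dataset, project, result):
    """Fill the {dataset}, {project} and {result} placeholders of a template."""
    return template_text.format(dataset=dataset, project=project, result=result)


if __name__ == "__main__":
    template = (
        "Import of dataset '{dataset}' for project '{project}' "
        "has succeeded with result '{result}'.\n"
        "You can now analyze your data in the workspace."
    )
    print(render_report(template, "dataset1", "project1", "ok"))
```

Under this assumption, a literal curly brace in a template would have to be doubled ({{ or }}), since str.format treats single braces as placeholders.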