8. Folderwatcher

8.1. Introduction

The folderwatcher is a standalone daemon that scans configured directories for structured (i.e. csv and xml) and unstructured (e.g. png) files. When it finds such a file and the file's size has not changed for a certain amount of time (so that files that are still being uploaded are not processed), it notifies DSB.
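The stability rule above can be sketched as two size snapshots of a directory taken one check interval apart. This is a minimal illustration that assumes a single flat directory. stable_files is a hypothetical name, not the folderwatcher's actual function.

```python
import time
from pathlib import Path


def stable_files(directory: str, interval_sec: float) -> list[Path]:
    """Return files in `directory` whose size did not change over one interval.

    Hypothetical helper: a sketch of the size-stability check only.
    """
    def snapshot() -> dict[Path, int]:
        # Map every regular file in the directory to its current size in bytes.
        return {p: p.stat().st_size for p in Path(directory).iterdir() if p.is_file()}

    before = snapshot()
    time.sleep(interval_sec)
    after = snapshot()
    # Stable = present in both snapshots with an unchanged size.
    # Files that appeared during the interval are not yet considered stable.
    return sorted(p for p, size in after.items() if before.get(p) == size)
```

A file that is still growing when the second snapshot is taken is skipped and will simply be checked again on the next pass.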

After processing the files, DSB replies to the folderwatcher. How success or failure is reported depends on the file type:

  • Structured: The folderwatcher places a report file in the same directory as the processed file. The report filename has the structure <filename>_<status>_<date>_<time>.txt.

    • If processing succeeded, the report file may be named dataset1_success_20181002_080438.txt.

    • If processing failed, it may be named dataset1_failure_20181002_080438.txt.

  • Unstructured: A file called datafiles_transfer.log is placed in the project directory. Each processed file adds one line to this log file, with the structure <date> <time> - <status> - <project> - <filename> - <info>.

    • If processing succeeded, the logline may be 20181002 08:13:51 - success - project1 - datafiles/image.png - {"status": "ok"}.

    • If processing failed, it may be 20181002 08:13:51 - failure - project1 - datafiles/img.png - {"status": "...", "error": "..."}.
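Both naming conventions above can be expressed as format strings. This sketch assumes that <filename> in a report name is the processed file's name without its extension (as the dataset1 examples suggest) and that <info> is the JSON-encoded reply from DSB. report_filename and log_line are hypothetical helpers.

```python
import json
from datetime import datetime
from pathlib import Path


def report_filename(processed_file: str, status: str, when: datetime) -> str:
    """Build '<filename>_<status>_<date>_<time>.txt' for a structured file.

    Assumption: <filename> is the processed file's name without its extension.
    """
    stem = Path(processed_file).stem
    return f"{stem}_{status}_{when:%Y%m%d}_{when:%H%M%S}.txt"


def log_line(when: datetime, status: str, project: str, filename: str, info: dict) -> str:
    """Build '<date> <time> - <status> - <project> - <filename> - <info>'.

    Assumption: <info> is the DSB reply serialized as JSON.
    """
    return f"{when:%Y%m%d %H:%M:%S} - {status} - {project} - {filename} - {json.dumps(info)}"
```

With the example timestamps, report_filename("dataset1.csv", "success", datetime(2018, 10, 2, 8, 4, 38)) returns dataset1_success_20181002_080438.txt, and log_line reproduces the success logline shown above.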

8.2. Configuration

The folderwatcher is configured through a JSON file. A working example is:

{
  "watch": {
    "users_dirs": ["/tmp/folderwatcher/users"],
    "projects_dirs": ["/tmp/folderwatcher/projects1", "/tmp/folderwatcher/projects2"],
    "project_data_dir": "data",
    "project_datafiles_dir": "datafiles",
    "csv_extension": ".csv",
    "def_extension": ".xml",
    "workers": 4,
    "time_between_checks_sec": 5
  },
  "dsb": {
    "url": "http://127.0.0.1:6543",
    "jwt_token": "eyJhbGciOiJIUzI1NiIs"
  },
  "report": {
    "success_template": "/opt/mgrid/dsb/backend/daemons/folderwatcher/structured_success.template",
    "failure_template": "/opt/mgrid/dsb/backend/daemons/folderwatcher/structured_failure.template"
  }
}
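One way to read this file and derive the directories to scan is sketched below. It assumes the layouts described for watch.users_dirs and watch.projects_dirs in the settings list (<users_dir>/<user>/<project> and <projects_dir>/<project>) and treats missing directories as empty. The function names are hypothetical.

```python
import json
from pathlib import Path


def load_config(path: str) -> dict:
    """Read the folderwatcher JSON configuration file into a dict."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def _subdirs(path: Path) -> list[Path]:
    # Missing directories are treated as empty rather than raising an error.
    if not path.is_dir():
        return []
    return sorted(p for p in path.iterdir() if p.is_dir())


def project_dirs(watch: dict) -> list[Path]:
    """List project directories under watch.users_dirs and watch.projects_dirs.

    Assumed layout: <users_dir>/<user>/<project> and <projects_dir>/<project>.
    """
    found: list[Path] = []
    for users_dir in watch.get("users_dirs", []):
        for user_dir in _subdirs(Path(users_dir)):
            found.extend(_subdirs(user_dir))
    for projects_dir in watch.get("projects_dirs", []):
        found.extend(_subdirs(Path(projects_dir)))
    return found


def scan_targets(watch: dict) -> list[tuple[Path, Path]]:
    """Pair each project's structured (data) and unstructured (datafiles) directory."""
    return [
        (project / watch["project_data_dir"], project / watch["project_datafiles_dir"])
        for project in project_dirs(watch)
    ]
```

For example, scan_targets(load_config("folderwatcher.json")["watch"]) would list the data and datafiles directory of every project found, where folderwatcher.json is a placeholder filename.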

The settings are:

watch.users_dirs

Zero or more users directories in which the folderwatcher looks for files to process. If the value is /users, it expects structured files in, e.g., the /users/user1/project1/data directory.

watch.projects_dirs

Zero or more projects directories in which the folderwatcher looks for files to process. If the value is /projects, it expects structured files in, e.g., the /projects/project1/data directory.

watch.project_data_dir

The name of the subdirectory of a project directory in which the folderwatcher looks for structured files.

watch.project_datafiles_dir

The name of the subdirectory of a project directory in which the folderwatcher looks for unstructured files.

watch.csv_extension

The extension of csv files.

watch.def_extension

The extension of xml files.

watch.workers

The number of workers. This limits the maximum number of files that can be processed concurrently.

watch.time_between_checks_sec

The interval, in seconds, between successive checks of the files present. If a file's size has not changed between two checks, the file is considered stable, i.e. not currently being uploaded by the user.

dsb.url

The URL of the DSB instance to notify.

dsb.jwt_token

The JWT used to authenticate with the DSB instance. The token must contain the iss, iat, jti and sub claims and must be signed with the HS512 algorithm, using the secret that DSB knows as management.secret.

report.success_template

Path to the template file used to report success to the user.

report.failure_template

Path to the template file used to report failure to the user.
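A token meeting the dsb.jwt_token requirements can be produced with the Python standard library alone. The sketch below builds an HS512-signed JWT (RFC 7519) carrying the four required claims. The function name and the issuer and subject values are hypothetical, since this section does not say which values DSB expects for iss and sub.

```python
import base64
import hashlib
import hmac
import json
import time
import uuid


def _b64url(data: bytes) -> str:
    # JWT segments use unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_token(secret: str, issuer: str, subject: str) -> str:
    """Create an HS512-signed JWT with iss, iat, jti and sub claims.

    `secret` is assumed to be the value DSB knows as management.secret.
    """
    header = {"alg": "HS512", "typ": "JWT"}
    claims = {
        "iss": issuer,
        "iat": int(time.time()),   # issued-at, seconds since the epoch
        "jti": str(uuid.uuid4()),  # unique token identifier
        "sub": subject,
    }
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    # HMAC-SHA512 over "<header>.<claims>" is what "HS512" denotes.
    signature = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha512).digest()
    return f"{signing_input}.{_b64url(signature)}"
```

The resulting string would go into dsb.jwt_token. An established JWT library can be used instead; the hand-rolled version only shows what the signing step involves.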

8.3. Template files

A template file can reference variables supplied by the folderwatcher, such as {dataset}, {project} and {result}, to give the user meaningful feedback. An example success template:

Import of dataset '{dataset}' for project '{project}' has succeeded with result '{result}'.
You can now analyze your data in the workspace.

An example failure template:

Import of dataset '{dataset}' for project '{project}' has failed with result '{result}'.
Please check the error carefully and try again.
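The placeholders in these examples resemble Python str.format fields. Under that assumption, choosing and rendering a template could look like the sketch below. render_report and pick_template are hypothetical names, and the exact substitution mechanism the folderwatcher uses is not confirmed here.

```python
from pathlib import Path


def pick_template(report: dict, succeeded: bool) -> str:
    """Choose report.success_template or report.failure_template."""
    return report["success_template"] if succeeded else report["failure_template"]


def render_report(template_path: str, dataset: str, project: str, result: str) -> str:
    """Fill the {dataset}, {project} and {result} placeholders of a template.

    Assumes str.format-style substitution, so literal braces inside a
    template would have to be written as {{ and }}.
    """
    template = Path(template_path).read_text(encoding="utf-8")
    return template.format(dataset=dataset, project=project, result=result)
```

Braces inside the substituted values (for example a JSON error in {result}) are inserted verbatim; only braces in the template text itself are interpreted.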