Data archiving policy

While the backup policy keeps 3 copies of the data (1 on disk and 2 on tape), the archiving policy keeps only the 2 copies on tape.

When a file is archived, its on-disk representation is truncated to 4 KB; this truncated file is referred to as a stub. Accessing a file that has been replaced by its stub is therefore slow, because the data must first be migrated back from tape to disk.
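Before opening a large file, it can help to know whether it is probably a stub. The sketch below is a heuristic only, assuming the mass storage is mounted as a POSIX filesystem that reports a stub's full logical size but only its truncated allocation on disk; the policy does not say how stubs appear to users, and the helper name is hypothetical. It looks at file metadata only and never opens the file.

```python
import os

# Stub size stated by the archiving policy.
STUB_SIZE = 4 * 1024


def looks_like_stub(path: str, stub_size: int = STUB_SIZE) -> bool:
    """Heuristic (assumption): a stub reports a size larger than stub_size
    but has at most stub_size bytes actually allocated on disk."""
    st = os.stat(path)  # metadata only; the file content is not read
    # On POSIX systems st_blocks counts 512-byte units, whatever the
    # filesystem block size is.
    allocated = st.st_blocks * 512
    return st.st_size > stub_size and allocated <= stub_size
```

Sparse files also have little space allocated compared to their size, so they would be reported as stubs by this check; treat a positive result as "probably archived" rather than a certainty.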

Archiving data

Hint

Before submitting your files for the archiving procedure, make sure they are properly organised. If you need help or guidance with this step, do not hesitate to contact the Bioinformatics team via SAM.

To archive data, you have to:

  1. The first time only, ask the UDI GIGA-MED (via SAM) to create an ARCHIVES directory under your team directory on the mass storage.

  2. Create a directory inside the ARCHIVES directory. Give this folder a meaningful name (do not use spaces or special characters).

  3. Determine the files/folders you want to submit for archiving and organise them inside the previously created folder.
    As a rule of thumb:

    • related files that are likely to be retrieved together should be in the same (sub)folder,

    • big files should be compressed as much as possible,

    • small files should be bundled together into a compressed archive file (e.g., a .tar.gz file).

  4. Wait until you have at least 500 GB of data to archive (if necessary, by grouping several projects together).

  5. Keep a record of the contents of the folder to be archived in a file (e.g., using the tree command).

  6. Contact the UDI GIGA-MED by filling in a form via SAM. The form must contain:

    • the absolute path to the folder to archive,

    • the number of years that the data must be kept (default is 5 years),

    • the name of the PI in charge.
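Steps 2, 4 and 5 can be checked with a small script before filling in the SAM form. This is a minimal sketch with hypothetical helper names; reading "no spaces or special characters" as ASCII letters, digits, underscores and hyphens, and reading 500 GB as 500 × 10^9 bytes, are assumptions. The tab-separated manifest is just one alternative to the output of the tree command.

```python
import os
import re

# Assumption: "no spaces or special characters" is read as ASCII letters,
# digits, underscores and hyphens only.
ALLOWED_NAME = re.compile(r"[A-Za-z0-9_-]+")

# Assumption: 500 GB is taken as 500 * 10**9 bytes.
MIN_ARCHIVE_BYTES = 500 * 10**9


def is_valid_folder_name(name: str) -> bool:
    """Step 2: check that the folder name has no spaces or special characters."""
    return ALLOWED_NAME.fullmatch(name) is not None


def total_size(folder: str) -> int:
    """Step 4: sum the sizes of all regular files under folder (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def is_big_enough(folder: str) -> bool:
    """Step 4: has the folder reached the 500 GB minimum?"""
    return total_size(folder) >= MIN_ARCHIVE_BYTES


def write_manifest(folder: str, manifest_path: str) -> int:
    """Step 5: write one 'size<TAB>relative path' line per file, in a stable
    order, and return the number of files listed."""
    count = 0
    with open(manifest_path, "w", encoding="utf-8") as out:
        for root, dirs, files in os.walk(folder):
            dirs.sort()  # sorting in place makes os.walk visit subfolders alphabetically
            for name in sorted(files):
                path = os.path.join(root, name)
                if os.path.islink(path):
                    continue
                rel = os.path.relpath(path, folder)
                out.write(f"{os.path.getsize(path)}\t{rel}\n")
                count += 1
    return count
```

In this sketch the manifest is meant to be written outside the folder being archived (for example elsewhere in your team directory), so that consulting it never requires a retrieval.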

Note

After this procedure is complete, the only way you will be able to access the archived data is through the data retrieval procedure.

Important considerations

For this procedure to be optimal, you should:

  • make sure that the data you submit to the archiving procedure is not duplicated elsewhere on the mass storage,

  • minimise the number of individual files in the archived directory (e.g., by bundling small files into compressed archives) in order to limit the space used by the archived data.
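Bundling small files can be done with Python's standard tarfile module. The sketch below is an illustration only: what counts as a "small" file is not defined by the policy, so the 4 MB default is borrowed from the automatic archiving rule as an example, and the helper name is hypothetical. It only looks at files directly inside the given folder and does not delete the originals; once the archive has been checked, remove them yourself so that the data is not duplicated.

```python
import os
import tarfile

# Hypothetical threshold: the policy does not define a "small" file.
SMALL_FILE_BYTES = 4 * 1024 * 1024


def bundle_small_files(folder: str, archive_path: str,
                       threshold: int = SMALL_FILE_BYTES) -> list[str]:
    """Add every regular file smaller than threshold found directly in
    folder to a gzip-compressed tarball; return the bundled file names."""
    small = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Skip symlinks and directories; only plain files are bundled.
        if os.path.islink(path) or not os.path.isfile(path):
            continue
        if os.path.getsize(path) < threshold:
            small.append(name)
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in small:
            # arcname keeps paths inside the archive relative to folder.
            tar.add(os.path.join(folder, name), arcname=name)
    return small
```

The list of names is built before the tarball is opened, so an archive written into the same folder is not picked up by its own run.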

Retrieving archived data

Attention

This procedure may take several days, so plan ahead and request the retrieval well before you need the data.

To retrieve files or folders from an archived directory, contact the UDI GIGA-MED by filling in a form via SAM. The form should contain the following information:

  • the path to the files or folders you want to retrieve (if you do not want to retrieve the whole directory, you must list the specific paths),

  • the name of the PI in charge.

Hint

Before starting the retrieval procedure, make sure that there is enough free space on disk to hold the retrieved data.
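One way to check this from Python is shutil.disk_usage, sketched below with a hypothetical helper name. It reports free space on the filesystem as a whole, so a quota on your team directory, if one applies, may be the tighter limit; the 10% safety margin is an arbitrary choice.

```python
import shutil


def has_room_for(target_dir: str, needed_bytes: int, margin: float = 0.1) -> bool:
    """True if the filesystem holding target_dir has needed_bytes free,
    plus a safety margin (10% by default, an arbitrary choice)."""
    free = shutil.disk_usage(target_dir).free
    return free >= needed_bytes * (1 + margin)
```

For example, has_room_for("/path/to/team/dir", 800 * 10**9) asks whether about 800 GB of retrieved data would fit, where the path is a placeholder for your own directory.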

Automatic archiving policy

An automatic archiving procedure is used to reduce disk usage when the disk storage gets close to its maximum capacity.

This automatic archiving policy is defined as follows:

  1. It runs when the disk storage reaches 80% of its maximum capacity.

  2. It only affects files larger than 4 MB.

  3. It prioritises files that have not been modified for at least 270 days.
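The three rules can be illustrated with a short script that estimates which of your files the automatic procedure would pick first. This is only a sketch of the rules as stated, not the tool the storage system actually runs: reading 4 MB as 4 × 1024 × 1024 bytes, ordering files oldest first within each group, and the helper names are all assumptions.

```python
import os
import shutil
import time
from typing import Optional

USAGE_TRIGGER = 0.80                # rule 1: runs at 80% of maximum capacity
MIN_SIZE_BYTES = 4 * 1024 * 1024    # rule 2: files larger than 4 MB (MiB assumed)
MIN_IDLE_SECONDS = 270 * 24 * 3600  # rule 3: not modified for at least 270 days


def policy_triggered(mount_point: str) -> bool:
    """Rule 1: has the filesystem reached the 80% usage threshold?"""
    usage = shutil.disk_usage(mount_point)
    return usage.used / usage.total >= USAGE_TRIGGER


def candidates_in_priority_order(folder: str, now: Optional[float] = None) -> list[str]:
    """Rules 2 and 3: list files larger than 4 MB, putting files unmodified
    for at least 270 days first; oldest first within each group (assumption)."""
    now = time.time() if now is None else now
    eligible = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue
            st = os.stat(path)
            if st.st_size > MIN_SIZE_BYTES:
                recently_modified = (now - st.st_mtime) < MIN_IDLE_SECONDS
                eligible.append((recently_modified, st.st_mtime, path))
    # False sorts before True, so long-idle files come first, then by age.
    eligible.sort()
    return [path for _recent, _mtime, path in eligible]
```

Files smaller than 4 MB never appear in the result, which matches rule 2: they stay on disk whatever their age.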