CoEPP RC
 

Rucio

Introduction

  • Rucio clients should be able to fully replace the functionality provided by DQ2 clients (however, the legacy DQ2 clients are still available).
  • Rucio provides two kinds of clients: the command-line clients (Rucio CLI) and a web interface (Rucio UI).
  • The Rucio UI web interface provides functionality similar to DaTRI (which will be decommissioned on the 27th of July 2015).
    • Data transfer subscriptions are available through a new feature called Rucio rules.
    • Rucio rules can also be set via the Rucio CLI.

Documentation

Concepts

Files, DataSets and Containers

  • ATLAS has a large amount of data, which is physically stored in files.
  • For the data management system these files are the smallest operational unit of data. Physicists need to be able to identify and operate on any arbitrary set of files.
  • Files can be grouped into datasets (a named set of files) and datasets can be grouped into containers (a named set of datasets or, recursively, containers)
  • All three types of names refer to data so the term ‘data identifier’ (DID) is used to mean any set of file, dataset or container identifiers. A data identifier is just the name of a single file, dataset or container.
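  • As a rough illustration of this hierarchy, the sketch below models files, datasets and containers and resolves any DID to the files it contains. The class and function names (File, Dataset, Container, resolve_files) are hypothetical and are not part of the Rucio client API.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class File:
    name: str


@dataclass
class Dataset:
    name: str
    files: List[File] = field(default_factory=list)


@dataclass
class Container:
    name: str
    # A container may hold datasets or, recursively, other containers.
    children: List[Union[Dataset, "Container"]] = field(default_factory=list)


def resolve_files(did: Union[File, Dataset, Container]) -> List[str]:
    """Return the names of all files reachable from a DID, in order."""
    if isinstance(did, File):
        return [did.name]
    if isinstance(did, Dataset):
        return [f.name for f in did.files]
    names: List[str] = []
    for child in did.children:
        names.extend(resolve_files(child))
    return names
```

  • Because the same function accepts a file, a dataset or a container, it mirrors the way many Rucio commands accept any kind of DID.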

Data identifiers (DIDs)

  • Represent the core objects in Rucio. They may be files, datasets or containers. Many Rucio commands ask for a DID and can accept files, datasets or containers.

Account

  • Your identity in Rucio. A default Rucio account is automatically created from your ATLAS VO nickname.

Scope

  • Scopes are a new concept in Rucio and are a way of partitioning the dataset and file namespace.
  • Every DID has a scope and a name and the Rucio clients always display this as scope:name.
  • When a command requires a DID argument, it should be the full DID in scope:name form; if the first part of the name is the same as the scope, the scope can be left out.
  • With the default Rucio account you may only create new DIDs in your own scope, user.username. Users can choose any name for their datasets as long as it stays within their own scope (e.g. user jdoe can create user.jdoe:mytest.root).
  • Only special admin users can create DIDs in scopes other than their user scope. For official datasets or containers, the scope is the same as the first part of the dataset/container name (e.g. data12_8TeV), so the scope can be omitted when using the Rucio clients.
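  • The scope:name convention can be sketched as follows. This is a minimal illustration, not the Rucio clients' actual implementation: split_did is a hypothetical helper, and the rule that user.* and group.* scopes span the first two dot-separated fields is an assumption made for the example.

```python
from typing import Tuple


def split_did(did: str) -> Tuple[str, str]:
    """Split a DID into (scope, name), inferring the scope when omitted."""
    if ":" in did:
        scope, name = did.split(":", 1)
        return scope, name
    fields = did.split(".")
    # Assumption: user.* and group.* scopes span the first two fields.
    if fields[0] in ("user", "group") and len(fields) > 1:
        return ".".join(fields[:2]), did
    # Otherwise the scope is taken to be the first field of the name.
    return fields[0], did
```

  • For example, split_did("user.jdoe:mytest.root") yields the scope user.jdoe and the name mytest.root, while a name beginning with data12_8TeV is assigned the scope data12_8TeV.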

Rucio Storage Elements (RSE)

  • An abstraction for storage end-points, e.g. CERN-PROD_SCRATCHDISK is an RSE. DIDs are stored on RSEs.

Replication rules

  • The replication rules (aka rules) are a way to describe how a Data IDentifier must be replicated on a list of Rucio Storage Elements.
  • A rule is associated with an account, a DID and an RSE expression.
  • When a rule is set on a DID for a particular RSE, nothing happens if the DID is already at that site; otherwise, the rule generates transfers to the site.
  • While a rule is set on a DID at a site, the DID cannot be deleted from that site.
  • If the DID is not covered by any rule, it is eligible for deletion, i.e. it will be deleted if space is needed.
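  • The behaviour described above can be modelled with a small sketch. It is a simplification under stated assumptions: the RSE expression is reduced to a single RSE name, transfers are treated as completing instantly, and Rule, add_rule and eligible_for_deletion are hypothetical names, not Rucio API calls.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Rule:
    account: str
    did: str
    rse: str  # simplification: a single RSE instead of an RSE expression


def add_rule(rules: List[Rule], replicas: Set[Tuple[str, str]], rule: Rule) -> bool:
    """Register a rule; return True if a transfer to the RSE is needed."""
    rules.append(rule)
    needs_transfer = (rule.did, rule.rse) not in replicas
    replicas.add((rule.did, rule.rse))  # model the transfer as completed
    return needs_transfer


def eligible_for_deletion(rules: List[Rule], did: str, rse: str) -> bool:
    """A replica may be deleted only when no rule covers it on that RSE."""
    return not any(r.did == did and r.rse == rse for r in rules)
```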

Permissions and user quotas

  • Regular users are only permitted to upload data directly to SCRATCHDISK. Data on SCRATCHDISK has a lifetime of 15 days.
  • Using Rucio rules, regular users should be able to transfer data to LOCALGROUPDISK.
  • Quotas determine how much data you can put on an RSE and are also a way of enforcing permissions. Quotas are allocated in the following manner:
    • SCRATCHDISK: every user has a quota of 50% of each SCRATCHDISK RSE. This is to avoid one user filling all the space on an RSE.
    • LOCALGROUPDISK: users who are registered in the VOMS group /atlas/xx have 95% quota on all LOCALGROUPDISK RSEs in country xx.
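  • As a worked example of these percentages, the sketch below computes a per-user quota from an RSE's capacity. The function name quota_tb, the capacities used and the choice of a zero quota for every other case are assumptions made for illustration only.

```python
def quota_tb(capacity_tb: float, rse_name: str, in_country_group: bool = False) -> float:
    """Return the per-user quota in TB implied by the percentages above."""
    if rse_name.endswith("SCRATCHDISK"):
        percent = 50
    elif rse_name.endswith("LOCALGROUPDISK") and in_country_group:
        percent = 95
    else:
        # Assumption: no quota means no permission to write to the RSE.
        percent = 0
    return capacity_tb * percent / 100
```

  • With a hypothetical 40 TB SCRATCHDISK, quota_tb(40, "SITE_SCRATCHDISK") gives 20 TB; a hypothetical 100 TB LOCALGROUPDISK gives 95 TB to a user registered in the matching /atlas/xx group.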

Rucio CLI

Requirements

  • We assume that you have a valid grid certificate correctly installed under $HOME/.globus, and that you are already registered in the ATLAS VO.

Setup

  • Log in to a CoEPP UI and execute setupATLAS; localSetupRucioClients; voms-proxy-init --voms atlas
$ setupATLAS
...Type localSetupAGIS to setup AGIS
...Type localSetupAtlantis to setup Atlantis
...Type localSetupDQ2Client to use DQ2 Client
...Type localSetupEIClient to setup EIClient
...Type localSetupFAX to use FAX
...Type localSetupGanga to use Ganga
...Type localSetupGcc to use alternate gcc
...Type localSetupPacman to use Pacman
...Type localSetupPandaClient to use Panda Client
...Type localSetupPyAMI to setup pyAMI
...Type localSetupPoD to setup Proof-on-Demand
...Type localSetupROOT to setup (standalone) ROOT
...Type localSetupDQ2Wrappers to setup DQ2Wrappers
...Type localSetupSFT to setup SFT packages
...Type localSetupXRootD to setup XRootD
...Type showVersions to show versions of installed software
...Type asetup to setup a release (changeASetup to change asetup version)
...Type rcSetup to setup an ASG release (changeRCSetup to change rcSetup ver.)
...Type diagnostics for diagnostic tools
...Type helpMe for more help
...Type printMenu to show this menu
(...)
$ localSetupRucioClients
************************************************************************
Setting up rucio-clients version 0.3.9
Info: Setting compatibility to slc6
Info: Set RUCIO_AUTH_TYPE to x509_proxy
Info: Set RUCIO_ACCOUNT to goncalo
************************************************************************
    
$ voms-proxy-init --voms atlas
Enter GRID pass phrase for this identity:
Contacting voms2.cern.ch:15001 [/DC=ch/DC=cern/OU=computers/CN=voms2.cern.ch] "atlas"...
Remote VOMS server contacted succesfully.
Created proxy in /tmp/x509up_u1051.
Your proxy is valid until Mon Dec 29 17:58:05 UTC 2014
  • Test if your Rucio client is able to connect to the Rucio server
$ rucio ping
0.3.10
  • Get information about the current account
$ rucio whoami
status     : ACTIVE
account    : goncalo
account_type : USER
created_at : 2014-11-19T11:49:19
suspended_at : None
updated_at : 2014-11-19T11:49:19
deleted_at : None
email      : goncalo@physics.usyd.edu.au
  • Check that you are registered in Rucio in the correct country
$ rucio-admin account list-attributes --account goncalo
+------------+---------+
| Key        | Value   |
|------------+---------|
| cloud-ca   | admin   |
| country-au | admin   |
+------------+---------+

General operations

  • Get list of available sites / storages
$ rucio list-rses | grep -i australia
AUSTRALIA-ATLAS_DATADISK
AUSTRALIA-ATLAS_LOCALGROUPDISK
AUSTRALIA-ATLAS_PHYS-SM
AUSTRALIA-ATLAS_PRODDISK
AUSTRALIA-ATLAS_SCRATCHDISK
AUSTRALIA-ATLAS_T2ATLASLOCALGROUPDISK
  • Check where you have quotas
$ rucio list-account-limits goncalo
+---------------------------------------+--------------+
| RSE                                   |   LIMIT (TB) |
|---------------------------------------+--------------|
| AGLT2_SCRATCHDISK                     |   20         |
| AGLT2_USERDISK                        |  215         |
| AM-04-YERPHI_SCRATCHDISK              |    1.09951   |
| ANLASC_SCRATCHDISK                    |    1.145     |
| ANLASC_USERDISK                       |    2.875     |
| AUSTRALIA-ATLAS_LOCALGROUPDISK        |   69.35      |
| AUSTRALIA-ATLAS_SCRATCHDISK           |   21.5       |
| AUSTRALIA-ATLAS_T2ATLASLOCALGROUPDISK |  225         |
(...)
  • Check what quota you have used
$ rucio list-account-usage goncalo
+-------+--------------+--------------+-------------------+
| RSE   |   USAGE (TB) |   LIMIT (TB) |   QUOTA LEFT (TB) |
|-------+--------------+--------------+-------------------|
+-------+--------------+--------------+-------------------+

Search for data

Find a DID

  • List all the datasets and containers in a scope. (We assume researchers know the scope of the data they are interested in; otherwise, they can list the scopes registered in Rucio with rucio list-scopes.)
$ rucio list-scopes | grep mc12
mc12
mc12_13TeV
mc12_14TeV
mc12_2TeV
mc12_33TeV
mc12_5TeV
mc12_7TeV
mc12_8TeV
mc12_valid
$ rucio list-dids --short mc12_14TeV:*
mc12_14TeV:mc12_14TeV.182829.Pythia8_AU2MSTW2008LO_Wprime_WZ_llqq_m4000_min1500.merge.AOD.e2493_s1682_s1691_r5407_r4643
mc12_14TeV:mc12_14TeV.119996.Pythia8_A2MSTW2008LO_minbias_inelastic_high.simul.log.e1133_s1638
mc12_14TeV:mc12_14TeV.189962.MadGraphPythia_AUET2BCTEQ6L1_ttbarWjExcl_2lepfilt.evgen.log.e3060
mc12_14TeV:mc12_14TeV.161805.Pythia8_AU2CTEQ6L1_WH125_lnubb.recon.ESD.e1337_s1682_s1691_r4710
mc12_14TeV:mc12_14TeV.147916.Pythia8_AU2CT10_jetjet_JZ6W.merge.TAG.e2403_s1715_s1691_r4986_r4643_t90
mc12_14TeV:mc12_14TeV.167791.Sherpa_CT10_WmunuMassiveCBPt500_BFilter.merge.HITS.e2357_s1719_s1720
mc12_14TeV:mc12_14TeV.182821.Pythia8_AU2MSTW2008LO_Wprime_WZ_lvqq_m2000.merge.TAG.e2493_s1682_s1691_r5400_r4643_t90
mc12_14TeV:mc12_14TeV.147912.Pythia8_AU2CT10_jetjet_JZ2W.merge.log.e2403_s1715_s1691_r4990_r4643
mc12_14TeV:mc12_14TeV.147914.Pythia8_AU2CT10_jetjet_JZ4W.merge.log.e2403_s1715_s1691_r4991_r4643_t90
mc12_14TeV:mc12_14TeV.147912.Pythia8_AU2CT10_jetjet_JZ2W.merge.ESD.e1996_s1729_s1720_r5477_p1664
(...)
  • Search dids by pattern:
$ rucio list-dids --short mc12_14TeV:mc12_14TeV.167817.Sherpa_CT10_ZtautauMassiveCBPt140_280_CVetoBVeto.merge.log.e2445_p1614_tid01596380_00*
mc12_14TeV:mc12_14TeV.167817.Sherpa_CT10_ZtautauMassiveCBPt140_280_CVetoBVeto.merge.log.e2445_p1614_tid01596380_00
  • Search dids by meta-data:
$ rucio list-dids --short mc12_14TeV:* --filter datatype=AOD
mc12_14TeV:mc12_14TeV.147806.PowhegPythia8_AU2CT10_Zee.recon.AOD.e1564_s1762_s1777_r6029_tid04890038_00_sub0209082996
mc12_14TeV:mc12_14TeV.147807.PowhegPythia8_AU2CT10_Zmumu.recon.AOD.e1564_s2079_s1964_r6074_tid04787911_00_sub0206177092
mc12_14TeV:mc12_14TeV.190060.ParticleGenerator_mu_Pt500.recon.AOD.e3509_s2564_s1964_r6073_tid04889990_00_sub0209299430
mc12_14TeV:mc12_14TeV.190060.ParticleGenerator_mu_Pt500.recon.AOD.e3509_s2564_s1964_r6073_tid04889990_00_sub0209299452
mc12_14TeV:mc12_14TeV.190061.ParticleGenerator_el_Pt500.recon.AOD.e3509_s2564_s1964_r6076_tid04890001_00_sub0209304587
mc12_14TeV:mc12_14TeV.190060.ParticleGenerator_mu_Pt500.recon.AOD.e3509_s2564_s1964_r6073_tid04889990_00_sub0209327729
mc12_14TeV:mc12_14TeV.190060.ParticleGenerator_mu_Pt500.recon.AOD.e3509_s2564_s1964_r6076_tid04889986_00_sub0209336363
mc12_14TeV:mc12_14TeV.117050.PowhegPythia_P2011C_ttbar.merge.AOD.e2176_s1762_s1777_r6030_r4732_tid04659344_00_sub0205941591
mc12_14TeV:mc12_14TeV.190060.ParticleGenerator_mu_Pt500.recon.AOD.e3509_s2564_s1964_r6076_tid04889986_00_sub0209336352
mc12_14TeV:mc12_14TeV.147912.Pythia8_AU2CT10_jetjet_JZ2W.recon.AOD.e1996_s2564_s1964_r6301_tid06121135_00_sub0240001263
(...)
  • Search dids by type: You can filter the results for “file”, “dataset”, “container”, “collection” (dataset or container) or “all”.
$ rucio list-dids --short mc12_14TeV:*  --filter type=dataset
mc12_14TeV:mc12_14TeV.190061.ParticleGenerator_el_Pt500.recon.log.e3509_s2564_s1964_r6075_tid04890012_00_sub0214424251
mc12_14TeV:mc12_14TeV.160024.PowhegPythia8_AU2CT10_VBFH125_gamgam.merge.log.e1337_s1762_s1777_r6025_r4732_tid04659309_00
mc12_14TeV:mc12_14TeV.205070.PowhegPythia_P2011C_CT10_ttbar_hdamp172_nonallhad.evgen.EVNT.e3529_tid04848903_00_sub0208077277
mc12_14TeV:mc12_14TeV.160024.PowhegPythia8_AU2CT10_VBFH125_gamgam.recon.log.e1337_s2079_s1964_r6075_tid04787929_00_sub0208248159
mc12_14TeV:mc12_14TeV.147806.PowhegPythia8_AU2CT10_Zee.recon.ESD.e1564_s1762_s1777_r6029_tid04890038_00_sub0209082968
mc12_14TeV:mc12_14TeV.190060.ParticleGenerator_mu_Pt500.recon.log.e3509_s2564_s1964_r6075_tid04889997_00_sub0214425121
mc12_14TeV:mc12_14TeV.206286.MadGraphPythia8_AU2CTEQ6L1_gg_hh125_4b_lambda01.recon.log.e3202_s2564_s1964_r6075_tid04833640_00_sub0208968039
mc12_14TeV:mc12_14TeV.160024.PowhegPythia8_AU2CT10_VBFH125_gamgam.recon.log.e1337_s2079_s1964_r6075_tid04787929_00_sub0208351480
mc12_14TeV:mc12_14TeV.159000.ParticleGenerator_nu_E50.recon.log.e1564_s2079_s1964_r6077_tid04787894_00_sub0206148854
mc12_14TeV:mc12_14TeV.147806.PowhegPythia8_AU2CT10_Zee.recon.AOD.e1564_s1762_s1777_r6029_tid04890038_00_sub0209082996
(...)
  • List the properties of a did
$ rucio get-metadata mc12_14TeV:mc12_14TeV.147806.PowhegPythia8_AU2CT10_Zee.recon.AOD.e1564_s1762_s1777_r6029_tid04890038_00_sub0209082996
campaign: None
updated_at: 2015-03-26 18:51:48
is_new: None
is_open: False
guid: None
availability: AVAILABLE
deleted_at: None
panda_id: None
provenance: None
accessed_at: 2015-03-26 17:51:28
version: e1564_s1762_s1777_r6029
scope: mc12_14TeV
hidden: False
md5: None
events: 4700
adler32: None
complete: None
lumiblocknr: None
monotonic: False
obsolete: False
transient: None
did_type: DATASET
suppressed: False
expired_at: None
stream_name: PowhegPythia8_AU2CT10_Zee
account: panda
run_number: 147806
name: mc12_14TeV.147806.PowhegPythia8_AU2CT10_Zee.recon.AOD.e1564_s1762_s1777_r6029_tid04890038_00_sub0209082996
task_id: 4890038
datatype: AOD
created_at: 2015-02-24 14:57:48
bytes: 20548632959
project: mc12_14TeV
length: 94
prod_step: recon
phys_group: None

List files in a did

$ rucio list-files mc12_14TeV:mc12_14TeV.190061.ParticleGenerator_el_Pt500.recon.log.e3509_s2564_s1964_r6075_tid04890012_00_sub0214424251
+-----------------------------------------------+--------------------------------------+-------------+------------+----------+
| SCOPE:NAME                                    | GUID                                 | ADLER32     |   FILESIZE |   EVENTS |
|-----------------------------------------------+--------------------------------------+-------------+------------+----------|
| mc12_14TeV:log.04890012._009482.job.log.tgz.1 | 6E59D1D3-25F8-4EDF-9144-D948865591F9 | ad:bdf5dbeb |    2314932 |          |
+-----------------------------------------------+--------------------------------------+-------------+------------+----------+
Total files : 1
Total size : 2314932

List dids a given file belongs to

$ rucio list-parent-dids mc12_14TeV:log.04890012._009482.job.log.tgz.1
+------------------------------------------------------------------------------------------------------------------------+--------------+
| SCOPE:NAME                                                                                                             | [DID TYPE]   |
|------------------------------------------------------------------------------------------------------------------------+--------------|
| mc12_14TeV:mc12_14TeV.190061.ParticleGenerator_el_Pt500.recon.log.e3509_s2564_s1964_r6075_tid04890012_00               | DATASET      |
| mc12_14TeV:mc12_14TeV.190061.ParticleGenerator_el_Pt500.recon.log.e3509_s2564_s1964_r6075_tid04890012_00_sub0214424251 | DATASET      |
+------------------------------------------------------------------------------------------------------------------------+--------------+

List replicas of a file

$ rucio list-file-replicas mc12_14TeV:log.04890012._009482.job.log.tgz.1
  SCOPE                                         NAME                                      FILESIZE  ADLER32   RSE: REPLICA
----------------------------------------------------------------------------------------------------------------------------------
mc12_14TeV:log.04890012._009482.job.log.tgz.1                                        2314932  bdf5dbeb  SARA-MATRIX_DATADISK: srm://srm.grid.sara.nl:8443/srm/managerv2?SFN=/pnfs/grid.sara.nl/data/atlas/atlasdatadisk/rucio/mc12_14TeV/fa/09/log.04890012._009482.job.log.tgz.1
  • To get the PFN (physical file name) for a specific protocol on a given RSE, use the following command. The accepted protocols are root, srm and http.
$ rucio list-file-replicas --protocols root --rse SARA-MATRIX_DATADISK mc12_14TeV:log.04890012._009482.job.log.tgz.1
+------------+------------------------------------+------------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
| SCOPE      | NAME                               | FILESIZE   | ADLER32   | RSE: REPLICA                                                                                                                                             |
|------------+------------------------------------+------------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------|
| mc12_14TeV | log.04890012._009482.job.log.tgz.1 | 2.3 MB     | bdf5dbeb  | SARA-MATRIX_DATADISK: root://fax.grid.sara.nl:1094//pnfs/grid.sara.nl/data/atlas/atlasdatadisk/rucio/mc12_14TeV/fa/09/log.04890012._009482.job.log.tgz.1 |
+------------+------------------------------------+------------+-----------+----------------------------------------------------------------------------------------------------------------------------------------------------------+

List datasets in RSEs

$ rucio list-datasets-rse AUSTRALIA-ATLAS_LOCALGROUPDISK
SCOPE:NAME
----------
data12_8TeV:data12_8TeV.00206248.physics_Egamma.merge.NTUP_TAU.r4065_p1278_p1443_tid01224476_00
data12_8TeV:data12_8TeV.00206253.physics_Egamma.merge.NTUP_TAU.r4065_p1278_p1443_tid01224475_00
data12_8TeV:data12_8TeV.00206253.physics_Muons.merge.AOD.r4065_p1278_tid01057594_00
data12_8TeV:data12_8TeV.00206299.physics_Egamma.merge.NTUP_TAU.r4065_p1278_p1443_tid01224474_00
data12_8TeV:data12_8TeV.00206367.physics_Egamma.merge.NTUP_TAU.r4065_p1278_p1443_tid01224473_00
data12_8TeV:data12_8TeV.00206368.physics_Egamma.merge.NTUP_TAU.r4065_p1278_p1443_tid01224472_00
data12_8TeV:data12_8TeV.00206369.physics_Egamma.merge.NTUP_TAU.r4065_p1278_p1443_tid01224471_00
data12_8TeV:data12_8TeV.00206409.physics_Egamma.merge.NTUP_TAU.r4065_p1278_p1443_tid01224470_00
(...)

List replicas of a dataset

$ rucio list-dataset-replicas data12_8TeV:data12_8TeV.00206248.physics_Egamma.merge.NTUP_TAU.r4065_p1278_p1443_tid01224476_00
DATASET: data12_8TeV:data12_8TeV.00206248.physics_Egamma.merge.NTUP_TAU.r4065_p1278_p1443_tid01224476_00
+--------------------------------+---------+---------+
| RSE                            |   FOUND |   TOTAL |
|--------------------------------+---------+---------|
| AUSTRALIA-ATLAS_LOCALGROUPDISK |       1 |       1 |
| INFN-NAPOLI-ATLAS_PHYS-HIGGS   |       1 |       1 |
| MWT2_UC_PERF-TAU               |       1 |       1 |
+--------------------------------+---------+---------+

Download data

  • Download a DID (if the DID represents a dataset, all files from the dataset will be downloaded)
    1. To download to a specific directory, include the '--dir <path>' option
    2. To download from a specific RSE, include the '--rse <rse>' option
    3. To use a specific protocol, include the '--protocol <protocol>' option
    4. To increase or decrease the parallelism of the download process, use '--ndownloader <N>'
    5. To download random files from the DID (if it represents a dataset), use '--nrandom <N>'
$ rucio -v download mc14_13TeV.169153.PowhegPythia8_AU2CT10_VBFH600NWA_WWlepnuqq.recon.log.e3292_s1982_s2008_r5787_tid04606738_00_sub0201586236
2015-08-04 04:55:42,165 DEBUG [{u'key': u'cloud-ca', u'value': u'admin'}]
2015-08-04 04:55:42,165 DEBUG [{u'key': u'country-au', u'value': u'admin'}]
2015-08-04 04:55:42,912 INFO [Starting download for mc14_13TeV:mc14_13TeV.169153.PowhegPythia8_AU2CT10_VBFH600NWA_WWlepnuqq.recon.log.e3292_s1982_s2008_r5787_tid04606738_00_sub0201586236 with 3 files]
2015-08-04 04:55:42,912 DEBUG [Getting the list of replicas]
2015-08-04 04:55:43,608 DEBUG [Will start downloading file mc14_13TeV:log.04606738._000047.job.log.tgz.1]
2015-08-04 04:55:43,679 DEBUG [Will start downloading file mc14_13TeV:log.04606738._000048.job.log.tgz.1]
2015-08-04 04:55:43,679 DEBUG [Will start downloading file mc14_13TeV:log.04606738._000049.job.log.tgz.1]
2015-08-04 04:55:43,684 INFO [Starting the download of mc14_13TeV:log.04606738._000047.job.log.tgz.1]
2015-08-04 04:55:43,685 DEBUG [Choosing RSE]
2015-08-04 04:55:45,080 DEBUG [Getting file mc14_13TeV:log.04606738._000047.job.log.tgz.1 from FZK-LCG2_DATADISK]
100% |#############################################################################################################|
File downloaded. Will be validated
File validated
2015-08-04 04:55:57,549 INFO [File mc14_13TeV:log.04606738._000047.job.log.tgz.1 successfully downloaded from FZK-LCG2_DATADISK]
2015-08-04 04:55:58,877 INFO [File mc14_13TeV:log.04606738._000047.job.log.tgz.1 successfully downloaded. 28112 bytes downloaded in 15.1923429966 seconds]
2015-08-04 04:55:58,877 INFO [Starting the download of mc14_13TeV:log.04606738._000048.job.log.tgz.1]
2015-08-04 04:55:58,877 DEBUG [Choosing RSE]
2015-08-04 04:55:58,877 DEBUG [Getting file mc14_13TeV:log.04606738._000048.job.log.tgz.1 from FZK-LCG2_DATADISK]
100% |#############################################################################################################|
File downloaded. Will be validated
File validated
2015-08-04 04:56:05,452 INFO [File mc14_13TeV:log.04606738._000048.job.log.tgz.1 successfully downloaded from FZK-LCG2_DATADISK]
2015-08-04 04:56:06,778 INFO [File mc14_13TeV:log.04606738._000048.job.log.tgz.1 successfully downloaded. 28528 bytes downloaded in 7.90140390396 seconds]
2015-08-04 04:56:06,779 INFO [Starting the download of mc14_13TeV:log.04606738._000049.job.log.tgz.1]
2015-08-04 04:56:06,779 DEBUG [Choosing RSE]
2015-08-04 04:56:06,779 DEBUG [Getting file mc14_13TeV:log.04606738._000049.job.log.tgz.1 from FZK-LCG2_DATADISK]
100% |#############################################################################################################|
File downloaded. Will be validated
File validated
2015-08-04 04:56:13,774 INFO [File mc14_13TeV:log.04606738._000049.job.log.tgz.1 successfully downloaded from FZK-LCG2_DATADISK]
2015-08-04 04:56:15,104 INFO [File mc14_13TeV:log.04606738._000049.job.log.tgz.1 successfully downloaded. 28236 bytes downloaded in 8.3258061409 seconds]
2015-08-04 04:56:15,723 INFO [Download operation for mc14_13TeV:mc14_13TeV.169153.PowhegPythia8_AU2CT10_VBFH600NWA_WWlepnuqq.recon.log.e3292_s1982_s2008_r5787_tid04606738_00_sub0201586236 done]
----------------------------------
Download summary
----------------------------------------
DID mc14_13TeV:mc14_13TeV.169153.PowhegPythia8_AU2CT10_VBFH600NWA_WWlepnuqq.recon.log.e3292_s1982_s2008_r5787_tid04606738_00_sub0201586236
Total files :                                 3
Downloaded files :                            3
Files already found locally :                 0
Files that cannot be downloaded :             0
Completed in 36.2751 sec.
$ ll mc14_13TeV.169153.PowhegPythia8_AU2CT10_VBFH600NWA_WWlepnuqq.recon.log.e3292_s1982_s2008_r5787_tid04606738_00_sub0201586236/
total 84
-rw-r--r-- 1 goncalo people 28112 Aug  4 04:55 log.04606738._000047.job.log.tgz.1
-rw-r--r-- 1 goncalo people 28528 Aug  4 04:56 log.04606738._000048.job.log.tgz.1
-rw-r--r-- 1 goncalo people 28236 Aug  4 04:56 log.04606738._000049.job.log.tgz.1
  • You can also build an input file with all your DIDs (one per line) and pass it to rucio, as in the following example
$ rucio download `cat input.txt`
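  • The same idea can be sketched in Python, for instance when the download is driven from a script. This is an illustrative sketch only: read_dids and build_download_command are hypothetical helpers, and skipping blank lines and '#' comments is an addition that the plain `cat` form does not perform. The only option used is '--dir', as listed above.

```python
from typing import Iterable, List, Optional


def read_dids(lines: Iterable[str]) -> List[str]:
    """Keep one DID per line, skipping blank lines and '#' comments."""
    dids = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            dids.append(line)
    return dids


def build_download_command(dids: List[str], target_dir: Optional[str] = None) -> List[str]:
    """Build the argument list for a 'rucio download' call."""
    cmd = ["rucio", "download"]
    if target_dir:
        cmd += ["--dir", target_dir]
    return cmd + dids
```

  • The resulting list can be passed to subprocess.run(..., check=True) after reading the DIDs with open("input.txt"); passing the arguments as a list avoids shell quoting problems with long DID names.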

Create data

  • The typical use case is that you have produced some files locally and want to share them with other people, or you want to run over them using Distributed Analysis tools like Panda. For this you need to upload the files into a dataset on a Rucio Storage Element (RSE).
  • If you create files in your own scope, which is 'user.', there is no restriction on DID names. However, once a name has been used for a data identifier, it cannot be reused, even if you delete the original! For official data, a specific nomenclature is used.
  • You can upload your datasets to 2 different storage areas in the ATLAS Distributed Computing Infrastructure:
    1. SCRATCHDISK: Datasets uploaded there will be kept for 2 weeks; after that period, they can disappear at any time.
    2. LOCALGROUPDISK: These areas are dedicated to local users and are managed by the cloud squads. Permissions are set according to the user nationality and/or institute (see above). The retention policy and the quota on these endpoints are defined by the cloud squads.

Create a dataset from files on my local disk

$ rucio -v upload --rse AUSTRALIA-ATLAS_SCRATCHDISK user.goncalo:My1stDataset mylog.1 mylog.2 mylog.3
2015-08-04 07:12:42,710 DEBUG [Extracting filesize (28112) and checksum (f9c8015e) for file user.goncalo:mylog.1]
2015-08-04 07:12:42,712 DEBUG [Extracting filesize (28528) and checksum (455648a8) for file user.goncalo:mylog.2]
2015-08-04 07:12:42,713 DEBUG [Extracting filesize (28236) and checksum (2fc93fd2) for file user.goncalo:mylog.3]
2015-08-04 07:12:46,681 DEBUG [Using account goncalo]
2015-08-04 07:12:47,180 INFO [Dataset successfully created]
2015-08-04 07:12:48,446 INFO [Adding replicas in Rucio catalog]
2015-08-04 07:12:49,872 INFO [Replicas successfully added]
2015-08-04 07:13:05,295 INFO [File user.goncalo:mylog.1 successfully uploaded on the storage]
2015-08-04 07:13:07,796 INFO [Adding replicas in Rucio catalog]
2015-08-04 07:13:09,414 INFO [Replicas successfully added]
2015-08-04 07:13:23,063 INFO [File user.goncalo:mylog.2 successfully uploaded on the storage]
2015-08-04 07:13:25,558 INFO [Adding replicas in Rucio catalog]
2015-08-04 07:13:26,882 INFO [Replicas successfully added]
2015-08-04 07:13:41,580 INFO [File user.goncalo:mylog.3 successfully uploaded on the storage]
2015-08-04 07:13:49,023 INFO [Will update the file replicas states]
2015-08-04 07:13:49,561 INFO [File replicas states successfully updated]
Completed in 68.8498 sec.
$ rucio list-dids --recursive user.goncalo:My1stDataset
+---------------------------+--------------+
| SCOPE:NAME                | [DID TYPE]   |
|---------------------------+--------------|
| user.goncalo:mylog.1      | FILE         |
| user.goncalo:mylog.2      | FILE         |
| user.goncalo:mylog.3      | FILE         |
| user.goncalo:My1stDataset | DATASET      |
+---------------------------+--------------+
$ rucio list-dataset-replicas user.goncalo:My1stDataset
DATASET: user.goncalo:My1stDataset
+-----------------------------+---------+---------+
| RSE                         |   FOUND |   TOTAL |
|-----------------------------+---------+---------|
| AUSTRALIA-ATLAS_SCRATCHDISK |       3 |       3 |
+-----------------------------+---------+---------+

Create a dataset from files in other datasets

  • The operation consists of creating an empty dataset and attaching existing files to it
$ rucio list-files user.goncalo:My1stDataset
+----------------------+--------------------------------------+-------------+------------+----------+
| SCOPE:NAME           | GUID                                 | ADLER32     |   FILESIZE |   EVENTS |
|----------------------+--------------------------------------+-------------+------------+----------|
| user.goncalo:mylog.1 | 50459DE9-8C46-4AF5-95AB-017404C9020E | ad:f9c8015e |      28112 |          |
| user.goncalo:mylog.2 | BA813A98-13B5-4D86-897E-910BA33EA848 | ad:455648a8 |      28528 |          |
| user.goncalo:mylog.3 | B597A27B-E46C-45A0-BFD2-13EC275A0E21 | ad:2fc93fd2 |      28236 |          |
+----------------------+--------------------------------------+-------------+------------+----------+
Total files : 3
Total size : 84876
$  rucio add-dataset user.goncalo:My2ndDataset
Added user.goncalo:My2ndDataset
$ rucio attach user.goncalo:My2ndDataset user.goncalo:mylog.1
DIDs successfully attached to user.goncalo:My2ndDataset
$  rucio list-files user.goncalo:My2ndDataset
+----------------------+--------------------------------------+-------------+------------+----------+
| SCOPE:NAME           | GUID                                 | ADLER32     |   FILESIZE |   EVENTS |
|----------------------+--------------------------------------+-------------+------------+----------|
| user.goncalo:mylog.1 | 50459DE9-8C46-4AF5-95AB-017404C9020E | ad:f9c8015e |      28112 |          |
+----------------------+--------------------------------------+-------------+------------+----------+
Total files : 1
Total size : 28112
  • Please note that, in this case, 'rucio list-dataset-replicas user.goncalo:My2ndDataset' will not display any information, since the dataset and the files therein were not uploaded from scratch.

Add a local file to an existing dataset

$ rucio -v upload --rse AUSTRALIA-ATLAS_SCRATCHDISK user.goncalo:My2ndDataset mylog.4
2015-08-05 02:57:50,942 DEBUG [Extracting filesize (28236) and checksum (2fc93fd2) for file user.goncalo:mylog.4]
2015-08-05 02:57:55,735 DEBUG [Using account goncalo]
2015-08-05 02:57:57,817 WARNING [The dataset name already exist]
2015-08-05 02:57:59,073 INFO [Adding replicas in Rucio catalog]
2015-08-05 02:58:00,413 INFO [Replicas successfully added]
2015-08-05 02:58:16,777 INFO [File user.goncalo:mylog.4 successfully uploaded on the storage]
2015-08-05 02:58:21,269 INFO [Will update the file replicas states]
2015-08-05 02:58:21,676 INFO [File replicas states successfully updated]
Completed in 32.1619 sec.
$ rucio list-dids --recursive user.goncalo:My2ndDataset
+---------------------------+--------------+
| SCOPE:NAME                | [DID TYPE]   |
|---------------------------+--------------|
| user.goncalo:mylog.1      | FILE         |
| user.goncalo:mylog.4      | FILE         |
| user.goncalo:My2ndDataset | DATASET      |
+---------------------------+--------------+

Migrate files between datasets

  • The procedure consists of detaching a file from one dataset and attaching it to another
$ rucio list-dids --recursive user.goncalo:My2ndDataset
+---------------------------+--------------+
| SCOPE:NAME                | [DID TYPE]   |
|---------------------------+--------------|
| user.goncalo:mylog.1      | FILE         |
| user.goncalo:mylog.4      | FILE         |
| user.goncalo:My2ndDataset | DATASET      |
+---------------------------+--------------+
$ rucio detach user.goncalo:My2ndDataset user.goncalo:mylog.4
$  rucio list-files user.goncalo:My2ndDataset
+----------------------+--------------------------------------+-------------+------------+----------+
| SCOPE:NAME           | GUID                                 | ADLER32     |   FILESIZE |   EVENTS |
|----------------------+--------------------------------------+-------------+------------+----------|
| user.goncalo:mylog.1 | 50459DE9-8C46-4AF5-95AB-017404C9020E | ad:f9c8015e |      28112 |          |
+----------------------+--------------------------------------+-------------+------------+----------+
Total files : 1
Total size : 28112
$ rucio attach user.goncalo:My1stDataset user.goncalo:mylog.4
DIDs successfully attached to user.goncalo:My1stDataset
$  rucio list-files user.goncalo:My1stDataset
+----------------------+--------------------------------------+-------------+------------+----------+
| SCOPE:NAME           | GUID                                 | ADLER32     |   FILESIZE |   EVENTS |
|----------------------+--------------------------------------+-------------+------------+----------|
| user.goncalo:mylog.1 | 50459DE9-8C46-4AF5-95AB-017404C9020E | ad:f9c8015e |      28112 |          |
| user.goncalo:mylog.2 | BA813A98-13B5-4D86-897E-910BA33EA848 | ad:455648a8 |      28528 |          |
| user.goncalo:mylog.3 | B597A27B-E46C-45A0-BFD2-13EC275A0E21 | ad:2fc93fd2 |      28236 |          |
| user.goncalo:mylog.4 | B8938642-9020-46A0-A92A-21E2D71869BD | ad:2fc93fd2 |      28236 |          |
+----------------------+--------------------------------------+-------------+------------+----------+
Total files : 4
Total size : 113112
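  • The detach/attach sequence above can be modelled as a move between two sets of file DIDs. This is a sketch of the bookkeeping only: migrate is a hypothetical helper that operates on an in-memory dictionary and does not talk to Rucio.

```python
from typing import Dict, Set


def migrate(datasets: Dict[str, Set[str]], file_did: str, source: str, target: str) -> None:
    """Detach file_did from source and attach it to target, in place."""
    if file_did not in datasets[source]:
        raise ValueError(f"{file_did} is not attached to {source}")
    datasets[source].remove(file_did)  # models 'rucio detach'
    datasets[target].add(file_did)     # models 'rucio attach'
```

  • Starting from the state shown in the transcript, moving user.goncalo:mylog.4 leaves My2ndDataset with one file and My1stDataset with four.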

Final Notes on datasets

  • Once you have finished a dataset, you should "close" it. Rucio operations are optimized for 'closed' datasets. However, keep in mind that:
    1. If you plan to add more files later, you should keep it open. Reopening datasets is a privileged operation that can only be performed by certain users.
    2. If you want to keep adding and removing files in a dataset, consider using containers.
$ rucio close user.goncalo:My1stDataset
  • By default, user datasets are created on SCRATCHDISK at the site where the jobs run. All datasets on SCRATCHDISK are deleted after a certain period (minimum 7 days). To retrieve your output files, you should either:
    1. Request a transfer to LOCALGROUPDISK. The output files will stay as a dataset on the Grid.
    2. Download them onto your local disk using rucio download. The output files will no longer be available via DDM once the dataset on SCRATCHDISK is deleted. If the files are Athena files (POOL files), you will not be able to re-register them. If you think you may want to use them on the Grid later, you should consider setting rules instead.

Transfer data

  • The new way to trigger the automatic transfer of files and datasets is by setting replication rules on DIDs.
  • The following examples focus on the rucio CLI, but the same operations can be done using the Rucio UI web interface: https://rucio-ui.cern.ch/
    • First-time users should take the tour using the link provided at the top of that page. The tour explains how to use the Rucio UI to create a transfer request.
  • Please note that you can transfer data to 2 different storage areas in the ATLAS Distributed Computing infrastructure:
    1. SCRATCHDISK: Datasets uploaded there are kept for 2 weeks; after that period, they can disappear at any time.
    2. LOCALGROUPDISK: These areas are dedicated to local users and are managed by the cloud squads. Permissions are set according to the user's nationality and/or institute (see above). The retention policy and the quota on these endpoints are defined by the cloud squads.

Transfer data to a specific location

  • In the example below, we set a rule that triggers the transfer/copy of the dataset user.goncalo:My1stDataset (including all its files) to the RSE AUSTRALIA-ATLAS_T2ATLASLOCALGROUPDISK. The second argument (1) is the number of copies, and the command prints the ID of the new rule.
$ rucio add-rule user.goncalo:My1stDataset 1 'AUSTRALIA-ATLAS_T2ATLASLOCALGROUPDISK'
3f653b3d6a164c8cb8d0e5e9a2288127
  • Please note that the expression between quotes is an RSE expression, evaluated as a Boolean condition on the attributes of each endpoint. Other expressions can be used, such as 'tier=3', 'cloud=CA', 'country=au', etc. To see which attributes you can use to select an endpoint, you can run:
$ rucio-admin rse get-attribute 'AUSTRALIA-ATLAS_T2ATLASLOCALGROUPDISK'
fts: https://fts3-pilot.cern.ch:8446,https://fts3.cern.ch:8446,https://lcgfts3.gridpp.rl.ac.uk:8446
ftstesting: https://fts3-pilot.cern.ch:8446
ALL: True
physgroup: None
CANADASITES: True
spacetoken: T2ATLASLOCALGROUPDISK
AUSTRALIA-ATLAS_T2ATLASLOCALGROUPDISK: True
site: Australia-ATLAS
CATIER2S: True
fts_testing: https://fts3-pilot.cern.ch:8446
cloud: CA
country: au
tier: 2
type: LOCALGROUPDISK
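  • To illustrate how an expression such as 'tier=2&country=au' selects an endpoint, here is a minimal sketch in Python that checks an expression against the attributes printed above. It is not Rucio's parser (the real evaluation happens on the Rucio server) and it only handles single terms and terms joined by '&' (intersection). The function names are hypothetical.

```python
# A subset of the attributes printed above, kept as strings.
ATTRIBUTES = {
    "AUSTRALIA-ATLAS_T2ATLASLOCALGROUPDISK": "True",
    "spacetoken": "T2ATLASLOCALGROUPDISK",
    "site": "Australia-ATLAS",
    "cloud": "CA",
    "country": "au",
    "tier": "2",
    "type": "LOCALGROUPDISK",
}


def term_matches(term, attributes):
    """Check one term: either 'key=value' or a bare RSE name."""
    term = term.strip()
    if "=" in term:
        key, value = term.split("=", 1)
        return attributes.get(key.strip()) == value.strip()
    # Assumption: a bare name matches when that attribute is set to True.
    return attributes.get(term) == "True"


def expression_matches(expression, attributes):
    """Return True if every '&'-joined term matches the attributes."""
    return all(term_matches(term, attributes) for term in expression.split("&"))
```

  • With these attributes, expression_matches("tier=2&country=au", ATTRIBUTES) is True, while expression_matches("tier=3", ATTRIBUTES) is False because this endpoint has tier 2.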

Status of data transfers

  • You can list the rules you have set, and then see the details of a specific rule:
$ rucio list-rules --account goncalo
ID                                ACCOUNT    SCOPE:NAME                   STATE[OK/REPL/STUCK    RSE_EXPRESSION                           COPIES    EXPIRES (UTC)
--------------------------------  ---------  ---------------------------  ---------------------  -------------------------------------  --------  ---------------
3f653b3d6a164c8cb8d0e5e9a2288127  goncalo    user.goncalo:My1stDataset    REPLICATING[0/4/0]     AUSTRALIA-ATLAS_T2ATLASLOCALGROUPDISK         1
9a4bd38979da4491b68e5d1e46511a60  goncalo    user.goncalo:My1stDataset    OK[4/0/0]              AUSTRALIA-ATLAS_SCRATCHDISK                   1
  • When consulting the details of a specific rule, please note the 'State:' and 'Locks' fields, which provide information about the status of the transfer.
$ rucio rule-info 3f653b3d6a164c8cb8d0e5e9a2288127
Id:                         3f653b3d6a164c8cb8d0e5e9a2288127
Account:                    goncalo
Scope:                      user.goncalo
Name:                       My1stDataset
RSE Expression:             AUSTRALIA-ATLAS_T2ATLASLOCALGROUPDISK
Copies:                     1
State:                      REPLICATING
Locks OK/REPLICATING/STUCK: 0/4/0    <---
Grouping:                   DATASET  <---
Expires at:                 None
Locked:                     False
Weight:                     None
Created at:                 2015-08-05 05:03:06
Updated at:                 2015-08-05 05:03:06
Error:                      None
Subscription Id:            None
Source replica expression:  None
Activity:                   default
Comment:                    None
  • After a reasonable time (a day or so), you should recheck the status of your transfer. If everything went well, your data should have arrived safely at its destination:
$ rucio rule-info 3f653b3d6a164c8cb8d0e5e9a2288127
Id:                         3f653b3d6a164c8cb8d0e5e9a2288127
Account:                    goncalo
Scope:                      user.goncalo
Name:                       My1stDataset
RSE Expression:             AUSTRALIA-ATLAS_T2ATLASLOCALGROUPDISK
Copies:                     1
State:                      OK      <---
Locks OK/REPLICATING/STUCK: 4/0/0   <---
Grouping:                   DATASET
Expires at:                 None
Locked:                     False
Weight:                     None
Created at:                 2015-08-05 05:03:06
Updated at:                 2015-08-05 05:25:49
Error:                      None
Subscription Id:            None
Source replica expression:  None
Activity:                   default
Comment:                    None
$ rucio list-dataset-replicas user.goncalo:My1stDataset
DATASET: user.goncalo:My1stDataset
+---------------------------------------+---------+---------+
| RSE                                   |   FOUND |   TOTAL |
|---------------------------------------+---------+---------|
| AUSTRALIA-ATLAS_T2ATLASLOCALGROUPDISK |       4 |       4 |
| AUSTRALIA-ATLAS_SCRATCHDISK           |       4 |       4 |
+---------------------------------------+---------+---------+
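  • The 'State:' and 'Locks' fields shown in the rule-info output above can also be checked from a script. The following is a minimal sketch in Python that parses text in the "Key: value" layout shown on this page and decides whether the transfer has finished. The function names are hypothetical, and the '<---' markers are treated as annotations to be stripped.

```python
def parse_rule_info(text):
    """Turn 'Key: value' lines into a dictionary of strings."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        # Split on the first colon only, so timestamps stay intact.
        key, value = line.split(":", 1)
        info[key.strip()] = value.replace("<---", "").strip()
    return info


def lock_counts(info):
    """Return the OK/REPLICATING/STUCK lock counts as integers."""
    ok, replicating, stuck = (
        int(part) for part in info["Locks OK/REPLICATING/STUCK"].split("/")
    )
    return {"ok": ok, "replicating": replicating, "stuck": stuck}


def transfer_finished(info):
    """A rule is considered done when it is OK and nothing is pending or stuck."""
    counts = lock_counts(info)
    return (
        info["State"] == "OK"
        and counts["replicating"] == 0
        and counts["stuck"] == 0
    )
```

  • Applied to the two outputs above, transfer_finished returns False for the first one (State REPLICATING, locks 0/4/0) and True for the second one (State OK, locks 4/0/0).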

Deleting data

When rules apply to data

  • If the data you want to delete has Rucio rules associated with it, you have to delete the data by deleting the associated rule. Using this mechanism, you should be able to:
    1. delete data stored in the SCRATCHDISK area (before the 2-week limit)
    2. delete data stored in the LOCALGROUPDISK area in your national Tier2 storage system (before any limit imposed by your national squad)
  • In the following example, please note that once we delete a rule, an expiration date is set on the rule, and consequently the data will eventually be deleted (automatically) by Rucio. However, until the data is actually deleted, it will continue to appear in rucio commands such as 'rucio list-dids'.
$ rucio list-rules --account goncalo
ID                                ACCOUNT    SCOPE:NAME                 STATE[OK/REPL/STUCK    RSE_EXPRESSION                           COPIES    EXPIRES (UTC)
--------------------------------  ---------  -------------------------  ---------------------  -------------------------------------  --------  ---------------
3f653b3d6a164c8cb8d0e5e9a2288127  goncalo    user.goncalo:My1stDataset  OK[4/0/0]              AUSTRALIA-ATLAS_T2ATLASLOCALGROUPDISK         1
9a4bd38979da4491b68e5d1e46511a60  goncalo    user.goncalo:My1stDataset  OK[4/0/0]              AUSTRALIA-ATLAS_SCRATCHDISK                   1
$ rucio delete-rule 9a4bd38979da4491b68e5d1e46511a60
$ rucio list-rules --account goncalo
ID                                ACCOUNT    SCOPE:NAME                 STATE[OK/REPL/STUCK    RSE_EXPRESSION                           COPIES  EXPIRES (UTC)
--------------------------------  ---------  -------------------------  ---------------------  -------------------------------------  --------  -------------------
3f653b3d6a164c8cb8d0e5e9a2288127  goncalo    user.goncalo:My1stDataset  OK[4/0/0]              AUSTRALIA-ATLAS_T2ATLASLOCALGROUPDISK         1
9a4bd38979da4491b68e5d1e46511a60  goncalo    user.goncalo:My1stDataset  OK[4/0/0]              AUSTRALIA-ATLAS_SCRATCHDISK                   1  2015-08-06 02:41:27
$ rucio rule-info 9a4bd38979da4491b68e5d1e46511a60
Id:                         9a4bd38979da4491b68e5d1e46511a60
Account:                    goncalo
Scope:                      user.goncalo
Name:                       My1stDataset
RSE Expression:             AUSTRALIA-ATLAS_SCRATCHDISK
Copies:                     1
State:                      OK
Locks OK/REPLICATING/STUCK: 4/0/0
Grouping:                   DATASET
Expires at:                 2015-08-06 02:41:27 <---
Locked:                     False
Weight:                     None
Created at:                 2015-08-04 07:12:46
Updated at:                 2015-08-06 01:41:27
Error:                      None
Subscription Id:            None
Source replica expression:  None
Activity:                   default
Comment:                    None
  • Finally, please note that if more than one rule is set on the same data, 'rucio delete-rule' removes only that rule; the data stays in place as long as another rule still covers it.
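  • The relation between rules and data deletion can be pictured with a small model. The sketch below, in Python, is a simplification and not Rucio code: it assumes that a copy of a DID at an RSE stays protected while at least one rule covers that DID at that RSE, and it ignores the expiration delay shown above. The function name and the extra rule used in the usage note are hypothetical.

```python
# The two rules listed above, reduced to the fields used by this model.
RULES = [
    {"id": "3f653b3d6a164c8cb8d0e5e9a2288127",
     "did": "user.goncalo:My1stDataset",
     "rse": "AUSTRALIA-ATLAS_T2ATLASLOCALGROUPDISK"},
    {"id": "9a4bd38979da4491b68e5d1e46511a60",
     "did": "user.goncalo:My1stDataset",
     "rse": "AUSTRALIA-ATLAS_SCRATCHDISK"},
]


def unprotected_after_delete(rules, rule_id):
    """Return the (did, rse) pairs no rule covers once rule_id is deleted."""
    covered_before = {(r["did"], r["rse"]) for r in rules if r["id"] == rule_id}
    still_covered = {(r["did"], r["rse"]) for r in rules if r["id"] != rule_id}
    return sorted(covered_before - still_covered)
```

  • In this model, deleting rule 9a4bd38979da4491b68e5d1e46511a60 leaves only the SCRATCHDISK copy unprotected, while the LOCALGROUPDISK copy is still covered by the other rule. If a second rule covered the same dataset on SCRATCHDISK, the result would be an empty list.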

When rules do not apply to data

  • When rules do not apply to data, you can delete DIDs using 'rucio erase'. However, this option is only available starting from rucio 1.0.0 (which can already be loaded, but is not the default rucio version).
$ rucio list-dids user.goncalo:*
+-----------------------------+--------------+
| SCOPE:NAME                  | [DID TYPE]   |
|-----------------------------+--------------|
| user.goncalo:myfirstdataset | COLLECTION   |
| user.goncalo:mydataset      | COLLECTION   |
| user.goncalo:My1stDataset   | COLLECTION   |
| user.goncalo:My2ndDataset   | COLLECTION   |
+-----------------------------+--------------+
$ rucio erase user.goncalo:mydataset
$ rucio erase user.goncalo:myfirstdataset

(... wait some time: an hour, possibly more ...)
$ rucio list-dids user.goncalo:*
+---------------------------+--------------+
| SCOPE:NAME                | [DID TYPE]   |
|---------------------------+--------------|
| user.goncalo:My1stDataset | COLLECTION   |
| user.goncalo:My2ndDataset | COLLECTION   |
+---------------------------+--------------+

Other operations

  • For other operations, check the built-in help of the rucio command:
$ rucio --help
usage: rucio [-h] [--version] [--verbose] [-H ADDRESS] [--auth-host ADDRESS]
             [-a ACCOUNT] [-S AUTH_STRATEGY] [-T TIMEOUT] [-u USERNAME]
             [-pwd PASSWORD] [--certificate CERTIFICATE]
             [--ca-certificate CA_CERTIFICATE]

             {delete-metadata,list-scopes,get-metadata,list-files,list-file-replicas,set-metadata,add-dataset,list-pfns,download,close,list-account-limits,list-datasets-rse,list-rses,ping,attach,list-parent-dids,rule-info,list-rse-attributes,stat,test-server,add-container,delete-rule,list-account-usage,detach,list-dids,list-content,list-dataset-replicas,add-rule,upload,list-rse-usage,update-rule,list-rules,whoami,delete}
             ...

positional arguments:
  {delete-metadata,list-scopes,get-metadata,list-files,list-file-replicas,set-metadata,add-dataset,list-pfns,download,close,list-account-limits,list-datasets-rse,list-rses,ping,attach,list-parent-dids,rule-info,list-rse-attributes,stat,test-server,add-container,delete-rule,list-account-usage,detach,list-dids,list-content,list-dataset-replicas,add-rule,upload,list-rse-usage,update-rule,list-rules,whoami,delete}
    ping                Ping Rucio server
    whoami              Get information about account whose token is used
    list-file-replicas  List file replicas
    list-dataset-replicas
                        List the dataset replicas
    add-dataset         Add dataset
    add-container       Add container
    attach              Attach a list of Data Identifiers (file, dataset or
                        container) to an other Data Identifier (dataset or
                        container)
    detach              Detach a list of Data Identifiers (file, dataset or
                        container) from an other Data Identifier (dataset or
                        container)
    list-dids           List the data identifiers matching some metadata
    list-parent-dids    List parent data identifiers
    list-scopes         List all available scopes
    close               Close data identifier
    stat                List attributes and statuses about data identifiers
    delete              Delete data identifier
    list-files          List data identifier contents
    list-content        List the content of a collection
    upload              Upload method
    download            Download method
    get-metadata        Get metadata for DIDs
    list-pfns           Get the expected PFNs for a list of files
    set-metadata        set-metadata method
    delete-metadata     Delete metadata
    list-rse-usage      list-rse-usage method
    list-account-usage  list-account-usage method
    list-account-limits
                        List account limits on RSEs
    add-rule            Add replication rule
    delete-rule         Delete replication rule
    rule-info           Retrieve information about a rule
    list-rules          List replication rules
    update-rule         Update replication rule
    list-rses           List RSEs
    list-rse-attributes
                        List the attributes of an RSE
    list-datasets-rse   List all the datasets at a Rucio Storage Element
    test-server         Test Server

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --verbose, -v         Print more verbose output
  -H ADDRESS, --host ADDRESS
                        The Rucio API host
  --auth-host ADDRESS   The Rucio Authentication host
  -a ACCOUNT, --account ACCOUNT
                        Rucio account to use
  -S AUTH_STRATEGY, --auth-strategy AUTH_STRATEGY
                        Authentication strategy (userpass or x509 or ...)
  -T TIMEOUT, --timeout TIMEOUT
                        Set all timeout values to SECONDS
  -u USERNAME, --user USERNAME
                        username
  -pwd PASSWORD, --password PASSWORD
                        password
  --certificate CERTIFICATE
                        Client certificate file
  --ca-certificate CA_CERTIFICATE
                        CA certificate to verify peer against (SSL)
grid/rucio.txt · Last modified: 2016/01/21 16:35 by goncalo
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 4.0 International