In some situations, using an online calculator such as the Green Algorithms one isn’t very practical, e.g. when many different jobs are run. In an ideal world, a tool would automatically collect the details of all the jobs run and estimate the corresponding energy usage and carbon footprint. GA4HPC is a first step in this direction.
High Performance Computing (HPC) clusters tend to log information on all jobs run on them for accounting purposes, and this information can be pulled.
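On SLURM clusters, for instance, these accounting records can be queried with `sacct`; a minimal sketch (the exact fields available may vary between SLURM versions and site configurations):

```sh
# List basic accounting data for your jobs since a given date:
# job ID, wall-clock time, CPU time actually consumed, core count,
# requested memory and final state, as machine-readable output.
sacct --starttime 2022-01-01 \
      --format=JobID,Elapsed,TotalCPU,NCPUS,ReqMem,State \
      --parsable2
```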
Who is it for?
At this stage, the script works on any HPC cluster using SLURM as a workload manager. It can be adapted to other workload managers; see here for how to add one.
How to install it
It doesn’t require any particular permissions: just copy the GitHub repository onto your HPC drive, enter some information about your data centre, and you’re good to go! Tutorial here
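A minimal sketch of what this looks like, assuming the GreenAlgorithms4HPC repository name and a YAML config file for the data-centre details (follow the tutorial for the authoritative steps):

```sh
# Clone the repository somewhere on the cluster's shared storage
git clone https://github.com/GreenAlgorithms/GreenAlgorithms4HPC.git
cd GreenAlgorithms4HPC

# Fill in your data centre's details (hardware models, PUE, location, etc.)
# -- the config file name here is an assumption, check the repository.
nano cluster_info.yaml
```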
How to use it
Anyone with access to `the_shared_directory` where the script is located can run the calculator by running the same command, with various options available:
```
usage: myCarbonFootprint.sh [-h] [-S STARTDAY] [-E ENDDAY] [--filterCWD]
                            [--filterJobIDs FILTERJOBIDS]
                            [--filterAccount FILTERACCOUNT]
                            [--customSuccessStates CUSTOMSUCCESSSTATES]
                            [--reportBug] [--reportBugHere]
                            [--useCustomLogs USECUSTOMLOGS]

Calculate your carbon footprint on CSD3.

optional arguments:
  -h, --help            show this help message and exit
  -S STARTDAY, --startDay STARTDAY
                        The first day to take into account, as YYYY-MM-DD
                        (default: 2022-01-01)
  -E ENDDAY, --endDay ENDDAY
                        The last day to take into account, as YYYY-MM-DD
                        (default: today)
  --filterCWD           Only report on jobs launched from the current
                        location.
  --filterJobIDs FILTERJOBIDS
                        Comma-separated list of job IDs you want to filter
                        on.
  --filterAccount FILTERACCOUNT
                        Only consider jobs charged under this account.
  --customSuccessStates CUSTOMSUCCESSSTATES
                        Comma-separated list of job states. By default, only
                        jobs that exit with status CD or COMPLETED are
                        considered successful (PENDING, RUNNING and REQUEUED
                        are ignored). Jobs with states listed here will be
                        considered successful as well (best to list both
                        2-letter and full-length codes). Full list of job
                        states:
                        https://slurm.schedmd.com/squeue.html#SECTION_JOB-STATE-CODES
  --reportBug           In case of a bug, this flag logs job information so
                        that we can fix it. Note that this will write out
                        some basic information about your jobs, such as
                        runtime, number of cores and memory usage.
  --reportBugHere       Similar to --reportBug, but exports the output to
                        your home folder.
  --useCustomLogs USECUSTOMLOGS
                        This bypasses the workload manager and enables you
                        to input a custom log file of your jobs. This is
                        mostly meant for debugging, but can be useful in
                        some situations. An example of the expected file can
                        be found at example_files/example_sacctOutput_raw.tsv.
```
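For example, assuming the script lives in `the_shared_directory` (the flags below are taken from the help above; `MYACCOUNT` is a placeholder):

```sh
# Footprint of all your jobs over the first half of 2022
the_shared_directory/myCarbonFootprint.sh -S 2022-01-01 -E 2022-06-30

# Only jobs charged to one account and launched from the current directory
the_shared_directory/myCarbonFootprint.sh --filterAccount MYACCOUNT --filterCWD

# Also count jobs that timed out (SLURM states TO/TIMEOUT) as successful
the_shared_directory/myCarbonFootprint.sh --customSuccessStates TO,TIMEOUT
```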
Limitations to keep in mind
- The workload manager doesn’t always log the exact CPU usage time; when this information is missing, we assume that all cores are used at 100%.
- For now, we assume that GPU jobs use only one GPU and that the GPU is used at 100%, as the information needed for a more accurate estimate is not always available.
(Both of these assumptions may lead to slightly overestimated carbon footprints, although the order of magnitude is likely to be correct; see the back-of-envelope sketch after this list.)
- Conversely, the wasted energy due to memory overallocation may be largely underestimated, as the information needed is not always logged.
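As a rough illustration of why the usage assumptions inflate the estimate without changing the order of magnitude (the 12 W/core figure and the simplified energy = runtime × cores × power × usage formula are made up for this sketch, not the tool's actual methodology):

```sh
# Made-up example: a 10-hour job on 8 cores, each drawing ~12 W.
awk 'BEGIN { print 10 * 8 * 12 * 1.0 " Wh assuming 100% core usage" }'   # 960 Wh
awk 'BEGIN { print 10 * 8 * 12 * 0.5 " Wh if real usage averaged 50%" }' # 480 Wh
# At most a factor-of-2 gap here: overestimated, but the same order of magnitude.
```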
Report bugs
If you spot any bugs or would like new features, just open a new issue on GitHub.
How to modify the script for my cluster?
See the “Edit code and contribute” page on how to modify the code and share your improvements with other users.
Licence
This work is licensed under a Creative Commons Attribution 4.0 International License.