
Lancium GCLI Documentation

VERSION: v2.0

The GCLI is a Python library for interacting with the Lancium compute backend to create and control virtual machine and container instances.

Table of Contents

Setup

Account Creation

A Lancium account on the Lancium portal is required to use the GCLI. The same credentials that you set on the portal are used to authenticate with the GCLI.

Installation

Dependencies

Download the express installer script, gcli_express_installer.sh. This script downloads and installs the latest version of the GCLI and its dependencies.

After the download completes, run:

bash gcli_express_installer.sh

The installer will present action items about adjusting your search path to include the directories it creates.

You can verify that you have installed everything correctly by running the following commands:

gcli version && grid version

Which will return the following:

GCLI Version: 2.0
Genesis II version 2.7.647 Build 9922

Uninstallation

To uninstall the GCLI, navigate to the install directory and run:

./uninstall.sh

This will delete the source directory.

To uninstall GenesisII as well, navigate to $HOME/GenesisII/ and run the following:

./uninstall.sh -c

Note: This will still leave some user data behind. To fully destroy all GenesisII files, run the following:

rm -r GenesisII .GenesisII

Depending on which grid commands you used, you may also need to run:

rm -r .genii_ui_persistence
rm -r .genesisII-2.0

Clientserver Startup (Recurring Action)

Before using the GCLI, the grid clientserver must be running on port 8888. To start the clientserver, run the following command:

gcli clientserver start

When the GCLI sends a request to the clientserver, the clientserver checks authentication and communicates over TLS with Lancium services to carry out the request (subject to access control). As long as the clientserver is up, the GCLI can communicate with the Lancium Compute Infrastructure. If it goes down, no communication is possible; the clientserver must be restarted and you must re-authenticate.

The clientserver takes the port to use as an argument; by default, the GCLI uses port 8888.
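For example, to run the clientserver on port 9999 instead (an arbitrary choice), start it with -p and point the GCLI at it:

```shell
# start the clientserver on a non-default port
gcli clientserver start -p 9999

# tell the GCLI where to reach it (required for any port other than 8888)
export LANCIUM_CLIENT_SERVER=localhost:9999
```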

Authentication (Recurring Action)

Before using the GCLI, you must authenticate.

If you haven’t already, you must create a Lancium account on the Lancium portal before you can proceed.

After you have created your account, you can authenticate the GCLI with the same credentials as the portal by running:

gcli authenticate <PORTAL USERNAME/EMAIL> <PORTAL PASSWORD>

You can verify that you have logged in by running:

gcli whoami

If the clientserver closes, you must reauthenticate.

GCLI Package

The GCLI package comes with a few important files. Apart from changelog.txt, all of these files must be kept together in the same directory.

Job Resources

CPUs

CPUs refers to the number of virtual cores (or vCPUs), not physical cores. You can use the --cpus flag to change the number of virtual cores available for your job.

Memory

Memory refers to the amount of system memory that will be made available to you inside your container or VM. The value can be passed in as either megabytes or gigabytes by using the MB or GB suffixes (for example, --memory 4096MB or --memory 4GB). You can use the --memory flag to change this value.

GPU

GPU refers to a full graphics card, not a single GPU die on the board. You can use the --gpu flag to change this value. Currently, we only offer Nvidia K40s and Nvidia K80s. To request those cards, indicate --gpu k40 and --gpu k80 respectively.

GPU Count

GPU Count refers to the number of graphics cards that will be made available to you inside your container or VM. You can use the --gpuCount flag to change this value for your job. We currently allow a maximum of 4 GPUs per job.

GPU Memory

GPU Memory refers to the amount (in GB) of GPU memory that will be made available to you inside your container or VM. You can use the --gpuMemory flag to change this value.

Scratch Space

Using the --scratch flag, you can request that additional space be added onto the base image. For example, the provided Lancium base image has a virtual disk size of 5GB. To get an additional 5GB of usable scratch space on the image, provide the --scratch 5 flag when creating the virtual machine.

Shapes

A shape is a predefined job resource configuration saved in resources/shapes.yaml. You are encouraged to modify the properties of the shapes.yaml file to best suit your needs. You can both modify existing shapes and create your own shapes. Shapes do not need to have all fields present. Any missing field will assume a default value, shown in the table below.
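As an illustration only, a custom shape entry might look like the following. The exact key names are assumptions here; copy the spelling used by the existing entries in your resources/shapes.yaml rather than this sketch.

```yaml
# hypothetical shapes.yaml entry; key names are assumed, not authoritative
my_gpu_shape:
  cpus: 4
  memory: 16GB
  gpu: k80
  gpuCount: 2
  gpuMemory: 12
  # "scratch" omitted on purpose: missing fields assume the documented defaults
```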

Job Resource Defaults

Field Default
CPUs 1 core
Memory 4096MB
GPU None
GPU Count 1 GPU
GPU Memory 12 GB
Scratch space 0 GB

GCLI Commands

clientserver

clientserver will start, stop, or restart a GenesisII clientserver on a port given via the -p flag. The clientserver must be running in order for the GCLI to work.

Note: If you decide to use a port number other than 8888, you must set the environment variable LANCIUM_CLIENT_SERVER to localhost:<PORT_NUMBER>.

usage: gcli.py clientserver [-h] [-p PORT] action

positional arguments:
  action                'start', 'restart', 'stop', or 'status'.

optional arguments:
  -h, --help            show this help message and exit
  -p PORT, --port PORT  Port to run your clientserver. (Default: 8888)

startVMs

startVMs configures and submits VM jobs to the Lancium Compute backend.

usage: gcli.py startVMs [-h] [-s SHAPE] [-c CPUs] [-m MEMORY] [--gpu GPU] [--gpuCount GPUCOUNT]
                        [--gpuMemory GPUMEMORY] [--time TIME] [--startIndex STARTINDEX]
                        [--scratch SCRATCH | --tmp TMP]
                        imageName jobName count

positional arguments:
  imageName             Image of VM(s) to start
  jobName               Name of job(s)
  count                 number of VM(s) to start

optional arguments:
  -h, --help            show this help message and exit
  -s SHAPE, --shape SHAPE
                        xsmall, small, medium, or large VM configuration presets
  -c CPUs, --cpus CPUs
                        number of vcpus for VM(s). (Default: 1) Will override shape preset
  -m MEMORY, --memory MEMORY
                        memory for VM(s) in either GB or MB. You must provide a suffix. (Default: 8 GB) Will override shape preset
  --gpu GPU             GPU type: K40, K80, or GTX 1070. (Default: None) Will override shape preset
  --gpuCount GPUCOUNT   Number of GPUs to include. (Default: 0) Will override shape preset
  --gpuMemory GPUMEMORY
                        Amount of memory per GPU in GB. (Default: 12) Will override shape preset
  --time TIME           Amount of wallclock time in minutes to run your job. (Default: Run until completion
                        or until terminated by queuing system.)
  --startIndex STARTINDEX
                        Starting suffix when count > 1. Defaults to 1. (Default: 1)
  --scratch SCRATCH     Amount of additional scratch space added to base image in GB. (Default: 0 GB)
  --tmp TMP             DEPRECATED: Use --scratch. Amount of additional scratch space added to base image
                        in GB. (Default: 0 GB)

startContainers

startContainers configures and submits Singularity container jobs to the Lancium Compute backend.

usage: gcli.py startContainers [-h] [-s SHAPE] [-c CPUs] [-m MEMORY] [--gpu GPU] [--gpuCount GPUCOUNT]
                               [--gpuMemory GPUMEMORY] [--startIndex STARTINDEX] [--scratch SCRATCH | --tmp TMP]
                               [--time TIME] [-i INPUT INPUT] [-I GRIDINPUT GRIDINPUT] [--storage STORAGE STORAGE]
                               [-a ARCHIVE ARCHIVE] [-o OUTPUT OUTPUT]
                               imageName jobName count command

positional arguments:
  imageName             Image of VM(s) to start
  jobName               Name of job(s)
  count                 number of VM(s) to start
  command               Quoted string indicating what to run inside the Singularity container.

optional arguments:
  -h, --help            show this help message and exit
  -s SHAPE, --shape SHAPE
                        xsmall, small, medium, or large VM configuration presets
  -c CPUs, --cpus CPUs
                        number of vcpus for VM(s). (Default: 1) Will override shape preset
  -m MEMORY, --memory MEMORY
                        memory for VM(s) in either GB or MB. You must provide a suffix. (Default: 8 GB) Will override shape preset
  --gpu GPU             GPU type: K40, K80, or GTX 1070. (Default: None) Will override shape preset
  --gpuCount GPUCOUNT   Number of GPUs to include. (Default: 0) Will override shape preset
  --gpuMemory GPUMEMORY
                        Amount of memory per GPU in GB. (Default: 12) Will override shape preset
  --startIndex STARTINDEX
                        Starting suffix when count > 1. Defaults to 1. (Default: 1)
  --scratch SCRATCH     Amount of additional scratch space added to base image in GB. (Default: 0 GB)
  --tmp TMP             DEPRECATED: Use --scratch. Amount of additional scratch space added to base image
                        in GB. (Default: 0 GB)
  --time TIME           Amount of wallclock time in minutes to run your job. (Default: Run until completion
                        or until terminated by queuing system.)
  -i INPUT INPUT, --input INPUT INPUT
                        Copy a file or directory from your local file system into the job's working
                        directory. Takes 2 arguments, the first is local path (relative or full) to
                        file/directory, the second is the name you want in the job working directory.
  -I GRIDINPUT GRIDINPUT, --gridinput GRIDINPUT GRIDINPUT
                        Copy a file or directory from the grid into the job's working directory. Takes 2
                        arguments, the first is full grid path to file/directory, the second is the name
                        you want in the job working directory.
  --storage STORAGE STORAGE
                        Copy a file or directory from the storage space into the job's working directory.
                        Takes 2 arguments, the first is local path (relative or full) to file/directory,
                        the second is the name you want in the job working directory.
  -a ARCHIVE ARCHIVE, --archive ARCHIVE ARCHIVE
                        Copy an archive file (.tar, .zip, .gz) and extract into job working directory.
                        Takes two arguments, the first is the local path (relative or full) to archive
                        file, and the second is the name you want in the job working directory.
  -o OUTPUT OUTPUT, --output OUTPUT OUTPUT
                        Output a file or directory back to your local file system. Takes two arguments, the
                        first is the path in the local file system to writeback, the second is the filename
                        of the output file or directory in the job working directory.
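As an illustration, a container job combining several of these flags (image, script, and file names are placeholders):

```shell
# submit one container job: copy run.sh in, unpack data.tar.gz into the job
# working directory, run the quoted command, and stage results back on complete
gcli startContainers ubuntu18.04.simg analysis 1 "bash run.sh" \
    -i ./run.sh run.sh \
    -a ./data.tar.gz data.tar.gz \
    -o ./results results
```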

complete

complete stages back data that was indicated in startContainers (only applies to Container jobs, not VMs) and deletes the job record.

usage: gcli.py complete [-h] tickets

positional arguments:
  tickets     Comma-separated list of tickets, or 'all' if you want to complete all jobs

optional arguments:
  -h, --help  show this help message and exit

listVMs

listVMs outputs information about your Lancium job records.

usage: gcli.py listVMs [-h] [-a] [-d]

optional arguments:
  -h, --help    show this help message and exit
  -a, --all     Prints out all VMs including those in FINISHED state
  -d, --detail  Prints out full information about the VMs listed

vmStatus

vmStatus outputs state and other information about your Lancium jobs, identified by job name.

usage: gcli.py vmStatus [-h] jobList

positional arguments:
  jobList     Comma-separated list of job names

optional arguments:
  -h, --help  show this help message and exit

vmTerminate

vmTerminate immediately halts all job execution. No data will be staged out after termination via vmTerminate.

usage: gcli.py vmTerminate [-h] ticketList

positional arguments:
  ticketList  Comma-separated list of tickets

optional arguments:
  -h, --help  show this help message and exit

uploadImage

uploadImage is the tool used for uploading your images to the Lancium Compute backend. Note: only .simg, .sif, and .qcow2 files are accepted.

usage: gcli.py uploadImage [-h] imageName localPath

positional arguments:
  imageName   What to name the image in the grid
  localPath   Local path to .qcow2 or .simg/.sif file

optional arguments:
  -h, --help  show this help message and exit

listImages

listImages provides information about images hosted on the Lancium Compute backend, including your images and Lancium’s images.

usage: gcli.py listImages [-h] [--vm] [--singularity] [--size] [--lancium]

optional arguments:
  -h, --help     show this help message and exit
  --vm           Only list VM images
  --singularity  Only list singularity images
  --size         Include size (in bytes) of each image listed
  --lancium      List Lancium's images

rmImage

rmImage is the tool used for deleting your images from Lancium Compute backend.

usage: gcli.py rmImage [-h] imageName

positional arguments:
  imageName   Name of image to be removed

optional arguments:
  -h, --help  show this help message and exit

uploadToStorage

uploadToStorage is the tool used for uploading your data into persistent Lancium storage.

usage: gcli.py uploadToStorage [-h] [-f] localPath gridPath

positional arguments:
  localPath    Local path to file or directory to upload
  gridPath     Path in grid relative to STORAGE directory

optional arguments:
  -h, --help   show this help message and exit
  -f, --force  Upload a file even if it overwrites an existing file.

listStorage

listStorage provides information about your data uploaded to Lancium’s persistent storage backend.

usage: gcli.py listStorage [-h] [-p PATH]

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  Path in grid relative to STORAGE directory

rmStorage

rmStorage is the tool used for removing your data from persistent Lancium storage.

usage: gcli.py rmStorage [-h] path

positional arguments:
  path        Path in grid relative to STORAGE directory

optional arguments:
  -h, --help  show this help message and exit

authenticate

authenticate is the tool used for authenticating with the GCLI.

usage: gcli.py authenticate [-h] username password

positional arguments:
  username    Lancium account username
  password    Lancium account password

optional arguments:
  -h, --help  show this help message and exit

whoami

whoami provides information about the credentials with which you’re logged in. It is useful for verifying successful authentication.

usage: gcli.py whoami 

logout

logout will log you out of the GCLI. You will be unable to use the GCLI until you have reauthenticated.

usage: gcli.py logout 

version

version will provide information regarding the version of the GCLI that is installed.

usage: gcli.py version 

VM Base Image

To create a virtual machine, you must use or modify a provided base image. Currently we provide an Ubuntu Server 18.04 LTS base image. This image has a user account lancium with password fcz83#FZ%rsQcVNe and is configured to interact with our backend. You can modify this base image as required with local tools such as virt-install or Virtual Machine Manager. Once you are done modifying it, upload it for use through the CLI as described below.

WARNING: Using another base image will lead to issues such as failure to report an IP address, partitions not being resized, and other issues.

GCLI Walkthrough

Before using the GCLI, you must have a clientserver running as described above. Then you must authenticate with the Lancium backend using the authenticate command before using most other commands.

gcli authenticate tester@lancium.com testPass

You will remain authenticated as long as the clientserver remains running. To check who you’re currently authenticated as, run gcli whoami, which will output something similar to:

Client Tool Identity: 
(CONNECTION) "Client Cert F9535C75-2D19-751A-F28E-720B8FB4CF87"
Additional Credentials: 
(USER) "tester@lancium.com" -> (CONNECTION) "Client Cert F9535C75-2D19-751A-F28E-720B8FB4CF87"
(GROUP) "Lancium-users" -> (CONNECTION) "Client Cert F9535C75-2D19-751A-F28E-720B8FB4CF87"

Now, we want to upload the custom image that you’ve built.

Image Management

Before running a VM or Singularity job, an image is required. This image must exist in the grid. The image you provide acts primarily as a virtual environment for your running job. You are encouraged to build your own image(s) tailored to the kinds of jobs you would like to run. In addition, we have our own collection of prebuilt images that any Lancium user is free to use.

In the following sections, we discuss how to manage your images in the grid.

Upload

To upload an image to the grid, use the uploadImage command. uploadImage takes two arguments: the path to your image in your local filesystem, and the name you would like it to have in the grid. Uploaded images may only have the extensions .simg, .sif, or .qcow2, and the extension of the local file must match the extension of the name in the grid. If the extensions do not match, or you attempt to upload a file with a disallowed extension, the upload will fail.
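To illustrate the extension rules (file names are placeholders):

```shell
gcli uploadImage ./my_vm.qcow2 my_vm.qcow2    # allowed: extensions match
gcli uploadImage ./my_vm.qcow2 my_vm.simg     # fails: extensions do not match
gcli uploadImage ./disk.img disk.img          # fails: .img is not an accepted extension
```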

Example:

gcli uploadImage amber18-cuda9.2.simg my_amber.simg

Will output:

Exit Code: 0

Note: uploadImage will initiate the image upload. A successful exit does not mean that image has finished uploading, only that it has started.

List

To see what images already exist in the grid, you can use gcli listImages.

Example:

gcli listImages

Will output:

L_amber18-cuda9.2.simg
amber18-cuda9.2.simg
test.qcow2
test.simg

You can choose to only show VM images or Singularity images by using the --vm and --singularity flags respectively.

Example:

gcli listImages --singularity

Will output:

L_amber18-cuda9.2.simg
amber18-cuda9.2.simg
test.simg

Additionally, you can see the size of each image in bytes with the --size flag.

gcli listImages --size

Will output:

L_amber18-cuda9.2.simg 5073121280
amber18-cuda9.2.simg 5105672223
test.qcow2 1902641152
test.simg 3282227200

Note: You should verify that these file sizes match the local copy before attempting to boot the image. You can check the size of your image in your local filesystem by running:

du -b your_image.qcow2

You can list Lancium’s images using the --lancium flag.

gcli listImages --lancium

Will output:

L_amber18-cuda9.2.simg
QuantumEspresso.simg
gromacs.simg
lancium-gpu-18.04.qcow2
lancium-ubuntu-18.04.qcow2
py2_caffe.simg
py2_pytorch.simg
py2_tensorflow.simg
py2_theano.simg
py3_caffe.simg
py3_pytorch.simg
py3_tensorflow.simg
ubuntu.simg
ubuntu18.04.simg
ubuntu18.04_cuda9.2.simg

Delete

To delete an image you can run gcli rmImage. rmImage takes only one argument, the name of the image in the grid you wish to delete.

WARNING: Image deletion is irreversible.

Example:

gcli rmImage your_image.qcow2

Will output:

Exit Code: 0

Job Management

Once the image you want to use has been fully uploaded, you can start it.

gcli startVMs image_name.qcow2 VM_NAME 1 -c 2 -m 8092MB

This will start the image with hostname VM_NAME, 2 vCPUs, and 8092MB of memory. Note that the startVMs command outputs the VM ticket; this ticket is used to control the VM instance. To check on the status of this VM, we can run gcli listVMs -d (-d asks for detailed information on the VMs). This may output:

{
    "VM_NAME": {
        "ticket": "3A89D01C-29CB-53C7-4EF9-701E3CD82DFE",
        "time": "15:31 EDT 12 May 2020",
        "tries": "0",
        "state": "Booting",
        "ipaddr": "Booting"
    }
}

If we wait for the VM to fully boot, we will get something like:

{
    "VM_NAME": {
        "ticket": "3A89D01C-29CB-53C7-4EF9-701E3CD82DFE",
        "time": "15:31 EDT 12 May 2020",
        "tries": "0",
        "state": "Running",
        "ipaddr": "10.3.250.125"
    }
}

From this output, we get an IP address that we can SSH to from other VMs. Currently, this IP address is non-routable and can only be used for intra-Lancium communication. Once you are done using the VM, you can shut it down from within and the instance will clean up after itself. Alternatively, you can terminate the VM with its ticket:

gcli vmTerminate 3A89D01C-29CB-53C7-4EF9-701E3CD82DFE

Some other helpful examples:

gcli vmStatus VM_NAME

This will print the same output as listVMs -d, but only for the named VM.
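Because listVMs -d and vmStatus emit JSON, their output can be scripted against with standard tools. A minimal sketch: the heredoc below stands in for a live gcli call (in practice, pipe gcli listVMs -d into the same command), and the sed pattern assumes the field layout shown above.

```shell
# extract the "ipaddr" field from listVMs -d style output
sed -n 's/.*"ipaddr": "\([^"]*\)".*/\1/p' <<'EOF'
{
    "VM_NAME": {
        "ticket": "3A89D01C-29CB-53C7-4EF9-701E3CD82DFE",
        "state": "Running",
        "ipaddr": "10.3.250.125"
    }
}
EOF
# prints: 10.3.250.125
```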

gcli rmImage image_name.qcow2

This will delete the named VM image on the Lancium backend.

gcli logout

This will log out all users authenticated with the clientserver.

gcli version

This will print out the current version of the GCLI package.

Singularity™ Jobs

To start a Singularity job, the --singularity flag must be set. This flag takes a quoted command line that will be run inside the container. Example:

gcli startVMs image.simg new_job 1 -i /home/user/input.txt input.txt --singularity "cat input.txt"

Certain characters are not allowed to appear in the --singularity argument: double quotes ("), single quotes ('), and semicolons (;).

Input to Container Jobs

WARNING: Uploading large files as input to your job or into storage can take a long time, often giving the impression that the GCLI is hanging. Unless you get a message indicating an error has occurred, your upload is proceeding normally and should not be interrupted.

There are multiple ways to get files and directories into your job working directory: input from your local file system, archives, and from storage.

To input files from your local file system, use the -i or --input flag when running startVMs.

The -i flag takes two inputs: the relative or absolute path to the file or directory, and the name in the job working directory.

Example:

gcli startVMs image.qcow2 new_job 1 -i /home/user/input.txt input.txt
gcli startVMs ... -i ../relative/data_sets data

To input archive files from your local file system, use the -a or --archive flag when running startVMs.

The -a flag takes two inputs: the relative or absolute path to the archive file, and the name in the job working directory. Files uploaded with this flag will be extracted in the job working directory automatically.

Example:

gcli startVMs image.simg new_job 1 -a ../path/to/archive.tar archive.tar --singularity "ls"

To input files from your grid storage, use the --storage flag when running startVMs.

The --storage flag takes two inputs: the path to the file or directory in the grid, relative to the base directory /home/CCC/Lancium/<USERNAME>/STORAGE/, and the name in the job working directory.

Example:

gcli startVMs ... --storage path/in/grid/input.txt input.txt
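Files that already exist elsewhere in the grid can be pulled in with the -I or --gridinput flag, which takes a full grid path rather than a STORAGE-relative one (the path below is a placeholder):

```shell
gcli startVMs ... -I /home/CCC/Lancium/<USERNAME>/STORAGE/data_sets/data_set_1 data_set_1
```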

But how do I get my large data sets into my storage directory? Please see the “Storage” section.

Output from Container Jobs

Output is specified during submission, not at completion. You specify output with the -o or --output flag. The -o flag takes two arguments: the local path to which data will be staged out, and the name of the file or directory in the job working directory that will be staged out.

NOTE: If the local path that you’re trying to stage data out to does not exist during submission, you will only get a warning and the job will still be submitted (assuming there are no other faults).

Standard output will always be called stdout.txt. Standard error will always be called stderr.txt.

If you want to stageout standard output and standard error:

gcli ... -o ../path/to/save/stdout.txt stdout.txt 
gcli ... -o ../path/to/save/different_std_err_name.txt stderr.txt 

Additional file and directory examples:

gcli ... -o ../path/to/save/new_file.txt file_in_JWD.txt 
gcli ... -o ../path/to/save/directory directory_in_JWD

Now that output locations have been specified, you must wait for the job to complete before staging out any data. You can terminate your job early using:

gcli vmTerminate <ticket>

When your job has completed, you can stageout the data and clean up the job using complete:

gcli complete <JOB_ID>

complete will download data from the job working directory (JWD) according to what was specified at submission.

If the directory structure required to stage out files does not exist in your local file system, you will get an error and job completion will be aborted.

For example, if you’re trying to stage out stdout.txt to /path/to/stdout.txt and /path/to does not exist, stdout.txt will not be staged out.

If files you specified at submission time do not exist in the JWD when running complete, they will be skipped.

If all files are staged out or skipped, the job will be cleaned up.

Storage

Files can be uploaded to a persistent storage location in the grid for use during job execution. To upload to storage, use the uploadToStorage command, which takes two arguments: the path to the file or directory in your local file system, and the relative path in the grid at which you’d like to keep the file. If the relative path does not exist, it will be created.

For example:

gcli uploadToStorage ../local_path_to/data_set_directory data_sets/data_set_1

To use files uploaded to storage in your jobs, please see section “Input to Container Jobs”.

Quick Start Demo: Containers

To run a simple ‘hello world’ Singularity job, with output staged back:

$ gcli startContainers Lancium/ubuntu18.04.simg MyJob 1 "echo hello world" -o out stdout.txt

Will output:

{
    "MyJob": {
        "ticket": "F52E470E-DE19-3F4A-778D-D88156EE136D"
    }
}

To check the status of your job:

$ gcli vmStatus MyJob

Will output:

{
    "MyJob": {
        "ticket": "F52E470E-DE19-3F4A-778D-D88156EE136D",
        "time": "14:00 EDT 15 Jun 2020",
        "tries": "0",
        "state": "Running",
        "ipaddr": "None, IPs only assigned for VM jobs"
    }
}

After it has terminated, to get your output back:

$ gcli complete F52E470E-DE19-3F4A-778D-D88156EE136D

Will output:

INFO: Copied output file/directory: stdout.txt to location: /home/charlie/amber_test/out.
Exit Code: 0

To verify your output file contains what you expect:

$ cat out
hello world