OpenPose Container for Nvidia Ampere Series

Our previous OpenPose multi-container supports multiple GPU generations up to Nvidia's Turing architecture. However, it fails on the Ampere generation of cards (specifically the A100 and A40 cards offered at FAU's new national supercomputer), because these require CUDA 11.1 or higher. We cannot reuse the previous recipe either: Nvidia stopped updating the NVCaffe containers, and the last one still runs on CUDA 10.2. Installing NVCaffe from GitHub is next to impossible as well; its dependencies have evolved too far by now. Even plain Caffe without any modifications does not play nicely with current versions of OpenCV because of renamed function calls. We will therefore follow OpenPose's installation instructions and modify them where necessary, building the container directly on the target system, a node with an A40 card.

Preparatory steps:

  • Ask HPC support to enable Singularity fakeroot support. At FAU this is disabled by default for security reasons, but it can be enabled temporarily on request.
  • Reserve a node with the target architecture for six hours, to be on the safe side (see the example Slurm command after this list).
  • Find a local disk on the node (/scratch or /tmp at FAU), because NFS mounts will throw "Permission denied" errors in combination with the fakeroot feature.
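How you reserve such a node is site-specific; FAU's clusters use the Slurm batch system, so an interactive reservation could look like the following sketch (the partition and GRES names are assumptions, check your site's documentation):

    # Request one A40 GPU on a matching partition for six hours (names are assumptions)
    salloc --partition=a40 --gres=gpu:a40:1 --time=06:00:00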

At the time of writing, CUDA 11.6.0 is the latest available version. However, the latest version with cuDNN on Docker Hub is 11.5.1, which we will use here. You can check the latest versions at https://hub.docker.com/r/nvidia/cuda/tags.
Note: It is possible to build OpenPose without cuDNN, and on some architectures and with some models this may even be faster than the cuDNN build. If you decide to do that, pick an image without cuDNN in the next step (e.g. nvidia/cuda:11.5.1-devel-ubuntu20.04) and use the alternative build command below.
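If you prefer the command line to the web page, the tag list can also be queried through Docker Hub's public API; a minimal sketch, assuming curl and jq are available on the build host:

    # List the newest cuDNN-enabled nvidia/cuda tags
    curl -s "https://hub.docker.com/v2/repositories/nvidia/cuda/tags?page_size=100&name=cudnn" \
        | jq -r '.results[].name' | sort -V | tail -n 5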

  1. Initialize container
    export http_proxy="http://proxy.rrze.uni-erlangen.de:80"
    export https_proxy="http://proxy.rrze.uni-erlangen.de:80"
    singularity build --sandbox openpose_container_feb22_v4 docker://nvidia/cuda:11.5.1-cudnn8-devel-ubuntu20.04
    
  2. Open writable shell as root and get set up:
    singularity shell -f -w openpose_container_feb22_v4
    export TERM="vt100"
    export HTTP_PROXY=http://proxy.rrze.uni-erlangen.de:80
    export HTTPS_PROXY=http://proxy.rrze.uni-erlangen.de:80
    export http_proxy=http://proxy.rrze.uni-erlangen.de:80
    export https_proxy=http://proxy.rrze.uni-erlangen.de:80
    export LC_ALL=C
  3. Update: Tobias van Valkenhoef has correctly pointed out that --nv and -w do not combine, which is why the command above omits --nv. Mounting the Nvidia drivers and the CUDA stack from the host is not needed at build time anyway, because they are already included in the container image provided by Nvidia.
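    Even without --nv, the CUDA toolkit and cuDNN from the image should be visible inside the writable shell; a quick sanity check (the paths are those of the stock nvidia/cuda image):

    # Toolkit version shipped with the image
    nvcc --version
    # cuDNN libraries registered with the dynamic linker
    ldconfig -p | grep -i cudnn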

  4. Let us upgrade the OS in the container:
    apt-get update && \
    apt-get -y --no-install-recommends upgrade
  5. Install dependencies (a quick import check follows after this step)
    apt-get install -y --no-install-recommends \
    build-essential \
    cmake \
    git \
    wget \
    nano \
    dialog \
    ffmpeg \
    software-properties-common \
    libatlas-base-dev \
    libleveldb-dev \
    libsnappy-dev \
    libhdf5-serial-dev \
    libboost-all-dev \
    libgflags-dev \
    libgoogle-glog-dev \
    liblmdb-dev \
    pciutils \
    python3-setuptools \
    python3-dev \
    python3-pip \
    opencl-headers \
    ocl-icd-opencl-dev \
    libviennacl-dev \
    libavcodec-dev \
    libavformat-dev \
    libswscale-dev \
    libv4l-dev \
    libxvidcore-dev \
    libx264-dev \
    libgtk-3-dev \
    gfortran \
    sudo \
    pkg-config \
    libcanberra-gtk-module \
    libopencv-dev && \
    python3 -m pip install \
    numpy \
    opencv-python \
    protobuf
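    A quick sanity check that the Python dependencies landed correctly (a minimal sketch; note that the protobuf package is imported as google.protobuf):

    python3 -c 'import numpy, cv2, google.protobuf; print("numpy", numpy.__version__, "/ OpenCV", cv2.__version__)'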
  6. Clone OpenPose, make a copy, and install remaining dependencies
    cd /opt
    git clone https://github.com/CMU-Perceptual-Computing-Lab/openpose.git
    cp -R openpose openpose_cpu
    cd openpose
    bash ./scripts/ubuntu/install_deps.sh
    git submodule update --init --recursive --remote
  7. Optional: Download the models manually (if there is no Internet connection at build time); also copy or link them to the CPU directory, as shown after this step
    wget -P models/pose/body_25/ http://posefs1.perception.cs.cmu.edu/OpenPose/models/pose/body_25/pose_iter_584000.caffemodel
    wget -P models/pose/coco/ http://posefs1.perception.cs.cmu.edu/OpenPose/models/pose/coco/pose_iter_440000.caffemodel
    wget -P models/pose/mpi/ http://posefs1.perception.cs.cmu.edu/OpenPose/models/pose/mpi/pose_iter_160000.caffemodel
    wget -P models/face/ http://posefs1.perception.cs.cmu.edu/OpenPose/models/face/pose_iter_116000.caffemodel
    wget -P models/hand/ http://posefs1.perception.cs.cmu.edu/OpenPose/models/hand/pose_iter_102000.caffemodel
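    One way to handle the "copy or link to the CPU directory" part: openpose_cpu was copied before the models were downloaded, so only the .caffemodel files are missing there (a sketch; a symlinked models directory would work just as well):

    cd /opt/openpose/models
    for f in pose/body_25/pose_iter_584000.caffemodel \
             pose/coco/pose_iter_440000.caffemodel \
             pose/mpi/pose_iter_160000.caffemodel \
             face/pose_iter_116000.caffemodel \
             hand/pose_iter_102000.caffemodel; do
        # Copy each downloaded model into the matching directory of the CPU tree
        cp "$f" "/opt/openpose_cpu/models/$f"
    done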
  8. Build for all GPU architectures
    mkdir -p /opt/openpose/build && \
    cd /opt/openpose/build && \
    cmake -DCUDA_ARCH=All .. && \
    make -j"$(nproc)"

    Alternatively without cuDNN:

    mkdir -p /opt/openpose/build && \
    cd /opt/openpose/build && \
    cmake -DCUDA_ARCH=All -DUSE_CUDNN=OFF .. && \
    make -j"$(nproc)"
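    Either way, you can smoke-test the freshly built binary before moving on; --help comes from the gflags command-line parser and does not need a GPU (a hedged sketch):

    cd /opt/openpose
    ./build/examples/openpose/openpose.bin --help | head -n 20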
  9. Build for CPU. Note that we are building on an AMD CPU, so there is no MKL support compiled into the binaries.
    mkdir -p /opt/openpose_cpu/build && \
    cd /opt/openpose_cpu/build && \
    cmake -DGPU_MODE=CPU_ONLY .. && \
    make -j"$(nproc)"
  10. Make this the content of the file /.singularity.d/runscript:
    #!/bin/bash
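    # Select the GPU build when a usable Nvidia driver is present, otherwise fall back to CPU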
    if nvidia-smi > /dev/null 2>&1; then
        cd /opt/openpose
        echo "#### USING GPU ####"
    else
        cd /opt/openpose_cpu
        echo "#### USING CPU ####"
    fi
    ./build/examples/openpose/openpose.bin "$@"
    
  11. Make sure the runscript is executable:
    chmod a+rx /.singularity.d/runscript
  12. Wrap up. I encountered a "permission denied" error when building the final sif file; the first command below resolved it (ignore the error messages about paths that are not accessible, since at FAU some things are mounted into /var from the outside).
    chmod -R a+rwX /var
    exit
    singularity build ~/titan/openpose/openpose_multi_container_feb22_v4.sif openpose_container_feb22_v4/
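The finished image then runs like any other Singularity container. A usage sketch (the flags are standard OpenPose options; input and output paths are placeholders):

    # --nv mounts the host GPU driver, which the runscript detects via nvidia-smi;
    # omit it to force the CPU build
    singularity run --nv openpose_multi_container_feb22_v4.sif \
        --video input.mp4 --write_json output/ --display 0 --render_pose 0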

Special thanks go to Thomas Zeiser and his colleagues at FAU’s National Supercomputer Centre for their patient expert help, and to KONWIHR for funding!

Please leave any feedback in the comments.

2 Replies to “OpenPose Container for Nvidia Ampere Series”

  1. I tried exactly what you did and I could not get it to work.
    Check failed: status == CUDNN_STATUS_SUCCESS (1 vs. 0) CUDNN_STATUS_NOT_INITIALIZED

    I am out of ideas now.

    1. I got it to work.
      What helped me was to compile it on the target system within a fakeroot environment. I should also mention that it worked with Singularity 3.9.5, but not with Apptainer 1.0.
      My system: CUDA 12, Debian 11, and an A100 GPU.
