OpenPose Container for Nvidia Ampere Series

Our previous OpenPose multi-container supports multiple GPU generations up to Nvidia's Turing architecture. However, it fails on the Ampere generation of cards (specifically the A100 and A40 cards offered at FAU's new national supercomputer), because these require CUDA 11.1 or higher. We cannot reuse the previous recipe either: Nvidia stopped updating the NVCaffe containers, and the last one still runs on CUDA 10.2. Installing NVCaffe from GitHub is next to impossible as well; its dependencies have evolved too far by now. Even plain Caffe without any modifications does not play nicely with current versions of OpenCV because of renamed function calls. We will therefore follow OpenPose's installation instructions and modify them where necessary, building the container directly on the target system, a node with an A40 card.

Preparatory steps:

  • Ask HPC support to enable Singularity fakeroot support. At FAU this is disabled by default for security reasons, but it can be enabled temporarily on request.
  • Reserve a node with the target architecture for six hours, to be on the safe side (see the example Slurm command after this list).
  • Find a local disk on the node (/scratch or /tmp at FAU), because NFS mounts will throw "Permission denied" errors in combination with the fakeroot feature.
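How you reserve such a node is site-specific; FAU's clusters use the Slurm batch system, so an interactive reservation could look like the following sketch (the partition and GRES names are assumptions, check your site's documentation):

    # Request one A40 GPU on a matching partition for six hours (names are assumptions)
    salloc --partition=a40 --gres=gpu:a40:1 --time=06:00:00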

At the time of writing, CUDA 11.6.0 is the latest available version. However, the latest version with cuDNN on Docker Hub is 11.5.1, which we will use here. You can check the latest versions at https://hub.docker.com/r/nvidia/cuda/tags.
Note: It is possible to build OpenPose without cuDNN, and on some architectures and with some models this may even be faster than the cuDNN build. If you decide to do that, pick an image without cuDNN in the next step (e.g. nvidia/cuda:11.5.1-devel-ubuntu20.04) and use the alternative build command below.
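If you prefer the command line to the web page, the tag list can also be queried through Docker Hub's public API; a minimal sketch, assuming curl and jq are available on the build host:

    # List the newest cuDNN-enabled nvidia/cuda tags
    curl -s "https://hub.docker.com/v2/repositories/nvidia/cuda/tags?page_size=100&name=cudnn" \
        | jq -r '.results[].name' | sort -V | tail -n 5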

  1. Initialize container
    export http_proxy="http://proxy.rrze.uni-erlangen.de:80"
    export https_proxy="http://proxy.rrze.uni-erlangen.de:80"
    singularity build --sandbox openpose_container_feb22_v4 docker://nvidia/cuda:11.5.1-cudnn8-devel-ubuntu20.04
    
  2. Open writable shell as root and get set up:
    singularity shell -f -w openpose_container_feb22_v4
    export TERM="vt100"
    export HTTP_PROXY=http://proxy.rrze.uni-erlangen.de:80
    export HTTPS_PROXY=http://proxy.rrze.uni-erlangen.de:80
    export http_proxy=http://proxy.rrze.uni-erlangen.de:80
    export https_proxy=http://proxy.rrze.uni-erlangen.de:80
    export LC_ALL=C
  3. Update: Tobias van Valkenhoef has correctly pointed out that --nv and -w do not combine, which is why the command above omits --nv. Mounting the Nvidia drivers and the CUDA stack from the host is not needed at build time anyway, because they are already included in the container image provided by Nvidia.
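    Even without --nv, the CUDA toolkit and cuDNN from the image should be visible inside the writable shell; a quick sanity check (the paths are those of the stock nvidia/cuda image):

    # Toolkit version shipped with the image
    nvcc --version
    # cuDNN libraries registered with the dynamic linker
    ldconfig -p | grep -i cudnn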

  4. Let us upgrade the OS in the container:
    apt-get update && \
    apt-get -y --no-install-recommends upgrade
  5. Install dependencies (a quick import check follows after this step)
    apt-get install -y --no-install-recommends \
    build-essential \
    cmake \
    git \
    wget \
    nano \
    dialog \
    ffmpeg \
    software-properties-common \
    libatlas-base-dev \
    libleveldb-dev \
    libsnappy-dev \
    libhdf5-serial-dev \
    libboost-all-dev \
    libgflags-dev \
    libgoogle-glog-dev \
    liblmdb-dev \
    pciutils \
    python3-setuptools \
    python3-dev \
    python3-pip \
    opencl-headers \
    ocl-icd-opencl-dev \
    libviennacl-dev \
    libavcodec-dev \
    libavformat-dev \
    libswscale-dev \
    libv4l-dev \
    libxvidcore-dev \
    libx264-dev \
    libgtk-3-dev \
    gfortran \
    sudo \
    pkg-config \
    libcanberra-gtk-module \
    libopencv-dev && \
    python3 -m pip install \
    numpy \
    opencv-python \
    protobuf
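    A quick sanity check that the Python dependencies landed correctly (a minimal sketch; note that the protobuf package is imported as google.protobuf):

    python3 -c 'import numpy, cv2, google.protobuf; print("numpy", numpy.__version__, "/ OpenCV", cv2.__version__)'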
  6. Clone OpenPose, make a copy, and install remaining dependencies
    cd /opt
    git clone https://github.com/CMU-Perceptual-Computing-Lab/openpose.git
    cp -R openpose openpose_cpu
    cd openpose
    bash ./scripts/ubuntu/install_deps.sh
    git submodule update --init --recursive --remote
  7. Optional: Download the models manually (if there is no Internet connection at build time); also copy or link them to the CPU directory, as shown after this step
    wget -P models/pose/body_25/ http://posefs1.perception.cs.cmu.edu/OpenPose/models/pose/body_25/pose_iter_584000.caffemodel
    wget -P models/pose/coco/ http://posefs1.perception.cs.cmu.edu/OpenPose/models/pose/coco/pose_iter_440000.caffemodel
    wget -P models/pose/mpi/ http://posefs1.perception.cs.cmu.edu/OpenPose/models/pose/mpi/pose_iter_160000.caffemodel
    wget -P models/face/ http://posefs1.perception.cs.cmu.edu/OpenPose/models/face/pose_iter_116000.caffemodel
    wget -P models/hand/ http://posefs1.perception.cs.cmu.edu/OpenPose/models/hand/pose_iter_102000.caffemodel
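    One way to handle the "copy or link to the CPU directory" part: openpose_cpu was copied before the models were downloaded, so only the .caffemodel files are missing there (a sketch; a symlinked models directory would work just as well):

    cd /opt/openpose/models
    for f in pose/body_25/pose_iter_584000.caffemodel \
             pose/coco/pose_iter_440000.caffemodel \
             pose/mpi/pose_iter_160000.caffemodel \
             face/pose_iter_116000.caffemodel \
             hand/pose_iter_102000.caffemodel; do
        # Copy each downloaded model into the matching directory of the CPU tree
        cp "$f" "/opt/openpose_cpu/models/$f"
    done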
  8. Build for all GPU architectures
    mkdir -p /opt/openpose/build && \
    cd /opt/openpose/build && \
    cmake -DCUDA_ARCH=All .. && \
    make -j"$(nproc)"

    Alternatively without cuDNN:

    mkdir -p /opt/openpose/build && \
    cd /opt/openpose/build && \
    cmake -DCUDA_ARCH=All -DUSE_CUDNN=OFF .. && \
    make -j"$(nproc)"
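    Either way, you can smoke-test the freshly built binary before moving on; --help comes from the gflags command-line parser and does not need a GPU (a hedged sketch):

    cd /opt/openpose
    ./build/examples/openpose/openpose.bin --help | head -n 20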
  9. Build for CPU. Note that we are building on an AMD CPU, so there is no MKL support compiled into the binaries.
    mkdir -p /opt/openpose_cpu/build && \
    cd /opt/openpose_cpu/build && \
    cmake -DGPU_MODE=CPU_ONLY .. && \
    make -j"$(nproc)"
  10. Make this the content of the file /.singularity.d/runscript:
    #!/bin/bash
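    # Select the GPU build when a usable Nvidia driver is present, otherwise fall back to CPU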
    if nvidia-smi > /dev/null 2>&1; then
        cd /opt/openpose
        echo "#### USING GPU ####"
    else
        cd /opt/openpose_cpu
        echo "#### USING CPU ####"
    fi
    ./build/examples/openpose/openpose.bin "$@"
    
  11. Make sure the runscript is executable:
    chmod a+rx /.singularity.d/runscript
  12. Wrap up. I encountered a "permission denied" error when building the final sif file; the first command below resolved it (ignore the error messages about paths that are not accessible, since at FAU some things are mounted into /var from the outside).
    chmod -R a+rwX /var
    exit
    singularity build ~/titan/openpose/openpose_multi_container_feb22_v4.sif openpose_container_feb22_v4/
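The finished image then runs like any other Singularity container. A usage sketch (the flags are standard OpenPose options; input and output paths are placeholders):

    # --nv mounts the host GPU driver, which the runscript detects via nvidia-smi;
    # omit it to force the CPU build
    singularity run --nv openpose_multi_container_feb22_v4.sif \
        --video input.mp4 --write_json output/ --display 0 --render_pose 0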

Special thanks go to Thomas Zeiser and his colleagues at FAU’s National Supercomputer Centre for their patient expert help, and to KONWIHR for funding!

Please leave any feedback in the comments.

2 Replies to “OpenPose Container for Nvidia Ampere Series”

  1. I tried exactly what you did and I could not get it to work.
    Check failed: status == CUDNN_STATUS_SUCCESS (1 vs. 0) CUDNN_STATUS_NOT_INITIALIZED

    I am out of ideas now.

    1. I got it to work.
      What helped me was to compile it on the target system within a fakeroot environment. I should also mention that it worked with Singularity 3.9.5, but not with Apptainer 1.0.
      My system: CUDA 12, Debian 11, and an A100 GPU.
