OpenPose with NVCaffe in a Singularity container with support for multiple architectures

Update February 2022: Neither this container nor Frankie Robertson’s Singularity container works with Nvidia Ampere series cards. For these, there is a new blog post.
Update November 2020: Frankie Robertson’s Singularity image and recipe (see comments below) provide the latest OpenPose and Caffe versions.

At Red Hen, we have started to use OpenPose for gesture recognition. To ensure portability across the various HPC facilities offered by the universities of the researchers involved, I chose to create a Singularity container. The container even helps with the various GPU generations available within FAU’s TinyGPU cluster. Here are the benefits of using the image whose creation I describe below:

The image

contains the latest OpenPose codebase
is based on the latest NVCaffe image (for newer NVidia cards)
is compatible with all generations of NVidia cards available to us
includes a CPU version and will automatically use that if no GPU is available

In my tests, the major advantage of the NVCaffe image was the smaller memory footprint (less than 3 GB vs. roughly 5 GB with the custom Caffe that comes with OpenPose by default), which means that the GPUs with 4 GB of RAM become usable for OpenPose. In addition, it was about 10% faster than the Caffe that comes with OpenPose on GTX 1080 cards.

Note that the instructions below assume interactive installation due to some glitches that occurred in my setup. Feel free to create a Singularity recipe out of this and post the link in the comments! Also, this should work in Docker with minimal changes.

Pull the NVCaffe image. We use a sandbox (i.e. directory) format because of the dpkg glitch described below (the tzdata error during the OS upgrade).
```
singularity build --sandbox openpose_multi_container_oct_2019/ docker://nvcr.io/nvidia/caffe:19.09-py2
```

Open a writable shell as root:

sudo singularity shell -w openpose_multi_container_oct_2019

Let us upgrade the OS in the container:

export LC_ALL=C
apt-get -y --no-install-recommends update && \
apt-get -y --no-install-recommends upgrade

I was greeted with the following error (skip this step if everything runs fine):

dpkg: error processing archive /tmp/apt-dpkg-install-XOAywY/02-tzdata_2019c-0ubuntu0.18.04_all.deb (--unpack):
 unable to make backup link of './usr/share/zoneinfo/UCT' before installing new version: Invalid cross-device link

I did not find an actual solution to this problem (again, let me know in the comments if you have one), so we are going to do a simple workaround. We exit Singularity, rename the file in the container directory, go back into the container and try again.

exit
mv openpose_multi_container_oct_2019/usr/share/zoneinfo/UCT openpose_multi_container_oct_2019/usr/share/zoneinfo/UCT_original
sudo singularity shell -w openpose_multi_container_oct_2019
export LC_ALL=C
apt-get -y --no-install-recommends upgrade

You will probably be prompted for timezone information.
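If the upgrade trips over more than one file, the rename workaround above can be wrapped in a small helper. This is only a minimal sketch, meant to run on the host and not inside the container. The function name `move_aside` is made up for illustration; only the sandbox directory name and the UCT path come from the steps above.

```shell
# Hypothetical helper: move a file inside the sandbox out of the way so that
# dpkg can unpack its own copy. Does nothing if a backup already exists.
move_aside() {
    ma_target="$1/$2"
    if [ -e "$ma_target" ] && [ ! -e "${ma_target}_original" ]; then
        mv "$ma_target" "${ma_target}_original"
        echo "moved $2"
    else
        echo "skipped $2"
    fi
}

# Example call, using the names from this post:
# move_aside openpose_multi_container_oct_2019 usr/share/zoneinfo/UCT
```

Because the helper skips files that already have a backup, it can be rerun safely after each failed upgrade attempt.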

Let us install a range of dependencies and tools:

apt-get install -y --no-install-recommends \
build-essential \
cmake \
git \
wget \
nano \
dialog \
software-properties-common \
libatlas-base-dev \
libleveldb-dev \
libsnappy-dev \
libhdf5-serial-dev \
libboost-all-dev \
libgflags-dev \
libgoogle-glog-dev \
liblmdb-dev \
pciutils \
python3-setuptools \
python3-dev \
python3-pip \
opencl-headers \
ocl-icd-opencl-dev \
libviennacl-dev \
libavcodec-dev \
libavformat-dev \
libswscale-dev \
libv4l-dev \
libxvidcore-dev \
libx264-dev \
libgtk-3-dev \
gfortran \
pkg-config \
libcanberra-gtk-module && \
python3 -m pip install \
numpy \
opencv-python
pip3 install protobuf==3.6.0
add-apt-repository -y ppa:jonathonf/ffmpeg-4
apt-get -y --no-install-recommends update
apt-get -y install ffmpeg

The NVCaffe image is strange in that it does not contain some pieces of software against which NVCaffe appears to be built. Protobuf and OpenCV are not installed, but the build fails without them. However, NVCaffe seems to expect them to be in a version that is NOT part of the OS’s package management. Ubuntu 18.04 currently contains Protobuf 3.0 and OpenCV 3.2, but the build fails if those are installed via apt, complaining about version mismatches. We thus need to install OpenCV 3.4 and Protobuf 3.6.0 specifically, although for the latter the pip3 line in the previous step should be enough.

OpenCV with CUDA support and Fast Math (not that the latter seems to change much in our case…):

cd /opt
wget -O opencv3.4.8.zip https://github.com/opencv/opencv/archive/3.4.8.zip
wget -O opencv-contrib3.4.8.zip https://github.com/opencv/opencv_contrib/archive/3.4.8.zip
unzip opencv3.4.8.zip
unzip opencv-contrib3.4.8.zip
cd opencv-3.4.8/
mkdir build && cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE \
    -D CMAKE_INSTALL_PREFIX=/usr/local \
    -D WITH_CUDA=ON \
    -D ENABLE_FAST_MATH=1 \
    -D CUDA_FAST_MATH=1 \
    -D WITH_CUBLAS=1 \
    -D WITH_FFMPEG=ON \
    -D INSTALL_PYTHON_EXAMPLES=ON \
    -D OPENCV_EXTRA_MODULES_PATH=/opt/opencv_contrib-3.4.8/modules \
    -D OPENCV_ENABLE_NONFREE=ON \
    -D BUILD_EXAMPLES=ON ..
make -j`nproc`
make install
cd /opt
rm opencv3.4.8.zip
rm opencv-contrib3.4.8.zip
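Before moving on to OpenPose, it can help to confirm that the installed versions are at least the ones named above (OpenCV 3.4, Protobuf 3.6.0). The following is a sketch only: `version_ge` is a hypothetical helper, it relies on GNU `sort -V`, and the commands in the comments assume that the OpenCV build installed its pkg-config file and that the pip protobuf package is importable.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# Relies on GNU sort -V (version sort), which is available on Ubuntu.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Possible usage inside the container (assumptions, not verified here):
# version_ge "$(pkg-config --modversion opencv)" 3.4 || echo "OpenCV too old"
# version_ge "$(python3 -c 'import google.protobuf as p; print(p.__version__)')" 3.6.0 \
#     || echo "Protobuf too old"
```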

Clone OpenPose and make a copy for the CPU version

cd /opt
git clone https://github.com/CMU-Perceptual-Computing-Lab/openpose.git
cp -R openpose openpose_cpu

In our environment, I have root access only to a machine that does not have a GPU, but compiling the NVCaffe version appears to require one. I therefore need to move the image for compilation to a machine in the HPC cluster, which has no direct Internet access (hence the optional proxy settings below). Skip this step if you have root access to a GPU machine (but make sure you started singularity with --nv and loaded CUDA beforehand).

chmod -R a+rwX /opt/openpose
chmod -R a+rwX /opt/openpose_cpu
chmod a+rwX /opt
exit
sudo tar cvzf openpose_multi_container_oct_2019.tar.gz openpose_multi_container_oct_2019/
# copy to remote machine, SSH there and execute
tar xvzf openpose_multi_container_oct_2019.tar.gz
# You may need to load the correct NVidia drivers here. In our case: module load cuda/10.1
singularity shell --nv -w openpose_multi_container_oct_2019
export LC_ALL=C
#optional: set http proxy
export HTTP_PROXY=http://proxy.rrze.uni-erlangen.de:80
export HTTPS_PROXY=https://proxy.rrze.uni-erlangen.de:443
export http_proxy=http://proxy.rrze.uni-erlangen.de:80
export https_proxy=https://proxy.rrze.uni-erlangen.de:443

Modify /opt/openpose/CMakeLists.txt; the added line removes a build error that occurred for me.

# Add this line (I put it in line 226):
find_package(Boost COMPONENTS system filesystem REQUIRED)
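The same edit can be scripted so that it is safe to repeat. This is a sketch under assumptions: `add_boost_line` is a hypothetical helper, it uses GNU sed (`-i` and the one-line `i` command), and line 226 is simply where the line went in my copy of CMakeLists.txt; other OpenPose revisions may need a different position.

```shell
# Hypothetical helper: insert the Boost line so that it becomes line $2 of
# file $1, unless the file already contains it. Requires GNU sed.
add_boost_line() {
    abl_file="$1"
    abl_lineno="$2"
    abl_text='find_package(Boost COMPONENTS system filesystem REQUIRED)'
    if grep -qF "$abl_text" "$abl_file"; then
        echo "already present"
        return 0
    fi
    sed -i "${abl_lineno}i $abl_text" "$abl_file"
    # Fails (prints nothing) if the file was shorter than the requested line.
    grep -qF "$abl_text" "$abl_file" && echo "inserted"
}

# Example call with the values from this post:
# add_boost_line /opt/openpose/CMakeLists.txt 226
```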

Build OpenPose for GPU, enabling all GPU architectures (but see the step on legacy cards below for old cards):

mkdir -p /opt/openpose/build && \
cd /opt/openpose/build && \
cmake -DDL_FRAMEWORK=NV_CAFFE -DCaffe_INCLUDE_DIRS=/usr/local/lib/include/caffe \
  -DCaffe_LIBS=/usr/local/lib/libcaffe-nv.so -DBUILD_CAFFE=OFF -DCUDA_ARCH=All .. && \
make -j`nproc`

There is a problem with the CPU version of Caffe used by OpenPose. We need to make some minor changes before it will compile in this container. If needed, you can download the COCO model that is faster with the CPU version.

# Edit /opt/openpose_cpu/3rdparty/caffe/src/caffe/layers/mkldnn_inner_product_layer.cpp
# Add 3 spaces to the beginning of lines 354 and 357
# Add 4 spaces to the beginning of lines 355 and 358

mkdir -p /opt/openpose_cpu/build && \
cd /opt/openpose_cpu/build && \
cmake -DGPU_MODE=CPU_ONLY .. && \
make -j`nproc`
# The following step is optional, in case you want to use the COCO model (faster on CPUs but less accurate).
# The CPU build is started from /opt/openpose_cpu and looks for models relative to that directory.
cd /opt/openpose_cpu/models
wget -c http://posefs1.perception.cs.cmu.edu/OpenPose/models/pose/coco/pose_iter_440000.caffemodel -P pose/coco/
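The whitespace edits to mkldnn_inner_product_layer.cpp described above can also be applied with sed before running cmake. This is a sketch, not a tested patch: `indent_mkldnn_lines` is a hypothetical name, GNU sed is assumed, and the line numbers are the ones from my copy of the file, so they may not match other Caffe revisions.

```shell
# Hypothetical helper: prepend 3 spaces to lines 354 and 357 and 4 spaces to
# lines 355 and 358 of the file given as $1. Requires GNU sed for -i.
indent_mkldnn_lines() {
    sed -i \
        -e '354s/^/   /' -e '357s/^/   /' \
        -e '355s/^/    /' -e '358s/^/    /' \
        "$1"
}

# Example call with the path from this post:
# indent_mkldnn_lines /opt/openpose_cpu/3rdparty/caffe/src/caffe/layers/mkldnn_inner_product_layer.cpp
```

Note that running the helper twice indents the lines twice, so it is worth checking the file with an editor before and after.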

Make this the content of the file /.singularity.d/runscript:

#!/bin/bash
if nvidia-smi; then
    cd /opt/openpose
    echo "#### USING GPU ####"
else
    cd /opt/openpose_cpu
    echo "#### USING CPU ####"
fi
./build/examples/openpose/openpose.bin "$@"

Make sure the runscript is executable and exit:
```
chmod a+rx /.singularity.d/runscript
exit
```
Convert the sandbox directory to a compressed read-only Singularity Image File (SIF) for production use:
```
singularity build openpose_multi_container_oct_2019.sif openpose_multi_container_oct_2019/
```
You can now delete the sandbox directory if you like.
To run OpenPose, you can now simply run the singularity container with the OpenPose options. In our example below, we run on a GPU (hence --nv) and just want JSON output for body pose, face and hand.
```
singularity run --nv ~/openpose/openpose_multi_container_oct_2019.sif --video ~/tmp/whatever.mp4 --write_json ~/wherever --display 0 --render_pose 0 --face --hand
```
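For more than one video, the command above can be generated per file. The following is a dry-run sketch that only prints the commands so they can be inspected or handed to a job scheduler. `print_openpose_jobs` and all paths in the example are placeholders, and paths containing spaces are not handled.

```shell
# Hypothetical helper: print one OpenPose command per video.
# $1 = SIF file, $2 = base output directory, remaining arguments = video files.
# Each video gets its own JSON directory named after the file without extension.
print_openpose_jobs() {
    poj_sif="$1"
    poj_outdir="$2"
    shift 2
    for poj_video in "$@"; do
        poj_name=$(basename "$poj_video")
        poj_name="${poj_name%.*}"
        echo "singularity run --nv $poj_sif --video $poj_video --write_json $poj_outdir/$poj_name --display 0 --render_pose 0 --face --hand"
    done
}

# Example call (placeholder paths):
# print_openpose_jobs ~/openpose/openpose_multi_container_oct_2019.sif ~/wherever ~/tmp/*.mp4
```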
If you have legacy cards, NVCaffe is not for you. Officially, only the Pascal, Volta and Turing architectures are supported. Details about card generations can be found here. However, in my tests, the GTX 980 from the Maxwell generation also worked with the NVCaffe build of OpenPose described above. The Kepler cards (K20m, K40m) did not. For those, we will install OpenPose with the custom Caffe that comes with it by default. Unfortunately, we need a different version of cmake than the one that ships with Ubuntu 18.04 due to an incompatibility.
```
cd /opt
wget https://github.com/Kitware/CMake/releases/download/v3.15.4/cmake-3.15.4-Linux-x86_64.sh
/bin/sh cmake-3.15.4-Linux-x86_64.sh
# [say yes to everything]
git clone https://github.com/CMU-Perceptual-Computing-Lab/openpose.git openpose_legacy_gpu
cd /opt/openpose_legacy_gpu
rm -rf /opt/openpose_legacy_gpu/build
mkdir -p /opt/openpose_legacy_gpu/build && \
cd /opt/openpose_legacy_gpu/build && \
/opt/cmake-3.15.4-Linux-x86_64/bin/cmake -DCUDA_ARCH=All .. && \
make -j`nproc`
```
Note: It looks like the custom Caffe does not produce code for all architectures needed. OpenPose claims to be preparing code for sm_30 to sm_75 when -DCUDA_ARCH=All is set, but the “Caffe Configuration Summary” shown during the build does not include sm_75. We do not care at the moment, because anything from sm_50 onwards should run with NVCaffe.
We also have to adapt the runscript to select the appropriate version of OpenPose:
```
#!/bin/bash
if nvidia-smi; then
    if (($(deviceQuery | grep "CUDA Capability" | grep -oP "(?<= )[0-9]" | head -n 1) >= 5 )); then
        cd /opt/openpose
        echo "#### USING GPU with NVCaffe ####"
    else
        cd /opt/openpose_legacy_gpu
        echo "#### USING Legacy GPU with Custom Caffe ####"
    fi
else
    cd /opt/openpose_cpu
    echo "#### USING CPU ####"
fi
./build/examples/openpose/openpose.bin "$@"
```
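The selection logic in the runscript can be tried out without a GPU by feeding it sample text. The sketch below assumes that deviceQuery prints a line of the form `CUDA Capability Major/Minor version number:    7.5`. Both function names are hypothetical, and the threshold of 5 is the one used in the runscript above.

```shell
# Hypothetical helper: read deviceQuery-style text on stdin and print the
# major compute capability of the first device listed (nothing if none).
first_cuda_major() {
    sed -n 's/.*CUDA Capability.*:[[:space:]]*\([0-9][0-9]*\)\.[0-9].*/\1/p' | head -n 1
}

# Hypothetical helper: map a major capability (or an empty string for
# "no GPU") to the OpenPose directory used in this post.
choose_openpose_dir() {
    if [ -z "$1" ]; then
        echo /opt/openpose_cpu
    elif [ "$1" -ge 5 ]; then
        echo /opt/openpose
    else
        echo /opt/openpose_legacy_gpu
    fi
}

# Possible usage inside the container:
# cd "$(choose_openpose_dir "$(deviceQuery 2>/dev/null | first_cuda_major)")"
```

Unlike the single-digit match in the runscript, this parser reads the whole major number, which would matter only for capabilities of 10 and above.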

Special thanks go to Thomas Zeiser, Georg Hager and all others at FAU’s HPC Support team for their patient expert help, and to KONWIHR for funding.

Please leave any feedback in the comments.

3 Replies to “OpenPose with NVCaffe in a Singularity container with support for multiple architectures”

Christoph Stenkamp says:

September 28, 2020 at 12:21 pm

before installing opencv in step 5, it seems necessary to update pip: “pip3 install --upgrade pip”, see https://stackoverflow.com/a/63457606/5122790

1. pruhrig says:
  
  October 16, 2020 at 4:19 pm
  
  Thank you for this update! Yes, the pip version (v9) shipped with Ubuntu 16.04 LTS is seriously outdated now and has stopped working with newer packages.
  
Frankie Robertson says:

October 28, 2020 at 1:09 pm

I managed to get this building on Singularity Hub at the beginning of my GSOC. I’m continuing to update it and will probably update the base image and build NVCaffe manually soon since the base is no longer being updated.

https://github.com/frankier/openpose_containers

PRs welcome!
