OpenPose with NVCaffe in a Singularity container with support for multiple architectures

At Red Hen, we have started to use OpenPose for gesture recognition. To ensure portability across the various HPC facilities offered by the universities of the researchers involved, I chose to create a Singularity container. The container also helps with the various GPU generations available within FAU’s TinyGPU cluster. Here are the benefits of the image whose creation I describe below:

The image

  • contains the latest OpenPose codebase
  • is based on the latest NVCaffe image (for newer NVidia cards)
  • is compatible with all generations of NVidia cards available to us
  • includes a CPU version and will automatically use that if no GPU is available

In my tests, the major advantage of the NVCaffe image was the smaller memory footprint (less than 3 GB vs. roughly 5 GB with the custom Caffe that comes with OpenPose by default), which means that the GPUs with 4 GB of RAM become usable for OpenPose. In addition, it was about 10% faster than the Caffe that comes with OpenPose on GTX 1080 cards.
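
To see which GPUs on a node clear this bar, one could filter the output of nvidia-smi's --query-gpu mode. This is a minimal sketch, not part of the original setup: the 3072 MiB threshold is an assumption derived from the "less than 3 GB" figure above, and gpus_with_enough_memory is a hypothetical helper name.

```shell
# Hypothetical helper: reads "index, memory.total" lines (in MiB) on stdin
# and prints the indices of GPUs with more memory than NEEDED_MIB.
# NEEDED_MIB = 3072 is an assumption based on the "<3 GB" NVCaffe footprint.
NEEDED_MIB=${NEEDED_MIB:-3072}

gpus_with_enough_memory() {
    while IFS=', ' read -r idx mem; do
        [ -n "$idx" ] || continue            # skip empty lines
        if [ "$mem" -gt "$NEEDED_MIB" ]; then
            echo "$idx"
        fi
    done
}

# Usage on a GPU node:
# nvidia-smi --query-gpu=index,memory.total --format=csv,noheader,nounits \
#     | gpus_with_enough_memory
```

This only looks at total memory; memory already taken by other jobs on a shared node is not considered.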

Note that the instructions below assume an interactive installation because of some glitches that occurred in my setup. Feel free to turn this into a Singularity recipe and post the link in the comments! The steps should also carry over to Docker with minimal changes.

  1. Pull the NVCaffe image. We use the sandbox (i.e. directory) format because of the glitch described in step 4 below.
    singularity build --sandbox openpose_multi_container_oct_2019/ docker://nvcr.io/nvidia/caffe:19.09-py2
  2. Open writable shell as root:
    sudo singularity shell -w openpose_multi_container_oct_2019
  3. Upgrade the OS in the container:
    export LC_ALL=C
    apt-get -y --no-install-recommends update && \
    apt-get -y --no-install-recommends upgrade
  4. If the upgrade fails with the following error, apply the workaround below (skip this step if everything runs fine):
    dpkg: error processing archive /tmp/apt-dpkg-install-XOAywY/02-tzdata_2019c-0ubuntu0.18.04_all.deb (--unpack):
     unable to make backup link of './usr/share/zoneinfo/UCT' before installing new version: Invalid cross-device link

    I did not find an actual solution to this problem (again, let me know in the comments if you have one), so we are going to do a simple workaround. We exit Singularity, rename the file in the container directory, go back into the container and try again.

    exit
    mv openpose_multi_container_oct_2019/usr/share/zoneinfo/UCT openpose_multi_container_oct_2019/usr/share/zoneinfo/UCT_original
    sudo singularity shell -w openpose_multi_container_oct_2019
    export LC_ALL=C
    apt-get -y --no-install-recommends upgrade

    You will probably be prompted for timezone information.
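
    Since the workaround may need to be repeated (for example when the sandbox is rebuilt), it can be wrapped in a small idempotent shell function. This is a sketch to be run on the host, outside the container; the name rename_uct is hypothetical, and the extra symlink test covers the case where UCT is a link rather than a regular file.

    ```shell
    # Hypothetical helper, run on the host (outside the container).
    # Renames <sandbox>/usr/share/zoneinfo/UCT to UCT_original unless that
    # was already done, so re-running it is harmless.
    rename_uct() {
        zone="${1:-openpose_multi_container_oct_2019}/usr/share/zoneinfo"
        if { [ -e "$zone/UCT" ] || [ -L "$zone/UCT" ]; } && [ ! -e "$zone/UCT_original" ]; then
            mv "$zone/UCT" "$zone/UCT_original"
            echo "renamed $zone/UCT"
        else
            echo "nothing to do"
        fi
    }

    # Usage: rename_uct openpose_multi_container_oct_2019
    ```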

  5. Install a range of dependencies and tools:
    apt-get install -y --no-install-recommends \
    build-essential \
    cmake \
    git \
    wget \
    nano \
    dialog \
    software-properties-common \
    libatlas-base-dev \
    libleveldb-dev \
    libsnappy-dev \
    libhdf5-serial-dev \
    libboost-all-dev \
    libgflags-dev \
    libgoogle-glog-dev \
    liblmdb-dev \
    pciutils \
    python3-setuptools \
    python3-dev \
    python3-pip \
    opencl-headers \
    ocl-icd-opencl-dev \
    libviennacl-dev \
    libavcodec-dev \
    libavformat-dev \
    libswscale-dev \
    libv4l-dev \
    libxvidcore-dev \
    libx264-dev \
    libgtk-3-dev \
    gfortran \
    pkg-config \
    libcanberra-gtk-module && \
    python3 -m pip install \
    numpy \
    opencv-python
    pip3 install protobuf==3.6.0
    add-apt-repository -y ppa:jonathonf/ffmpeg-4
    apt-get -y --no-install-recommends update
    apt-get -y install ffmpeg
  6. The NVCaffe image is odd in that it lacks some of the software NVCaffe appears to be built against. Protobuf and OpenCV are not installed, yet the build fails without them. Moreover, NVCaffe seems to expect versions that are NOT available through the OS’s package management: Ubuntu 18.04 currently ships Protobuf 3.0 and OpenCV 3.2, and the build fails with complaints about version mismatches if those are installed via apt. We therefore need to install OpenCV 3.4 and Protobuf 3.6.0 specifically; for the latter, the pip3 line in the previous step should be enough.
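
    Before building, it can save time to check the installed versions against these requirements. A minimal sketch, assuming GNU sort (for sort -V) and that OpenCV's pkg-config file opencv.pc is installed; version_ge is a hypothetical helper.

    ```shell
    # Hypothetical helper: succeeds when version $1 >= version $2.
    # Relies on GNU sort's -V (natural version ordering).
    version_ge() {
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
    }

    # Usage inside the container (Protobuf after step 5, OpenCV after step 7):
    # pb=$(python3 -c 'import google.protobuf; print(google.protobuf.__version__)')
    # version_ge "$pb" 3.6.0 || echo "Protobuf $pb is older than 3.6.0"
    # cv=$(pkg-config --modversion opencv)   # assumes opencv.pc is installed
    # version_ge "$cv" 3.4.0 || echo "OpenCV $cv is older than 3.4"
    ```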
  7. Build OpenCV with CUDA support and fast math (although the latter does not seem to change much in our case):
    cd /opt
    wget -O opencv3.4.8.zip https://github.com/opencv/opencv/archive/3.4.8.zip
    wget -O opencv-contrib3.4.8.zip https://github.com/opencv/opencv_contrib/archive/3.4.8.zip
    unzip opencv3.4.8.zip
    unzip opencv-contrib3.4.8.zip
    cd opencv-3.4.8/
    mkdir build && cd build
    cmake -D CMAKE_BUILD_TYPE=RELEASE \
        -D CMAKE_INSTALL_PREFIX=/usr/local \
        -D WITH_CUDA=ON \
        -D ENABLE_FAST_MATH=1 \
        -D CUDA_FAST_MATH=1 \
        -D WITH_CUBLAS=1 \
        -D WITH_FFMPEG=ON \
        -D INSTALL_PYTHON_EXAMPLES=ON \
        -D OPENCV_EXTRA_MODULES_PATH=/opt/opencv_contrib-3.4.8/modules \
        -D OPENCV_ENABLE_NONFREE=ON \
        -D BUILD_EXAMPLES=ON ..
    make -j`nproc`
    make install
    cd /opt
    rm opencv3.4.8.zip
    rm opencv-contrib3.4.8.zip
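
    OpenCV's CMake configuration may quietly build without CUDA if it cannot find the toolkit, so it is worth confirming that the installed library really has it. A sketch, assuming the generated header cvconfig.h is installed as /usr/local/include/opencv2/cvconfig.h and contains lines such as "#define HAVE_CUDA" for detected features; opencv_has_feature is a hypothetical helper.

    ```shell
    # Hypothetical helper: succeeds if the given cvconfig.h defines macro $2.
    #   $1 = path to cvconfig.h, $2 = macro name such as HAVE_CUDA
    # The pattern requires the name to end at whitespace or end of line,
    # so HAVE_CUD does not match HAVE_CUDA.
    opencv_has_feature() {
        grep -Eq "^#define $2([[:space:]]|\$)" "$1"
    }

    # Usage (assumed install location):
    # for f in HAVE_CUDA HAVE_CUBLAS; do
    #     opencv_has_feature /usr/local/include/opencv2/cvconfig.h "$f" \
    #         || echo "OpenCV was built without $f"
    # done
    ```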
  8. Clone OpenPose and make a copy for the CPU version:
    cd /opt
    git clone https://github.com/CMU-Perceptual-Computing-Lab/openpose.git
    cp -R openpose openpose_cpu
  9. In our environment, I have root access only to a machine without a GPU, but compiling the NVCaffe version appears to require a GPU. I therefore move the image to a GPU machine in the HPC cluster for compilation; that machine, however, has no direct Internet access. Skip this step if you have root access to a GPU machine (but make sure you started Singularity with --nv and loaded CUDA beforehand).
    chmod -R a+rwX /opt/openpose
    chmod -R a+rwX /opt/openpose_cpu
    chmod a+rwX /opt
    exit
    sudo tar cvzf openpose_multi_container_oct_2019.tar.gz openpose_multi_container_oct_2019/
    # copy to remote machine, SSH there and execute
    tar xvzf openpose_multi_container_oct_2019.tar.gz
    # You may need to load the matching CUDA module here. In our case: module load cuda/10.1
    singularity shell --nv -w openpose_multi_container_oct_2019
    export LC_ALL=C
    # Optional: set the HTTP(S) proxy (site-specific)
    export HTTP_PROXY=http://proxy.rrze.uni-erlangen.de:80
    export HTTPS_PROXY=https://proxy.rrze.uni-erlangen.de:443
    export http_proxy=http://proxy.rrze.uni-erlangen.de:80
    export https_proxy=https://proxy.rrze.uni-erlangen.de:443
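
    The copy to the offline GPU node can be checked with a checksum before unpacking. A minimal sketch, assuming GNU tar and sha256sum on both machines; the helper names and the scp target are hypothetical. Packing still has to run as root on the build machine (e.g. from a sudo -s shell) because the sandbox contains root-owned files.

    ```shell
    # Hypothetical helpers for moving the sandbox to a machine without Internet.
    #   pack_sandbox DIR       -> writes DIR.tar.gz and DIR.tar.gz.sha256
    #   verify_and_unpack DIR  -> checks the checksum, then extracts DIR.tar.gz
    pack_sandbox() {
        d=${1%/}                                  # tolerate a trailing slash
        tar czf "$d.tar.gz" "$d" && sha256sum "$d.tar.gz" > "$d.tar.gz.sha256"
    }

    verify_and_unpack() {
        d=${1%/}
        sha256sum -c "$d.tar.gz.sha256" && tar xzf "$d.tar.gz"
    }

    # Hypothetical usage:
    #   (build machine, as root) pack_sandbox openpose_multi_container_oct_2019/
    #   scp openpose_multi_container_oct_2019.tar.gz* user@gpu-node:
    #   (GPU node)               verify_and_unpack openpose_multi_container_oct_2019
    ```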
  10. Modify /opt/openpose/CMakeLists.txt by adding the following line, which fixed a build error for me:
    # Add this line (I put it in line 226):
    find_package(Boost COMPONENTS system filesystem REQUIRED)
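
    If you rebuild the image often, this edit can be scripted. A sketch using GNU sed's one-line i command; the helper name add_boost_line is hypothetical, and the target line (226 in my checkout) is a parameter because it will likely differ in other OpenPose versions. The grep guard prevents adding the line twice.

    ```shell
    # Hypothetical helper: insert the Boost line before line $2 of file $1,
    # unless the file already contains it. Needs GNU sed (-i and "Ni text").
    add_boost_line() {
        boost_line='find_package(Boost COMPONENTS system filesystem REQUIRED)'
        grep -qF "$boost_line" "$1" || sed -i "${2}i $boost_line" "$1"
    }

    # Usage: add_boost_line /opt/openpose/CMakeLists.txt 226
    ```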
  11. Build OpenPose for GPU, enabling all GPU architectures (but see step 17 for old cards):
    mkdir -p /opt/openpose/build && \
    cd /opt/openpose/build && \
    cmake -DDL_FRAMEWORK=NV_CAFFE -DCaffe_INCLUDE_DIRS=/usr/local/lib/include/caffe \
      -DCaffe_LIBS=/usr/local/lib/libcaffe-nv.so -DBUILD_CAFFE=OFF -DCUDA_ARCH=All .. && \
    make -j`nproc`
  12. The CPU version of Caffe used by OpenPose needs some minor changes before it compiles in this container. If needed, you can also download the COCO model, which is faster with the CPU version.
    # Edit /opt/openpose_cpu/3rdparty/caffe/src/caffe/layers/mkldnn_inner_product_layer.cpp
    # Add 3 spaces to the beginning of lines 354 and 357
    # Add 4 spaces to the beginning of lines 355 and 358
    
    mkdir -p /opt/openpose_cpu/build && \
    cd /opt/openpose_cpu/build && \
    cmake -DGPU_MODE=CPU_ONLY .. && \
    make -j`nproc`
    # The following step is optional, in case you want to use the COCO model (faster on CPUs but less accurate)
    cd /opt/openpose_cpu/models
    wget -c http://posefs1.perception.cs.cmu.edu/OpenPose/models/pose/coco/pose_iter_440000.caffemodel -P pose/coco/
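
    The manual indentation fix at the start of this step can also be applied with sed, before running cmake. A sketch, assuming the line numbers given above still match your checkout (check them first, as they will shift between OpenPose versions); fix_mkldnn_indent is a hypothetical name. The edit is not idempotent, so run it only once per checkout.

    ```shell
    # Hypothetical helper: prepend 3 spaces to lines 354 and 357 and
    # 4 spaces to lines 355 and 358 of the given file (GNU sed -i).
    fix_mkldnn_indent() {
        sed -i -e '354s/^/   /' -e '357s/^/   /' \
               -e '355s/^/    /' -e '358s/^/    /' "$1"
    }

    # Usage (before cmake):
    # fix_mkldnn_indent /opt/openpose_cpu/3rdparty/caffe/src/caffe/layers/mkldnn_inner_product_layer.cpp
    ```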
  13. Make this the content of the file /.singularity.d/runscript:
    #!/bin/bash
    if nvidia-smi; then
        cd /opt/openpose
        echo "#### USING GPU ####"
    else
        cd /opt/openpose_cpu
        echo "#### USING CPU ####"
    fi
    ./build/examples/openpose/openpose.bin "$@"
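
    To check the GPU/CPU selection without entering the container, the decision can be factored into a function and exercised with a stub nvidia-smi placed first on PATH. A sketch; pick_openpose_dir is a hypothetical name, and unlike the runscript it discards nvidia-smi's output.

    ```shell
    # Hypothetical helper mirroring the runscript's choice: print the OpenPose
    # directory to use, depending on whether nvidia-smi succeeds.
    pick_openpose_dir() {
        if nvidia-smi > /dev/null 2>&1; then
            echo /opt/openpose
        else
            echo /opt/openpose_cpu
        fi
    }

    # Usage inside the runscript: cd "$(pick_openpose_dir)"
    ```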
    
  14. Make sure the runscript is executable and exit:
    chmod a+rx /.singularity.d/runscript
    exit
  15. Convert the sandbox directory to a compressed read-only Singularity Image File (SIF) for production use:
    singularity build openpose_multi_container_oct_2019.sif openpose_multi_container_oct_2019/
    

    You can now delete the sandbox directory if you like.

  16. To run OpenPose, simply run the Singularity container with the OpenPose options. In the example below, we run on a GPU (hence --nv) and only want JSON output for body pose, face and hands.
    singularity run --nv ~/openpose/openpose_multi_container_oct_2019.sif --video ~/tmp/whatever.mp4 --write_json ~/wherever --display 0 --render_pose 0 --face --hand
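
    For batch jobs, the same call can be wrapped in a loop over a directory of videos, writing each video's JSON into its own folder. A sketch under stated assumptions: the directory arguments, the process_videos name and the OPENPOSE_SIF and RUN variables are hypothetical; the OpenPose flags are exactly those of the command above. Setting RUN=echo prints the commands instead of running them (a dry run).

    ```shell
    # Hypothetical settings; override them in the environment if needed.
    OPENPOSE_SIF=${OPENPOSE_SIF:-$HOME/openpose/openpose_multi_container_oct_2019.sif}
    RUN=${RUN:-singularity run --nv}

    # process_videos IN_DIR OUT_DIR: run OpenPose on every IN_DIR/*.mp4,
    # writing JSON for video NAME.mp4 into OUT_DIR/NAME/.
    process_videos() {
        for video in "$1"/*.mp4; do
            [ -e "$video" ] || continue          # no matches: glob stays literal
            name=$(basename "$video" .mp4)
            mkdir -p "$2/$name"
            # $RUN is unquoted on purpose so it splits into command + options.
            $RUN "$OPENPOSE_SIF" --video "$video" --write_json "$2/$name" \
                --display 0 --render_pose 0 --face --hand
        done
    }

    # Usage: process_videos ~/videos ~/openpose_json
    ```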
  17. If you have legacy cards, NVCaffe is not for you. Officially, only the Pascal, Volta and Turing architectures are supported. Details about card generations can be found here. However, in my tests, the GTX 980 from the Maxwell generation also worked with the NVCaffe build of OpenPose described above. The Kepler cards (K20m, K40m) did not. For those, we install OpenPose with the custom Caffe that comes with it by default. Unfortunately, this requires a newer version of CMake than the one that ships with Ubuntu 18.04 because of an incompatibility.
    cd /opt
    wget https://github.com/Kitware/CMake/releases/download/v3.15.4/cmake-3.15.4-Linux-x86_64.sh
    /bin/sh cmake-3.15.4-Linux-x86_64.sh
    # [say yes to everything]
    git clone https://github.com/CMU-Perceptual-Computing-Lab/openpose.git openpose_legacy_gpu
    cd /opt/openpose_legacy_gpu
    rm -rf /opt/openpose_legacy_gpu/build
    mkdir -p /opt/openpose_legacy_gpu/build && \
    cd /opt/openpose_legacy_gpu/build && \
    /opt/cmake-3.15.4-Linux-x86_64/bin/cmake -DCUDA_ARCH=All .. && \
    make -j`nproc`
    

    Note: It looks like the custom Caffe does not produce code for all the architectures needed. OpenPose claims to generate code for sm_30 to sm_75 when -DCUDA_ARCH=All is set, but the “Caffe Configuration Summary” shown during the build does not include sm_75. We do not worry about this for now, because anything from sm_50 onwards should run with NVCaffe.
    We also have to adapt the runscript to select the appropriate version of OpenPose:

    #!/bin/bash
    if nvidia-smi; then
        if (($(deviceQuery | grep "CUDA Capability" | grep -oP "(?<= )[0-9]" | head -n 1) >= 5 )); then
            cd /opt/openpose
            echo "#### USING GPU with NVCaffe ####"
        else
            cd /opt/openpose_legacy_gpu
            echo "#### USING Legacy GPU with Custom Caffe ####"
        fi
    else
        cd /opt/openpose_cpu
        echo "#### USING CPU ####"
    fi
    ./build/examples/openpose/openpose.bin "$@"
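
    The capability test in this runscript can be pulled out into functions that are testable against sample deviceQuery output. A sketch; capability_to_sm and openpose_dir_for_sm are hypothetical names, and deviceQuery (from the CUDA samples) must be available in the container for the real check. Like the runscript, it only looks at the first GPU listed. Recent drivers may also offer nvidia-smi --query-gpu=compute_cap as an alternative to deviceQuery, but older drivers may not support that field.

    ```shell
    # Hypothetical helper: read deviceQuery output on stdin and print the
    # first GPU's capability as sm_XY (e.g. "3.5" -> sm_35).
    capability_to_sm() {
        grep "CUDA Capability" | head -n 1 |
            sed -n 's/.*:[[:space:]]*\([0-9][0-9]*\)\.\([0-9][0-9]*\).*/sm_\1\2/p'
    }

    # Hypothetical helper: map sm_XY to the OpenPose build used above
    # (major version >= 5 -> NVCaffe build, otherwise legacy custom Caffe).
    # Returns 1 without output for unrecognized input.
    openpose_dir_for_sm() {
        case $1 in
            sm_[0-9][0-9]*) ;;
            *) return 1 ;;
        esac
        major=${1#sm_}
        major=${major%?}                         # drop the minor digit
        if [ "$major" -ge 5 ]; then
            echo /opt/openpose
        else
            echo /opt/openpose_legacy_gpu
        fi
    }

    # Usage: cd "$(openpose_dir_for_sm "$(deviceQuery | capability_to_sm)")"
    ```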

Special thanks go to Thomas Zeiser, Georg Hager and all others at FAU’s HPC Support team for their patient expert help, and to KONWIHR for funding.

Please leave any feedback in the comments.
