
Building Cross-Platform CUDA Applications with CMake

Cross-platform software development poses a number of challenges to your application’s build process. How do you target multiple platforms without maintaining multiple platform-specific build scripts, projects, or makefiles? What if you need to build CUDA code as part of the process? CMake is an open-source, cross-platform family of tools designed to build, test, and package software across different platforms. Many developers use CMake to control their software compilation process using simple platform- and compiler-independent configuration files. CMake generates native makefiles and workspaces that can be used in the compiler environment of your choice. The suite of CMake tools was created by Kitware in response to the need for a powerful, cross-platform build environment for open-source projects such as ITK and VTK.

Figure 1. CMake adds CUDA C++ to its long list of supported programming languages.

In this post I want to show you how easy it is to build CUDA applications using the features of CMake 3.8+ (3.9 for MSVC support). Since 2009, CMake (starting with 2.8.0) has provided the ability to compile CUDA code through custom commands such as cuda_add_executable and cuda_add_library, provided by the FindCUDA package. CMake 3.8 makes CUDA C++ an intrinsically supported language. CUDA now joins the wide range of languages, platforms, compilers, and IDEs that CMake supports, as Figure 1 shows.

A CUDA Example in CMake

Let’s start with an example of building CUDA with CMake. Listing 1 shows the CMake file for a CUDA example called “particles”. I have provided the full code for this example on GitHub.
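The listing itself is not reproduced in this copy of the post, so here is a minimal sketch of what such a CMakeLists.txt looks like; the source file names (particle.cu, v3.cu, test.cu, and so on) are illustrative, not the exact contents of Listing 1.

    cmake_minimum_required(VERSION 3.8 FATAL_ERROR)
    project(cmake_and_cuda LANGUAGES CXX CUDA)

    # A static library that mixes C++ and CUDA sources.
    add_library(particles STATIC
      randomize.cpp randomize.h
      particle.cu particle.h
      v3.cu v3.h)

    # Request C++11 for the library and for everything that links to it.
    target_compile_features(particles PUBLIC cxx_std_11)

    # Compile the CUDA sources as relocatable device code so that other
    # targets can call into them.
    set_target_properties(particles PROPERTIES CUDA_SEPARABLE_COMPILATION ON)

    # An executable that uses the library; usage requirements (such as the
    # C++11 requirement above) propagate through target_link_libraries.
    add_executable(particle_test test.cu)
    target_link_libraries(particle_test PRIVATE particles)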

Before I work through all the logic and features showcased by Listing 1, let’s skip ahead to building. If you are using Visual Studio, you need to use CMake 3.9 and the Visual Studio CUDA build extensions (included with the CUDA Toolkit); otherwise you can use CMake 3.8 or higher with the Makefile generator (or the Ninja generator) with nvcc (the NVIDIA CUDA Compiler) and a C++ compiler in your PATH. (Alternatively, you can set the CUDACXX and CXX environment variables to the path to nvcc and your C++ compiler, respectively.)

To configure the CMake project and generate a makefile, I used the command
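The command itself is missing from this copy; based on the flag discussed below, the invocation was along these lines (assuming an out-of-source build directory created next to the sources):

    # Configure from an empty build directory; the CUDA flag targets the Kepler GPU in this machine.
    cmake -DCMAKE_CUDA_FLAGS="-arch=sm_30" ..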

Figure 2 shows the output. CMake automatically found and verified the C++ and CUDA compilers and generated a makefile project. Note that the argument -DCMAKE_CUDA_FLAGS="-arch=sm_30" passes -arch=sm_30 to nvcc, telling it to target the Kepler architecture (SM_30 or Compute Capability 3.0) GPU in my computer.

Next, Figure 2 shows how I invoked the build with the command make -j4. This runs make with multiple threads so that it compiles the C++ and CUDA source files in parallel. For more information on how CMake determines where to find parallelism within a project, read “CMake: Building with All Your Cores”. CMake also manages building and linking multiple languages into executables or shared libraries automatically.

Enabling CUDA

Let’s dig into the CMake code and work through the different components. As always, the first command in the root CMake file should be cmake_minimum_required , which asserts that the CMake version is new enough, and ensures that CMake can determine what backward compatibilities it needs to preserve when a user is running a newer CMake version than required.

Next, on line 2 is the project command, which sets the project name (cmake_and_cuda) and defines the required languages (C++ and CUDA). This lets CMake identify and verify the compilers it needs, and cache the results. This results in generation of the common cache language flags that Figure 3 shows.

Figure 3. When CUDA is enabled, CMake provides default flags for each configuration
(Debug, Release, RelWithDebInfo, and MinSizeRel).

Now that CMake has determined what languages the project needs and has configured its internal infrastructure we can go ahead and write some real CMake code.

Building a Library with CMake

The first thing that everybody does when learning CMake is to write a toy example like this one that generates a single executable. Let’s be a little more adventurous and also generate a static library that is used by an executable.

Usage requirements are at the core of modern CMake. Information such as include directories, compiler defines, and compiler options can be associated with targets so that this information propagates to consumers automatically through target_link_libraries. In previous versions of CMake, building CUDA code required commands such as cuda_add_library. Unfortunately, these commands are unable to participate in usage requirements, and therefore would fail to use propagated compiler flags or definitions. The intrinsic CUDA support now in CMake lets targets that use CUDA fully leverage modern CMake usage requirements and enables a unified CMake syntax for all languages.

C++ Language Level

One of the first things you’ll want to configure in a project is the C++ language level (98, 11, 14, 17…) you will use. CMake 3.1 introduced the ability to set the C++ language level for an entire project or on a per-target basis. You can also control the C++ language level for CUDA compilation.

You can easily require a specific C++ standard for CUDA compilation through either the CMAKE_CUDA_STANDARD variable or the target_compile_features command. To make target_compile_features easier to use with CUDA, CMake uses the same set of C++ feature keywords for CUDA C++. The following code shows how to request C++11 support for the particles target, which means that any CUDA file used by the particles target will be compiled with CUDA C++11 enabled (the --std=c++11 argument to nvcc).
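This is the snippet the paragraph refers to (particles is the library target defined earlier):

    # cxx_std_11 is a PUBLIC compile feature, so the C++ and CUDA sources of
    # particles, and of anything that links to it, are built as C++11.
    target_compile_features(particles PUBLIC cxx_std_11)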

Enabling Position-Independent Code

When working on large projects it is common to generate one or more shared libraries. Each object file that is part of a shared library usually needs to be compiled with position-independent code enabled, which is done by setting the -fPIC compiler flag. Unfortunately -fPIC isn’t consistently supported across all compilers, so CMake abstracts away the issue by automatically enabling position-independent code when building shared libraries. In the case of static libraries that will be linked into shared libraries, position-independent code needs to be explicitly enabled by setting the POSITION_INDEPENDENT_CODE target property as follows.
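A minimal sketch of that property being set on the static library from the example:

    # Build the static library with position-independent code so it can later
    # be linked into a shared library.
    set_target_properties(particles PROPERTIES POSITION_INDEPENDENT_CODE ON)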

CMake 3.8 supports the POSITION_INDEPENDENT_CODE property for CUDA compilation, and builds all host-side code as relocatable when requested. This is great news for projects that want to use CUDA in cross-platform builds or inside shared libraries, or that need to support esoteric C++ compilers.

Separable Compilation

By default the CUDA compiler uses whole-program compilation. Effectively this means that all device functions and variables need to be located inside a single file or compilation unit. Separate compilation and linking was introduced in CUDA 5.0 to allow components of a CUDA program to be compiled into separate objects. For this to work properly, any library or executable that uses separable compilation has two linking phases. First it must do device linking for all the objects that contain CUDA device code, and then it must do the host-side linking, including the results of the previous link phase.

Separable compilation not only allows projects to maintain a code structure where independent functions are kept in separate locations, it also helps improve incremental build performance (a feature of all CMake-based projects). Incremental builds allow recompilation and linking of only the units that have been modified, which reduces build times. The primary drawback of separable compilation is that certain function-call optimizations are disabled for calls to functions that reside in a different compilation unit, since the compiler has no knowledge of the details of the function being called.

CMake now fundamentally understands the concepts of separate compilation and device linking. Implicitly, CMake defers device linking of CUDA code as long as possible, so if you are generating static libraries with relocatable CUDA code the device linking is deferred until the static library is linked to a shared library or an executable. This is a significant improvement because you can now compose your CUDA code into multiple static libraries, which was previously impossible with CMake. To control separable compilation in CMake, turn on the CUDA_SEPARABLE_COMPILATION property for the target as follows.
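The property in question, applied to the example's library target (a sketch):

    # Compile and link the CUDA sources of the particles library as relocatable
    # device code; CMake defers the device link until the final link step.
    set_property(TARGET particles PROPERTY CUDA_SEPARABLE_COMPILATION ON)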

PTX Generation

If you want to package PTX files for load-time JIT compilation instead of compiling CUDA code into a collection of libraries or executables, you can enable the CUDA_PTX_COMPILATION property as in the following example. This example compiles some .cu files to PTX and then specifies the installation location.
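A sketch of such a target; the kernel file names and the install destination are illustrative:

    # Compile the .cu files to PTX instead of object code.
    add_library(CudaPTX OBJECT kernelA.cu kernelB.cu)
    set_property(TARGET CudaPTX PROPERTY CUDA_PTX_COMPILATION ON)

    # Install the generated PTX files (requires CMake 3.9, which allows
    # OBJECT libraries to be installed).
    install(TARGETS CudaPTX OBJECTS DESTINATION bin/ptx)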

To make PTX generation possible, CMake was extended so that all OBJECT libraries are capable of being installed, exported, imported, and referenced in generator expressions. This also enables PTX files to be converted or processed by tools such as bin2c and then embedded as C-strings into a library or executable. Here’s a basic example of this.
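The original post's example uses a small CMake wrapper script around bin2c; the following sketch shows the idea. The target names, file names, and the wrapper script are illustrative (bin2c ships with the CUDA Toolkit and writes its output to stdout, so the wrapper captures it with execute_process):

    # Convert the PTX generated by the CudaPTX target into a C header that
    # contains the PTX text as a const char array named "embedded_ptx".
    add_custom_command(
      OUTPUT "${CMAKE_CURRENT_BINARY_DIR}/embedded_ptx.h"
      COMMAND ${CMAKE_COMMAND}
              "-DINPUT=$<TARGET_OBJECTS:CudaPTX>"
              "-DOUTPUT=${CMAKE_CURRENT_BINARY_DIR}/embedded_ptx.h"
              -P "${CMAKE_CURRENT_SOURCE_DIR}/bin2c_wrapper.cmake"
      DEPENDS CudaPTX
      VERBATIM)

    # The executable embeds the PTX and loads it at run time via the driver API.
    add_executable(ptx_loader loader.cpp "${CMAKE_CURRENT_BINARY_DIR}/embedded_ptx.h")
    target_include_directories(ptx_loader PRIVATE "${CMAKE_CURRENT_BINARY_DIR}")

    # bin2c_wrapper.cmake (illustrative): run bin2c and capture its stdout.
    #   execute_process(COMMAND bin2c --const --name embedded_ptx "${INPUT}"
    #                   OUTPUT_FILE "${OUTPUT}")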

CMake and CUDA go together like Peanut Butter and Jam

I hope this post has shown you how naturally CMake supports building CUDA applications. If you are an existing CMake user, try out CMake 3.9 and take advantage of the improved CUDA support. If you are not an existing CMake user, try out CMake 3.9 and experience for yourself how great it is for building cross-platform projects that use CUDA.


Installing OpenCV + CUDA on Windows

Introduction

This article covers building and installing OpenCV 4 for C/C++, Python 2, and Python 3 from source, with the extra CUDA 10 modules, on Windows.

I have tried to include all the subtleties and pitfalls you may run into during the installation that are not described in the official manual.

The build was tested on:

  • Windows 8.1 + Visual Studio 2017 + Python 2/3 + CUDA 10.0 + GeForce 840m
  • Windows 10 + Visual Studio 2019 + Python 2/3 + CUDA 10.0 + GeForce GTX 1060

Note: this procedure will not work for OpenCV 4.0.1 and/or CUDA versions below 10. CUDA 9 and below are supported by OpenCV 3.

What You Need for the Installation

My build used the following tools:

  1. CMake 3.15
  2. MS Visual Studio 2019 64-bit + C++ CMake tools for Windows
  3. Python 3.7.3 64-bit + NumPy 64-bit
  4. Python 2.7.16 64-bit + NumPy 64-bit
  5. CUDA 10.0
  6. CuDNN 7.6.2
  7. OpenCV 4.1.1 and OpenCV-contrib-4.1.1

Installation

Since the installation is performed through console commands, follow every step carefully and attentively. Also, change the installation paths to your own where necessary.
To begin, install the required software; note that Visual Studio must be installed before CUDA:

  • CMake (version >= 3.9.1)
  • MS Visual Studio
  • Python 3.7 (Anaconda3 distribution)
  • CUDA 10.0
  • CuDNN 7.6.2

After installing all the components, make sure that the paths to CMake, Visual Studio, Python, CUDA, and CuDNN are registered in the PATH, PYTHONPATH, CUDA_PATH, and cudnn variables, respectively.

Next, download the opencv-4.1.1 and opencv-contrib-4.1.1 source archives and unpack them to the desired location (in my case C:\OpenCV\).

Create a build/ folder inside opencv-4.1.1.

Next, generate the build files with cmake. We will use the console version of cmake, because cmake-gui confuses the types of some variables (for example, OPENCV_PYTHON3_VERSION) and, as a result, generates the files incorrectly.

Open a console at C:\OpenCV\ and define the variables.

Note: for Visual Studio 2017 the generator is written as "Visual Studio 15 2017 Win64", without the -A flag.

You can also explicitly specify the Python libraries for python 2 and python 3 in case the build system cannot find them automatically, as in the sketch below.
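The exact options from the original are not preserved here; OpenCV's build exposes variables along these lines, which can be appended to the cmake invocation shown below (the Anaconda3 paths are illustrative, adjust them to your layout):

    rem Illustrative Python 3 paths for an Anaconda3 install; adjust to your layout
    -D PYTHON3_EXECUTABLE=C:\Anaconda3\python.exe ^
    -D PYTHON3_INCLUDE_DIR=C:\Anaconda3\include ^
    -D PYTHON3_LIBRARY=C:\Anaconda3\libs\python37.lib ^
    -D PYTHON3_NUMPY_INCLUDE_DIRS=C:\Anaconda3\Lib\site-packages\numpy\core\include ^
    -D PYTHON3_PACKAGES_PATH=C:\Anaconda3\Lib\site-packages ^

The PYTHON2_* counterparts work the same way for Python 2.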

Note that the NumPy library must have the same bitness as OpenCV. This is easy to check:
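The original check is not reproduced here; one simple way (an assumption, not necessarily the author's exact method) is to print the bitness of the Python interpreter that NumPy is installed under:

    python -c "import struct; print(struct.calcsize('P') * 8)"

This prints 64 for a 64-bit Python (and therefore a 64-bit NumPy) and 32 for a 32-bit one.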

Generate the build files with the long command below. If the generation fails or finishes with errors, clean out all the files in build/ and .cache/ before regenerating.
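The long command itself did not survive in this copy; the following is an illustrative reconstruction built from the flags explained below (the generator, paths, and the CUDA_ARCH_PTX value must be adjusted to your setup; add the Python options from above if needed):

    rem Illustrative reconstruction - adjust paths, generator, and architecture values for your machine
    cmake ^
      -G "Visual Studio 16 2019" -A x64 ^
      -D WITH_CUDA=ON ^
      -D CUDA_FAST_MATH=ON ^
      -D WITH_CUBLAS=ON ^
      -D CUDA_ARCH_PTX=7.5 ^
      -D BUILD_opencv_world=ON ^
      -D INSTALL_TESTS=ON ^
      -D INSTALL_C_EXAMPLES=ON ^
      -D INSTALL_PYTHON_EXAMPLES=ON ^
      -D OPENCV_EXTRA_MODULES_PATH=C:\OpenCV\opencv-contrib-4.1.1\modules ^
      -D BUILD_PROTOBUF=ON ^
      -S C:\OpenCV\opencv-4.1.1 ^
      -B C:\OpenCV\opencv-4.1.1\build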

  • BUILD_opencv_world – an optional module containing copies of all the libraries selected during the build. It is convenient for C++ development, because instead of linking a pile of opencv dependencies you can link the single dependency opencv_world411.lib into your project
  • INSTALL_EXAMPLES/INSTALL_TESTS – install the opencv code examples/tests
  • CUDA_FAST_MATH, WITH_CUBLAS – additional CUDA modules intended to speed up computations
  • CUDA_ARCH_PTX – the PTX instruction set version, used to improve compute performance
  • OPENCV_EXTRA_MODULES_PATH – the path to the extra modules from opencv-contrib (required for CUDA)
  • BUILD_PROTOBUF – some opencv modules require Protobuf (the opencv build will set BUILD_PROTOBUF=ON in any case)

After about 10 minutes, the console should show the build configuration summary and the closing lines "Configuring done" and "Generating done". Check all of this information, especially the NVIDIA CUDA, Python 2, and Python 3 sections.

Next, build the solution. The build may take several hours depending on your CPU and Visual Studio version.
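The build command is not preserved here; building and installing can be done either from Visual Studio (build the ALL_BUILD and INSTALL projects) or from the console, for example (an assumption based on the paths used above):

    rem Build and install the Release configuration using the generated Visual Studio solution
    cmake --build C:\OpenCV\opencv-4.1.1\build --target INSTALL --config Release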

After a successful installation, create a system variable OPENCV_DIR with the value C:\OpenCV\opencv-4.1.1\build\install\x64\vc15\bin and also add it to PATH.

Let's verify that OpenCV with the CUDA module works, using a simple matrix multiplication example; a sketch of the code follows the project settings below.

  1. Set the build configuration to Release/x64 (for Debug you need to build OpenCV with the Debug flag)
  2. Project Properties → C/C++ → General → add "C:\OpenCV\opencv-4.1.1\build\install\include" to Additional Include Directories
  3. Project Properties → Linker → General → add "C:\OpenCV\opencv-4.1.1\build\install\x64\vc16\lib" to Additional Library Directories
  4. Project Properties → Linker → Input → add ";opencv_world411.lib" (";opencv_world411d.lib" for Debug) to the end of Additional Dependencies
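The article's original code is not reproduced here, so below is a minimal sketch of such a check using cv::cuda::gemm from the cudaarithm contrib module (it relies on the WITH_CUBLAS option enabled during the build); the matrix size and values are arbitrary.

    #include <iostream>
    #include <opencv2/core.hpp>
    #include <opencv2/core/cuda.hpp>
    #include <opencv2/cudaarithm.hpp>

    int main()
    {
        std::cout << "CUDA devices: " << cv::cuda::getCudaEnabledDeviceCount() << std::endl;

        // Multiplying two 1024x1024 matrices of ones gives a matrix filled with 1024.
        cv::Mat a = cv::Mat::ones(1024, 1024, CV_32FC1);
        cv::Mat b = cv::Mat::ones(1024, 1024, CV_32FC1);
        cv::Mat zeros = cv::Mat::zeros(1024, 1024, CV_32FC1);

        // Upload to the GPU, compute dst = 1.0*a*b + 0.0*zeros, download the result.
        cv::cuda::GpuMat ga(a), gb(b), gz(zeros), gc;
        cv::cuda::gemm(ga, gb, 1.0, gz, 0.0, gc);

        cv::Mat c;
        gc.download(c);
        std::cout << "c(0,0) = " << c.at<float>(0, 0) << std::endl;  // expected: 1024
        return 0;
    }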

Python 3 Example
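The original listing is not preserved; here is a minimal sketch of the equivalent check in Python 3 (it assumes the cv2 module built above is importable):

    import cv2 as cv
    import numpy as np

    print("CUDA devices:", cv.cuda.getCudaEnabledDeviceCount())

    # Multiplying two 1024x1024 matrices of ones should give a matrix of 1024s.
    a = np.ones((1024, 1024), dtype=np.float32)
    b = np.ones((1024, 1024), dtype=np.float32)
    zeros = np.zeros((1024, 1024), dtype=np.float32)

    ga = cv.cuda_GpuMat()
    ga.upload(a)
    gb = cv.cuda_GpuMat()
    gb.upload(b)
    gz = cv.cuda_GpuMat()
    gz.upload(zeros)

    # dst = 1.0*a*b + 0.0*zeros, computed on the GPU.
    gc = cv.cuda.gemm(ga, gb, 1.0, gz, 0.0)
    print(gc.download()[0, 0])  # expected: 1024.0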

Uninstalling

To uninstall OpenCV, you need to run the command below,
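The command is not preserved in this copy; assuming the install step above was run, one option is OpenCV's generated uninstall target (an assumption; you can also simply delete the install directory by hand):

    rem Assumption: run the "uninstall" target generated in the OpenCV build tree
    cmake --build C:\OpenCV\opencv-4.1.1\build --target uninstall --config Release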

and then delete the OPENCV_DIR system variable and remove the OpenCV path from PATH.

Conclusion

In this article we covered installing OpenCV 4 on Windows 10. The procedure was tested on Windows 8.1 and Windows 10, but in theory it should also build on Windows 7. For additional information, refer to the sources listed in the original article.
