Notes on Programming in LLVM 6.0 and CUDA

Disclaimer: This post assumes that you already know the basics of using the LLVM library!

LLVM is great. Yes. But LLVM is full of caveats, especially when it comes to documentation. Most of the docs you can find on the Internet are for LLVM 3.x (3.5, 3.9) and 4.x. LLVM is now at 6.x, and 7.x is on its way to release. Hopefully this post can help you solve some of your problems in learning LLVM and/or migrating between LLVM versions.

LLVM IR and CUDA Related

LLVM IR is an interesting language: lots of projects use it as an intermediate language, yet it is in no way stable. The IR specification changes with every LLVM release, and there is no guarantee that an older LLVM can read IR produced by a newer one.

This problem becomes even worse with the proprietary NVVM compiler, the de facto compiler for CUDA. NVVM is still using the LLVM 3.4 version as its base (as of CUDA 9.2), as you can see here:

//
// Generated by NVIDIA NVVM Compiler
//
// Compiler Build ID: CL-23757830
// Cuda compilation tools, release 9.2, V9.2.64
// Based on LLVM 3.4svn
//

Thus it cannot read IR generated by any LLVM version newer than 4.0. What makes it worse is that there is actually no tool to convert between IR versions. This bit me when I was porting the CUDA functionality of Stanford's Terra language to LLVM 6.0. Every single function call succeeded without a warning, yet the program did not work at all! And when I checked the output of NVVM, it was not empty. No. It was comment-only! I had been completely fooled by the non-empty string size.
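A cheap guard against this trap is to check that the compiler output has at least one line that is not a comment, instead of only checking that the string is non-empty. Below is a minimal sketch in plain C++. The function name hasRealPtx is my own invention, and it assumes the output uses only // line comments, like the NVVM header shown above (it does not handle /* */ block comments).

```cpp
#include <sstream>
#include <string>

// Hypothetical helper, not part of LLVM or CUDA.
// Returns true if `ptx` has at least one line that is neither blank nor a
// `//` line comment. Assumption: the compiler header uses only `//` comments.
bool hasRealPtx(const std::string &ptx) {
    std::istringstream in(ptx);
    std::string line;
    while (std::getline(in, line)) {
        // Skip leading whitespace (and a stray '\r' from CRLF line endings).
        const auto first = line.find_first_not_of(" \t\r");
        if (first == std::string::npos) continue;         // blank line
        if (line.compare(first, 2, "//") == 0) continue;  // comment-only line
        return true;                                      // found real content
    }
    return false;
}
```

Calling hasRealPtx on the string the compiler hands back, and treating a false result as a compilation failure, would have turned my silent breakage into a loud one.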

To work around this incompatibility, we have to resort to the open-source counterpart of NVVM, the NVPTX backend of LLVM. It is not tuned by the folks at NVIDIA, so performance may be lower. It is relatively easy to use, with one page of documentation that has plenty of information. You just need to link libdevice yourself, or you will get errors as soon as you do anything with a math function.

LLVM API Related

The LLVM API, just like the IR, is also not stable between releases.

Linker API

Almost all the information you can find on the Internet is like this:

llvm::Linker linker("clang_test", "clang_test", lc, llvm::Linker::Verbose);

std::string error;
if( linker.LinkInModule( new_module, &error ) || !error.empty() ) {
    printf( "link error\n" );
    return -3;
}

llvm::Module* composite_module = linker.getModule();
if( composite_module == NULL ) {
    printf( "link error\n" );
    return -3;
}

This is not correct for LLVM >3.4. In fact, LLVM has since standardized its APIs to be Module-centric, i.e., everything operates on modules and these strings are gone. This means a little more boilerplate code, but it gives the user a much better grasp of what is going on under the hood.

See for yourself:

auto libdevice = "/usr/local/lib/libdevice.bc"; // The library you want to link in
llvm::SmallString<2048> ErrMsg;
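auto MB = llvm::MemoryBuffer::getFile(libdevice); // Get the contents of the lib file
if (auto EC = MB.getError()) { // getFile returns an ErrorOr, so check it before use
    llvm::errs() << "[CUDA Error] " << EC.message() << "\n";
    return;
}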
auto E_LDEVICE = llvm::parseBitcodeFile(MB->get()->getMemBufferRef(), M->getContext()); // Detroit: Become Module

if (auto Err = E_LDEVICE.takeError()) { // New error handling
    llvm::logAllUnhandledErrors(std::move(Err), llvm::errs(), "[CUDA Error] ");
    return;
}

auto &LDEVICE = *E_LDEVICE;

auto TargetMachine = Target->createTargetMachine("nvptx64-nvidia-cuda", cpuopt, Features, opt, RM);

LDEVICE->setTargetTriple("nvptx64-nvidia-cuda");
LDEVICE->setDataLayout(TargetMachine->createDataLayout());

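llvm::Linker Linker(*M); // M points to the module you want to link libdevice into
if (Linker.linkInModule(std::move(LDEVICE))) { // linkInModule returns true on error
    llvm::errs() << "[CUDA Error] failed to link libdevice\n";
    return;
}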

As you can see above, you now need to first read in and parse the library to get a unique_ptr to the llvm::Module you want to link in, then create an llvm::Linker on the destination module M, and finally link the library in by calling Linker.linkInModule. Also note how LLVM now uses smart pointers to manage memory: previously there were APIs like linker.releaseModule, and now they are completely history.
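One consequence of the unique_ptr-based API is easy to miss: once you std::move the module into the linker, your own pointer is empty and must not be touched again. The sketch below shows just that ownership hand-off in plain C++. FakeModule and FakeLinker are hypothetical stand-ins I made up for illustration, not the LLVM classes; only the shape of the call (a unique_ptr taken by value, true returned on error) mirrors linkInModule.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for llvm::Module and llvm::Linker; NOT the LLVM API.
struct FakeModule { std::string name; };

struct FakeLinker {
    std::string linked; // records what was linked in, for illustration only

    // Taking the unique_ptr by value means the callee owns the module now.
    // Returns true on error, mirroring the convention of linkInModule.
    bool linkInModule(std::unique_ptr<FakeModule> src) {
        if (!src) return true;   // nothing to link
        linked += src->name;
        return false;            // src is destroyed here; nothing to release
    }
};

// Links a fake library and reports whether the caller's pointer was emptied.
bool callerPointerIsEmptyAfterLink() {
    std::unique_ptr<FakeModule> lib(new FakeModule{"libdevice"});
    FakeLinker linker;
    bool failed = linker.linkInModule(std::move(lib));
    // After the move, `lib` holds nullptr: ownership went to the linker.
    return !failed && lib == nullptr;
}
```

This is why there is no releaseModule any more: the destination module stays with you the whole time, and the source module's lifetime ends inside the call.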

To be continued :)