Porting your C++ code to the web with Emscripten

Nenad Mikša, Head of TSI @ Microblink

@dodo at cpplang.slack.com
meetup@microblink.com

Emscripten

From emscripten.org:

a toolchain for compiling to asm.js and WebAssembly, built using LLVM, that lets you run C and C++ on the web at near-native speed without plugins.

Emscripten

  • compiler based on LLVM (clang)
  • SDK providing common libraries
    • file system IO, OpenGL, SDL, threading, networking, ...

Emscripten compiler

  • frontend
    • a python script
  • backend
    • a C++ compiler
    • two available backends:
      • fastcomp
      • upstream

Fastcomp backend

  • compilation steps:
    • C++ ➞ Bitcode
  • linking steps:
    • Bitcode ➞ Javascript ➞ WebAssembly

Fastcomp build steps

  • C++ ➞ Bitcode
    • based on clang 6 (partial C++17 support)
  • Bitcode ➞ Javascript
    • custom LLVM backend (fastcomp)
    • not available in mainline LLVM
  • Javascript ➞ WebAssembly

Fastcomp backend - advantages and disadvantages

  • slow linking phase
    • effectively LTO always enabled
  • suboptimal performance and code size
    • due to Javascript in compilation step
  • stable
  • default backend until very recently (December 2019)

Upstream backend

  • compilation steps:
    • C++ ➞ WebAssembly
  • linking steps:
    • concatenation of WebAssembly object files
    • dead code removal
    • optionally wasm2js

Upstream build steps

  • C++ ➞ WebAssembly
    • based on upstream LLVM project (currently clang 10)
    • all latest C++ features
  • WebAssembly ➞ Javascript
    • optional - only to support old browsers

Upstream backend - advantages and disadvantages

  • fast linking
  • supports the latest C++ features
  • direct C++ ➞ WebAssembly compilation
  • part of mainline LLVM
  • default backend as of Emscripten v1.39.0
  • still too unstable for production use
    • common compiler crashes (ICEs)
  • LTO completely broken

WebAssembly

From webassembly.org:

a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust, enabling deployment on the web for client and server applications.

Differences to Javascript

WebAssembly                           Javascript
binary                                textual
low-level (difficult to decompile)    high-level (difficult to obfuscate)
AOT compilation                       JIT
custom memory                         garbage-collected heap

Current limitations of WebAssembly

  • no support for threads
  • no support for SIMD
  • no access to system resources (files, network sockets, ...)
  • behaves as an embedded system
  • needs initialization from Javascript

Future of WebAssembly

Emscripten SDK

Emscripten SDK (continued)

POSIX threads support

  • simply compile and link your code with -s USE_PTHREADS=1 and use the usual pthread_* set of functions or std::thread
  • requires browser support for Web Workers and SharedArrayBuffer
  • if building without thread support, pthread_* functions are still available, but calling them will crash your code
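
A minimal sketch of what "use std::thread as usual" can look like. The helper name parallelSum and the build line are illustrative assumptions, not part of the original slides. In a browser, joining on the main thread may additionally need a pre-created worker pool (-s PTHREAD_POOL_SIZE=N).

```cpp
// Possible build line (Emscripten):
//   em++ -O2 -s USE_PTHREADS=1 -s PTHREAD_POOL_SIZE=4 sum.cpp -o sum.js
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums `values` by splitting the range across `workerCount` threads.
// parallelSum is a hypothetical helper used only for illustration.
long long parallelSum( std::vector< int > const & values, std::size_t workerCount )
{
    if ( workerCount == 0 ) workerCount = 1;
    std::vector< long long >   partial( workerCount, 0 );
    std::vector< std::thread > workers;
    workers.reserve( workerCount );

    // Ceiling division so that every element lands in exactly one chunk.
    std::size_t const chunk = ( values.size() + workerCount - 1 ) / workerCount;
    for ( std::size_t i = 0; i < workerCount; ++i )
    {
        std::size_t const begin = std::min( i * chunk, values.size() );
        std::size_t const end   = std::min( begin + chunk, values.size() );
        // Each worker writes only to its own slot, so no locking is needed.
        workers.emplace_back( [ &values, &partial, i, begin, end ] {
            auto const first = values.begin() + static_cast< std::ptrdiff_t >( begin );
            auto const last  = values.begin() + static_cast< std::ptrdiff_t >( end );
            partial[ i ] = std::accumulate( first, last, 0LL );
        } );
    }
    for ( auto & worker : workers ) worker.join();
    return std::accumulate( partial.begin(), partial.end(), 0LL );
}
```

The same source builds unchanged with a native compiler, which makes it easy to test threading logic before porting.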

File system support

  • implemented with standard fopen and friends
  • works with std::fstream and other abstractions over fopen and friends
  • all required files need to be packaged at link time using --preload-file linker option
    • --preload-file real/path@/virtual/path
    • all packaged files are packed into a single .data file
    • the generated JS code downloads the .data file and initializes the virtual file system during runtime initialization
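
A short sketch of reading a preloaded file through std::ifstream. The file name greeting.txt, the helper readWholeFile and the build line are assumptions made for illustration.

```cpp
// Possible build line (Emscripten), assuming assets/greeting.txt exists locally:
//   em++ read.cpp -o read.html --preload-file assets/greeting.txt@/greeting.txt
#include <fstream>
#include <iterator>
#include <string>

// Reads the whole file at `path` into `contents`.
// Returns false if the file cannot be opened, e.g. because it was not packaged.
// readWholeFile is a hypothetical helper, not an Emscripten API.
bool readWholeFile( std::string const & path, std::string & contents )
{
    std::ifstream file( path, std::ios::binary );
    if ( !file ) return false;
    contents.assign( std::istreambuf_iterator< char >( file ),
                     std::istreambuf_iterator< char >() );
    return true;
}
```

With the build line above, the C++ code would open the virtual path /greeting.txt, not the real path under assets/.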

File writing support

  • multiple backends for standard functions:
    • MEMFS
      • in-memory, all writes lost after page reload
    • NODEFS
      • wraps NodeJS filesystem functions
    • IDBFS
      • uses browser's IndexedDB for storing data
    • WORKERFS
      • read-only access to File and Blob objects inside a web worker

File system backends

  • by default, MEMFS is used (and is always mounted at /)
  • Javascript code can initialize other backends for other mount points
    • this is currently not possible from C++ 😞
  • Emscripten also has its own Asynchronous File System API
    • API to fetch files from the web asynchronously
    • a separate API, not a backend for fopen and friends
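
Since other backends have to be mounted from Javascript, one way to keep that step next to the C++ code is to embed the JS in an EM_ASM block. This is a sketch under assumptions: mountPersistentStorage and the mount point are hypothetical, and recent Emscripten releases also expect -lidbfs.js at link time.

```cpp
// Possible build line (Emscripten): em++ persist.cpp -o persist.html -lidbfs.js
#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#endif

// Mounts IDBFS at `mountPoint` and starts loading previously persisted data.
// Returns false when not built with Emscripten, where there is nothing to mount.
// mountPersistentStorage is a hypothetical helper used only for illustration.
bool mountPersistentStorage( char const * mountPoint )
{
#ifdef __EMSCRIPTEN__
    EM_ASM( {
        var path = UTF8ToString( $0 );
        FS.mkdir( path );
        FS.mount( IDBFS, {}, path );
        // true = populate the in-memory view from IndexedDB (asynchronous)
        FS.syncfs( true, function( err ) {
            if ( err ) console.error( 'IDBFS sync failed', err );
        } );
    }, mountPoint );
    return true;
#else
    (void) mountPoint;
    return false;
#endif
}
```

Files under the mount point are readable only after the syncfs callback has fired, and writes reach IndexedDB only after another FS.syncfs( false, ... ) call.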

EGL and OpenGL ES

  • implemented with JS glue code forwarding GL calls to WebGL
  • EGL used for creating WebGL context
    • identical procedure as in Android NDK
  • WebGL is a subset of OpenGL ES 2.0/3.0
    • if only compatible features are used, mapping is 1:1
    • Emscripten also supports emulating OpenGL ES features not available in WebGL
      • however, this comes with some performance cost
  • both WebGL 1 and WebGL 2 are supported

Mixing C++ and Javascript code

#include <emscripten.h>

EM_JS(void, call_alert, (), {
  alert('hello world!');
  throw 'all done';
});

int main() {
  call_alert();
  return 0;
}

Inline mixing of code

#include <emscripten.h>

int main() {
  EM_ASM(
    alert('hello world!');
    throw 'all done';
  );
  return 0;
}

Passing parameters from C++ to JS

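#include <emscripten.h>
#include <utility>

// Returns the contained value, or throws a JS Error carrying the error message.
template< typename Outcome >
auto succeedOrDie( Outcome && outcome ) {
    if ( outcome.has_value() ) {
        return std::forward< Outcome >( outcome ).value();
    } else {
        // $0 is the first argument after the code block: a pointer into the
        // Emscripten heap, expected to hold a null-terminated UTF-8 string
        EM_ASM( {
            throw new Error( UTF8ToString( $0 ) );
        }, outcome.error().data() );
        __builtin_unreachable(); // the JS throw never returns to C++
    }
}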
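
Values can also travel back from JS to C++. A minimal sketch using EM_ASM_INT, which returns the integer result of the JS snippet. The function name addInJs and the native fallback branch are illustrative assumptions.

```cpp
#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#endif

// Adds two integers on the JS side and returns the result to C++.
// addInJs is a hypothetical helper; the #else branch only keeps the
// sketch compilable with a native compiler.
int addInJs( int a, int b )
{
#ifdef __EMSCRIPTEN__
    // $0 and $1 refer to the arguments listed after the code block
    return EM_ASM_INT( { return $0 + $1; }, a, b );
#else
    return a + b;
#endif
}
```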

Exporting C++ functions to Javascript

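#include <math.h>
#include <emscripten.h>

// extern "C" prevents C++ name mangling, so the function is exported as _int_sqrt
extern "C" EMSCRIPTEN_KEEPALIVE
int int_sqrt(int x) {
  return static_cast<int>(sqrt(x));
}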

To call from Javascript, use cwrap or ccall:

var int_sqrt = Module.cwrap('int_sqrt', 'number', ['number']);
var result = int_sqrt(12);

Exporting C++ functions to Javascript (2)

Calling with ccall:

// Call C from JavaScript
var result = Module.ccall('int_sqrt', // name of C function
  'number', // return type
  ['number'], // argument types
  [28]); // arguments

Direct call (works only for primitive types):

var result = Module._int_sqrt(28);

Exporting C++ classes and value objects

  • very difficult and pointless to do that manually
  • two possible solutions:
    • Embind
    • WebIDL Binder

Embind

  • requires C++14
  • entire JS interface defined in C++ code
  • bindings are specified in EMSCRIPTEN_BINDINGS blocks

Embind class binding example

class EmscriptenFaceDetector
{
public:
    EmscriptenFaceDetector();
    EmscriptenFaceDetection detectFaces( emscripten::val const & jsImageData );

private:
    FaceDetector detector_;
    cv::Mat rgbaImage_;
};

Embind bindings code

using namespace emscripten;
EMSCRIPTEN_BINDINGS( FaceDetector )
{
    class_< EmscriptenFaceDetector >( "FaceDetector" )
        .constructor()
        .function( "detectFaces", &EmscriptenFaceDetector::detectFaces );
}

JS class usage

var faceDetector = new Module.FaceDetector();
var detectionResult = faceDetector.detectFaces( image );

// free the memory
faceDetector.delete();

  • NOTE
    • JS does not support finalizers
    • the JS garbage collector will clean up only the JS proxy object
    • delete must be called to prevent leaks on the Emscripten heap

Embind value object example

struct EmscriptenFaceDetection
{
    bool detected = false;
    int  x        = 0;
    int  y        = 0;
    int  width    = 0;
    int  height   = 0;
};

Embind bindings code

using namespace emscripten;
EMSCRIPTEN_BINDINGS( FaceDetector )
{
    value_object< EmscriptenFaceDetection >( "FaceDetection" )
        .field( "detected", &EmscriptenFaceDetection::detected )
        .field( "x"       , &EmscriptenFaceDetection::x        )
        .field( "y"       , &EmscriptenFaceDetection::y        )
        .field( "width"   , &EmscriptenFaceDetection::width    )
        .field( "height"  , &EmscriptenFaceDetection::height   );
}

JS object usage

  • FaceDetection is a normal JS object on the JS garbage-collected heap
  • developer does not need to manually call delete
  • however, each transition of the JS ⬌ WebAssembly boundary creates a new copy

Handling JS objects in C++

  • emscripten::val type
    • a proxy type to any JS object
    • provides utilities for accessing global JS properties
    • can be used to transliterate JS code into C++
  • for example, obtaining width and height of the JS ImageData object:

#include <emscripten/val.h>

void obtainImage( emscripten::val const & jsImageData )
{
    // operator[] looks up a JS property; as< T >() converts it to a C++ type
    auto width  = jsImageData[ "width"  ].as< unsigned long >();
    auto height = jsImageData[ "height" ].as< unsigned long >();
}
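
Once width and height are known on the C++ side, they can drive ordinary C++ logic. A sketch, assuming the usual ImageData layout of 4 bytes (R, G, B, A) per pixel. rgbaByteCount is a hypothetical helper, and the emscripten::val overload is compiled only under Emscripten.

```cpp
#include <cstddef>
#ifdef __EMSCRIPTEN__
#include <emscripten/val.h>
#endif

// Number of bytes needed for an RGBA image of the given size.
// rgbaByteCount is a hypothetical helper used only for illustration.
std::size_t rgbaByteCount( std::size_t width, std::size_t height )
{
    return width * height * 4; // 4 bytes per pixel: R, G, B, A
}

#ifdef __EMSCRIPTEN__
// Same computation, reading the dimensions from a JS ImageData object.
std::size_t rgbaByteCount( emscripten::val const & jsImageData )
{
    auto const width  = jsImageData[ "width"  ].as< unsigned long >();
    auto const height = jsImageData[ "height" ].as< unsigned long >();
    return rgbaByteCount( width, height );
}
#endif
```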

Generating modularized Javascript API

  • by default, all C++ functions and objects are exported as members of the global Module object
    • this may cause conflicts with other Emscripten-generated JS libraries
  • when linking with -s MODULARIZE=1 -s EXPORT_NAME=ModuleName, Emscripten generates a JS function named ModuleName that initializes the WebAssembly module

Step-by-step example

Thank you