Creating Low-Level Bindings¶
Before you can write mid-level bindings to your library, you first need to create low-level bindings via cffi
. Normally this would involve preprocessing your headers via gcc -e
, followed by additional manual cleaning. Then you’d have to distribute this header with your software, with possible copyright implications.
Ideally, you wouldn’t have to distribute a header at all since the user has the library and its headers installed already. NiceLib’s goal is to make this possible.
If all goes well, you need to do only one thing: write a build module. For a library named foo
, name your module file _build_foo.py
, and put it in the same directory that your wrapper will be in. This build file contains info about where to find the shared lib and its headers on different platforms. The build module for a Windows-only lib might look like this:
# _build_foo.py
from nicelib import build_lib
header_info = {
'win*': {
'path': (
r"{PROGRAMFILES}\Vendor\Product",
r"{PROGRAMFILES(X86)}\Vendor\Product",
),
'header': 'foo.h'
},
}
lib_names = {'win*': 'foo'}
def build():
build_lib(header_info, lib_names, '_foolib', __file__)
You then call load_lib('foo', __package__)
in your wrapper file to load the LibInfo` object. This uses the ``_foolib
submodule if it exists. If it doesn’t exist yet, load_lib()
tries to build it by calling the build()
function in _build_foo
. This calls build_lib()
, which searches for foo.dll
in the system path and looks for foo.h
in both of the vendor-specific directories given above. If it finds them successfully, it then processes the header so that cffi
can understand it.
The two main challenges in writing a build file are locating the headers/libs and ensuring that the headers are processed succesfully.
Locating Headers and Libraries¶
Both the header_info
and lib_name
arguments to build_lib
can be a dict that maps from a platform to the corresponding path or name, allowing cross-platform support. The platform specifiers ('win*'
in the example above) are checked against sys.platform
to find which platform-specific paths and filenames to try, using pattern globbing if given. You may also discriminate between 32- and 64-bit systems by appending a colon and then the bitness, e.g. 'win*:32'
. If you want the platforms to be checked in a specific order, for example if you want to specify a default fallthrough option, you can use an ordereddict
instead of a dict
.
For lib_name
, each platform-specific value is a string or tuple of strings. Each string is the name of a library, without any prefix like lib
or or suffix like .so
or .dll
. This is the form used by ctypes.util.find_library
. If you use a tuple of names, NiceLib will look for each library in turn, using the first one it finds. This is useful if the library could have any of a few different names.
For header_info
, the each platform-specific value is a dict with the following keys:
- ‘header’
A string or tuple of strings which are the names/paths of all the headers to include.
- ‘path’
Optional. A tuple of base directories where the headers may be found. Each header is searched for in each directory until it is found. The directories are specified as strings.
Path strings can contain environment variables in the form '{VAR_NAME}'
as shown in the example above. If a variable is not contained in os.environ
, the whole string is left unsubstituted.
Paths can be relative or absolute. Relative paths are relative to the directory given via the filedir
parameter.
Processing Headers¶
Processing headers can be one of the trickier aspects of using NiceLib, especially if you’re new to it. But don’t be discouraged, there are tools designed to help you out.
If build_lib()
doesn’t succeed on your first try, you’ll have to do a bit of sleuthing. First of all, if it looks like NiceLib is processing (or failing to find) a bunch of headers that you don’t need, you can tell build_lib()
to ignore them. If a header you’re processing includes <windows.h>
, for instance, header processing will run for quite a long time and likely fail. Usually you don’t actually need <windows.h>
, however. Check out the ignored_headers
and ignore_system_headers
parameters of build_lib()
for two ways to ignore headers.
When there’s an error while parsing part of a header, NiceLib should spit out the “chunk” of code that caused a parse error. Usually this is due to some compiler-specific syntax that cffi
does not understand and you simply want to remove, like a __declspec
.
To remove the problematic syntax, you should use NiceLib’s parser hooks, which are used to transform a stream of C tokens (token hooks) or a C abstract-syntax-tree (AST hooks). These hooks get passed into build_lib()
via its token_hooks
and ast_hooks
arguments. There are a few hooks which are enabled by default since they’re required so often.
NiceLib already has built-in hooks for many common cases, so be sure to check out Token Hooks and AST Hooks before writing your own. Take a look at what hooks are available, it will give you a sense of what types of fixes are usually required.
If you do have an unusual case and need to write your own hook, be sure to check out the Token Hook Helpers and AST Hook Helpers, which can be used to simplify the process.
Behind the Scenes¶
build_lib()
does a few things when it’s executed. First, it looks for the header(s) in the locations you’ve specified and invokes process_headers()
, which preprocesses the headers and returns two strings: the cleaned header C code and the extracted macros, converted to Python code. It uses the cleaned header to generate an out-of-line cffi
module, then appends code for loading the shared lib and implementing the headers’ macros. This finished module can be imported like any other, but is usually loaded via load_lib()
.
How Headers are Processed¶
The bulk of the heavy lifting is done (and most issues are most likely to occur) in process_headers()
. First, the header code is tokenized and parsed by a lexer and parser defined in the process
module. This parser doesn’t understand C, but does understand the language of the C preprocessor. It keeps track of macro definitions, removing them from the token stream and performing expansion of macros when they are used. It also understands and obeys other directives, including conditionals and #include
s. After parsing, the token stream should be free of any harmful directives that pycparser
/cffi
don’t understand.
This token stream can then be acted upon by the so-called “token hooks”, which can be supplied via the arguments lists of process_headers()
and build_lib()
. These hooks are functions which both accept and return a sequence of tokens. The purpose of each hook is to perform a specific transformation on the token stream, usually removing nonstandard syntax that pycparser
/cffi
may not understand (e.g. C++ specific syntax).
Once the hooks are all applied, the tokens are joined together into chunks that are parseable by pycparser
's C parser. After each chunk is parsed, it is acted upon by the “AST hooks”, which take the parsed abstract syntax tree (AST) and a reference to the parser and return a transformed AST. This allows hooks to modify the AST and the state of the parser. Once all of the chunks have been parsed and joined together into one big AST, this tree is used to generate the C source code which is later returned by process_headers()
.