2017-08-10

Chapter 5: Cython and Extension Types

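The previous chapters showed how Cython works with Python, focusing on basic data types and functions. Cython works just as well with Python classes, which is the subject of this chapter.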

Comparing Python classes and extension types

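In Python everything is an object, and every object has three basic elements: an identity (id), a value, and a type. An object's type determines its behavior, which is controlled through special methods. Python lets us create new types, namely classes, with the class keyword. In this chapter we will see how Cython gives low-level C code access to an object's data and methods, and what benefits that brings.
We can create extension types directly in C with the Python/C API, which can be significantly faster, but it requires familiarity with the Python/C API and is hard to write. This is where Cython comes in: creating and using extension types in Cython is as easy as writing pure Python. Cython uses a cdef class block, which has much in common with a pure Python class.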

Extension types in Cython

As an example, consider the following class:

class Particle(object):
    """Simple Particle type."""
    def __init__(self, m, p, v):
        self.mass = m
        self.position = p
        self.velocity = v
    def get_momentum(self):
        return self.mass * self.velocity

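This is a pure Python class. We can compile it to C with Cython; the generated code goes through the Python/C API and behaves essentially like the pure Python version. Because the interpreter's bytecode loop is bypassed, performance may improve slightly, but the class gains nothing from static typing.
Converted into a Cython extension type, the class looks like this: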

cdef class Particle:
    """Simple Particle extension type."""
    cdef double mass, position, velocity
    def __init__(self, m, p, v):
        self.mass = m
        self.position = p
        self.velocity = v
    def get_momentum(self):
        return self.mass * self.velocity

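The cdef class statement tells Cython to generate an extension type instead of a Python class. The cdef line inside it declares C-level instance attributes; as in C++ or Java, every attribute must be declared, otherwise the assignments in __init__ raise an AttributeError because the attribute does not exist.
Compiling and using our example: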

In [1]: import pyximport; pyximport.install()
Out[1]: (None, <pyximport.pyximport.PyxImporter at 0x101c64290>)
In [2]: import cython_particle
In [3]: import python_particle
In [6]: py_particle = python_particle.Particle(1.0, 2.0, 3.0)
In [7]: cy_particle = cython_particle.Particle(1.0, 2.0, 3.0)
In [8]: py_particle.get_momentum()
Out[8]: 3.0
In [9]: cy_particle.get_momentum()
Out[9]: 3.0
In [10]: py_particle.mass, py_particle.position, py_particle.velocity
Out[10]: (1.0, 2.0, 3.0)
In [11]: cy_particle.mass, cy_particle.position, cy_particle.velocity
Traceback (most recent call last)
[...]
AttributeError: 'cython_particle.Particle' object has no attribute 'mass'
In [13]: py_particle.charge = 12.0
In [14]: cy_particle.charge = 12.0
Traceback (most recent call last)
[...]
AttributeError: 'cython_particle.Particle' object has no attribute 'charge'

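The example raises two questions. Why can't the instance attributes of the extension type be accessed from Python, and why can't we add new attributes to it? Because the attributes of a cdef class are compiled into a C struct whose layout and size are fixed at compile time, no new attributes can be added, and the declared attributes are by default visible only to C-level code.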
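
Pure Python has no direct equivalent of a cdef class, but __slots__ gives a rough feel for a fixed, declared attribute layout. The sketch below is only an analogy, and the class name SlottedParticle is hypothetical: unlike the Cython version, its attributes remain readable from Python, and it still runs through the interpreter, so it gains no C-level speed.

```python
class SlottedParticle:
    """Hypothetical pure-Python analogy: attributes fixed up front via __slots__."""

    # Only these attribute names can ever exist on an instance; no per-instance __dict__.
    __slots__ = ("mass", "position", "velocity")

    def __init__(self, m, p, v):
        self.mass = m
        self.position = p
        self.velocity = v

    def get_momentum(self):
        return self.mass * self.velocity


p = SlottedParticle(1.0, 2.0, 3.0)
print(p.get_momentum())  # 3.0
try:
    p.charge = 12.0  # not listed in __slots__
except AttributeError as exc:
    print("AttributeError:", exc)
```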

Type attributes and access control

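Attribute access on a pure Python class goes through a lookup in the instance's __dict__, so attributes can be freely read, added, and modified, but at a cost in performance. An extension type defined with cdef class is compiled into a C struct, which makes attribute access much faster. By default, the attributes of an extension type are private to C code. So how do we make them accessible from Python? Cython lets us explicitly declare an attribute readonly (readable from Python) or public (readable and writable from Python).
Some examples: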

# mass, position and velocity are all read-only from Python
cdef class Particle:
    """Simple Particle extension type."""
    cdef readonly double mass, position, velocity
    #...
# mass is readable; position and velocity are not accessible from Python
cdef class Particle:
    """Simple Particle extension type."""
    cdef readonly double mass
    cdef double position, velocity
    # ...
# mass is readable and writable, position is read-only, velocity is not accessible from Python
cdef class Particle:
    """Simple Particle extension type."""
    cdef public double mass
    cdef readonly double position
    cdef double velocity
    # ...
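
As a rough pure-Python analogy of the three access levels (not how Cython implements them), a plain attribute behaves like public, a getter-only property behaves like readonly, and an underscore-prefixed attribute is private by convention only. Python cannot truly hide an attribute the way a plain cdef attribute is hidden; the class name AccessParticle is hypothetical.

```python
class AccessParticle:
    """Hypothetical pure-Python analogy of public / readonly / private attributes."""

    def __init__(self, m, p, v):
        self.mass = m        # like `cdef public`: readable and writable
        self._position = p   # exposed read-only through the property below
        self._velocity = v   # like a plain `cdef` attribute, but private by convention only

    @property
    def position(self):
        # Getter only: assigning to .position raises AttributeError,
        # similar to a `cdef readonly` attribute.
        return self._position


p = AccessParticle(1.0, 2.0, 3.0)
p.mass = 5.0                 # allowed
print(p.mass, p.position)    # 5.0 2.0
try:
    p.position = 7.0         # no setter defined
except AttributeError as exc:
    print("AttributeError:", exc)
```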

C-level initialization and finalization

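Unlike a Python class, an extension type is at the C level essentially a struct. The struct must be allocated, and all of its fields must be in a valid state ready for their initial values, before __init__ is called. Cython therefore adds a special method, __cinit__, whose responsibility is C-level allocation and initialization. For the Particle example above, initializing in __init__ works because all attributes are doubles. In general, however, that is not enough: depending on how the extension type is subclassed or whether it has alternative constructors, __init__ may be called several times during object creation, and there are also ways to bypass __init__ altogether. Cython guarantees that __cinit__ is called exactly once, and that it is called before __init__, __new__, or any alternative constructor. Cython passes any initialization arguments to __cinit__. For example: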

from libc.stdlib cimport malloc, free
cdef class Matrix:
    cdef:
        unsigned int nrows, ncols
        double *_matrix
    def __cinit__(self, nr, nc):
        self.nrows = nr
        self.ncols = nc
        self._matrix = <double*>malloc(nr * nc * sizeof(double))
        if self._matrix == NULL:
            raise MemoryError()
    def __dealloc__(self):
        if self._matrix != NULL:
            free(self._matrix)

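In the example above, if self._matrix were allocated in __init__ instead, and __init__ were never called, any later use of self._matrix would fail; if __init__ were called more than once, the earlier allocation would leak.
For cleanup, Cython provides the __dealloc__ special method. Cython guarantees that __dealloc__ is called exactly once when the object is finalized, which makes it the place to release the resources acquired in __cinit__.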
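
Python's own object creation has a similar split: __new__ runs exactly once when an object is created, while __init__ can be skipped or called again. The following pure-Python sketch shows why allocation belongs in the once-only step. The class Buffer and its counter are hypothetical, and a Python list stands in for the malloc'ed block; Cython's __cinit__ is its own mechanism, but it offers the same kind of guarantee.

```python
class Buffer:
    """Hypothetical analogy: allocate in __new__ (once), configure in __init__."""

    allocations = 0  # counts how many times storage was allocated

    def __new__(cls, n):
        self = super().__new__(cls)
        self.data = [0.0] * n  # stand-in for malloc(n * sizeof(double))
        Buffer.allocations += 1
        return self

    def __init__(self, n):
        self.label = f"buffer of {n}"  # may run zero times or several times


b = Buffer(4)                    # __new__ then __init__
b.__init__(4)                    # calling __init__ again does not allocate again
raw = Buffer.__new__(Buffer, 2)  # __init__ bypassed entirely
print(Buffer.allocations)        # 2
print(len(raw.data), hasattr(raw, "label"))  # 2 False
```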

cdef and cpdef methods

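We can also define cdef and cpdef methods inside a cdef class, whereas defining them in a regular Python class is an error.
A cdef method is like a cdef function: all arguments are passed in as they are, so no mapping from Python types to C types takes place. This also means a cdef method can only be called from Cython code, not from Python.
A cpdef method is like a cpdef function: it can be called both from external Python code and from other Cython code. Its arguments and return value must be automatically convertible to and from Python objects, which restricts the allowed types; pointers, for example, cannot be used.
For example: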

cdef class Particle:
    """Simple Particle extension type."""
    cdef double mass, position, velocity
    # ...
    cpdef double get_momentum(self):
        return self.mass * self.velocity
# The function below can be run from a Python shell, from Python code, or from Cython code
def add_momentums(particles):
    """Returns the sum of the particle momentums."""
    total_mom = 0.0
    for particle in particles:
        total_mom += particle.get_momentum()
    return total_mom

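The function above is ordinary Python code working with the extension type: conversion between Particle's C-level data and Python objects happens automatically. We can also add type information so that Cython generates faster code: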

def add_momentums_typed(list particles):
    """Returns the sum of the particle momentums."""
    cdef:
        double total_mom = 0.0
        Particle particle
    for particle in particles:
        total_mom += particle.get_momentum()
    return total_mom

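In the typed version above, if we left particle without a type declaration, performance could even be worse than the pure Python code.
For comparison, what happens if we add a cdef version of get_momentum() and leave everything else unchanged?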

cdef class Particle:
    """Simple Particle extension type."""
    cdef double mass, position, velocity
    # ...
    cpdef double get_momentum(self):
        return self.mass * self.velocity
    cdef double get_momentum_c(self):
        return self.mass * self.velocity

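Called from typed Cython code such as add_momentums_typed, this cdef version gives the best performance, but get_momentum_c() cannot be called from Python, only from Cython.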

Inheritance and subclassing

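An extension type can inherit from a single base type, which must itself be implemented in C: a built-in type or another extension type. Trying to inherit from a regular Python class, or from several extension types at once, is an error.
For example: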

cdef class CParticle(Particle):
    cdef double momentum
    def __init__(self, m, p, v):
        super(CParticle, self).__init__(m, p, v)
        self.momentum = self.mass * self.velocity
    cpdef double get_momentum(self):
        return self.momentum

Particle can of course also be subclassed in pure Python:

class PyParticle(Particle):
    def __init__(self, m, p, v):
        super(PyParticle, self).__init__(m, p, v)
    def get_momentum(self):
        return super(PyParticle, self).get_momentum()

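However, PyParticle cannot access Particle's private C-level attributes or its cdef methods; it can only override def and cpdef methods. Doing so is also slow, because every call that crosses the Cython/Python boundary carries some overhead.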

Casting and subclasses

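When working with a dynamically typed object, Cython cannot access its C-level data and methods: all attribute access must go through the Python/C API, which is slow. If the object is statically typed, or cast to its extension type, Cython can access its C-level attributes and call its cdef and cpdef methods directly, without the Python/C API. We can assign a dynamically typed object to a statically typed variable, or cast it inline: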

cdef Particle static_p = p
print(static_p.get_momentum())
print(static_p.velocity)
# or use an inline (unchecked) cast
print( (<Particle>p).get_momentum())
print( (<Particle>p).velocity)
# a checked cast is safer: the type is verified first
print( (<Particle?>p).get_momentum())
print( (<Particle?>p).velocity)

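With the checked cast <Particle?>, a TypeError is raised if p is not an instance of Particle or one of its subclasses. The plain cast <Particle> does no checking, so using it on an object of the wrong type can crash the program.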
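
The runtime check behind the checked cast amounts to an isinstance test. A minimal pure-Python sketch of that check follows; checked_cast and the stand-in classes are hypothetical, the error message is made up, and unlike Cython the sketch only performs the check without giving any C-level access. It also rejects None, like any other non-Particle object.

```python
class Particle:
    def __init__(self, m, p, v):
        self.mass, self.position, self.velocity = m, p, v


class CParticle(Particle):
    pass


def checked_cast(obj, cls):
    """Hypothetical helper: return obj unchanged if it is an instance of cls
    (or of a subclass); otherwise raise TypeError, as <cls?>obj does."""
    if not isinstance(obj, cls):
        raise TypeError(f"Cannot convert {type(obj).__name__} to {cls.__name__}")
    return obj


print(checked_cast(CParticle(1, 2, 3), Particle).velocity)  # 3
try:
    checked_cast(object(), Particle)
except TypeError as exc:
    print("TypeError:", exc)
```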

Extension type objects and None

Consider the following simple function:

def dispatch(Particle p):
    print(p.get_momentum())
    print(p.velocity)
# It can be called in several ways
dispatch(Particle(1, 2, 3)) # OK
dispatch(CParticle(1, 2, 3)) # OK
dispatch(PyParticle(1, 2, 3)) # OK
dispatch(object()) # TypeError
dispatch(None) # Segmentation fault!

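Python's None plays roughly the role of NULL in C, but None does not have the C-level interface of Particle. Cython nevertheless accepts None for an argument typed as an extension type, so accessing attributes or methods through it is invalid and, as shown above, can crash the interpreter. dispatch should therefore check whether p is None first: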

def dispatch(Particle p):
    if p is None:
        raise TypeError("...")
    print(p.get_momentum())
    print(p.velocity)
# This check is so common that Cython provides special syntax for it
def dispatch(Particle p not None):
    print(p.get_momentum())
    print(p.velocity)

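Cython also offers the nonecheck compiler directive. By default, for performance, Cython does not check typed objects for None before accessing their attributes or calling their methods; enabling the nonecheck directive turns these None checks on.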

# Add this directive at the top of the source file
# cython: nonecheck=True
# or pass it on the command line when compiling
$ cython --directive nonecheck=True source.pyx

Properties in Cython extension types

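Python's attribute access is flexible and powerful. In a Python class we can wrap getter and setter methods in a property to control access to an attribute, for example: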

class Particle(object):
    # ...
    def _get_momentum(self):
        return self.mass * self.velocity
    momentum = property(_get_momentum)

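Cython achieves the same effect for extension types with a different syntax, the property block (recent Cython versions also accept the ordinary @property decorator inside a cdef class):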

cdef class Particle:
    """Simple Particle extension type."""
    cdef double mass, position, velocity
    # ...
    property momentum:
        """The momentum Particle property."""
        def __get__(self):
            """momentum's getter"""
            return self.mass * self.velocity
        def __set__(self, m):
            """momentum's setter"""
            self.velocity = m / self.mass

Cython properties are accessed exactly like pure Python properties:

In [3]: p = cython_particle.Particle(1, 2, 3)
In [4]: p.momentum
Out[4]: 3.0
In [5]: p.momentum = 4.0
In [6]: p.momentum
Out[6]: 4.0

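Inside a property block we can define the __get__, __set__ and __del__ special methods; if one of them is not defined, the corresponding operation (reading, assigning, or deleting the property) is not supported.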
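
For comparison, here is a pure-Python sketch of the same momentum property, including a deleter that corresponds to __del__ in the Cython property block. The text does not say what deleting momentum should do; bringing the particle to rest is a hypothetical choice made only for illustration.

```python
class Particle:
    """Pure-Python counterpart of the Cython `property momentum` block."""

    def __init__(self, m, p, v):
        self.mass = m
        self.position = p
        self.velocity = v

    @property
    def momentum(self):
        """The momentum Particle property (getter, like __get__)."""
        return self.mass * self.velocity

    @momentum.setter
    def momentum(self, m):
        # Setter, like __set__: keep the mass, adjust the velocity.
        self.velocity = m / self.mass

    @momentum.deleter
    def momentum(self):
        # Deleter, like __del__. Hypothetical behavior for illustration:
        # "deleting" the momentum brings the particle to rest.
        self.velocity = 0.0


p = Particle(1, 2, 3)
print(p.momentum)   # 3
p.momentum = 4.0
print(p.momentum)   # 4.0
del p.momentum
print(p.momentum)   # 0.0
```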

Special methods are even more special

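When a Cython extension type supports operator overloading, we must define the corresponding special methods. We have already covered the __cinit__, __init__ and __dealloc__ special methods and seen how they handle C-level initialization, Python-level initialization, and finalization. In the Cython versions this chapter targets, extension types do not support the __del__ special method; __dealloc__ takes its place.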

Arithmetic methods

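In pure Python, fully overloading the + operator requires defining both the __add__ and __radd__ special methods. In Cython, defining __add__ alone is enough: it covers both cases, and its first argument is not necessarily the E instance, because either operand may be the extension type. This describes the Cython releases this chapter was written for; Cython 3.0 switched the default to Python's behavior, with a separate __radd__, and keeps the old behavior behind the c_api_binop_methods=True directive. A simple example: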

cdef class E:
    """Extension type that supports addition."""
    cdef int data
    def __init__(self, d):
        self.data = d
    def __add__(x, y):
        # Regular __add__ behavior: E + int
        if isinstance(x, E):
            if isinstance(y, int):
                return (<E>x).data + y
        # __radd__ behavior: int + E
        elif isinstance(y, E):
            if isinstance(x, int):
                return (<E>y).data + x
        # Any other combination of operand types is unsupported
        return NotImplemented

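Cython does not automatically convert the arguments passed to __add__ to E, so we must check their types ourselves and cast with <E> only after confirming that an operand really is an E instance, which makes it safe to access its data attribute.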
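
In pure Python, by contrast, the two cases go to two different methods: Python calls __add__ on the left operand and, if that returns NotImplemented or does not exist, __radd__ on the right operand. A pure-Python sketch of E, as an illustration rather than anything Cython generates:

```python
class E:
    """Pure-Python version of E: Python needs both __add__ and __radd__."""

    def __init__(self, d):
        self.data = d

    def __add__(self, other):
        # Called for E + other.
        if isinstance(other, int):
            return self.data + other
        return NotImplemented

    def __radd__(self, other):
        # Called for other + E when other's own __add__ gave up.
        # Integer addition is commutative, so reuse __add__.
        return self.__add__(other)


print(E(1) + 2)   # 3
print(2 + E(1))   # 3
try:
    E(1) + "a"
except TypeError as exc:
    print("TypeError:", exc)
```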

Rich comparison operators

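Cython extension types do not support the individual comparison special methods such as __eq__, __lt__ and __le__ (newer Cython releases have since added support for them). Instead, Cython provides a single special method, __richcmp__(x, y, op), whose third argument selects the comparison to perform. The correspondence is:

op        comparison
Py_LT     x < y
Py_LE     x <= y
Py_EQ     x == y
Py_NE     x != y
Py_GT     x > y
Py_GE     x >= y

These op values are integer constants defined in CPython's object.h, and we can cimport them from cpython.object, as in the example below: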

from cpython.object cimport Py_LT, Py_LE, Py_EQ, Py_GE, Py_GT, Py_NE
cdef class R:
    """Extension type that supports rich comparisons."""
    cdef double data
    def __init__(self, d):
        self.data = d
    def __richcmp__(x, y, int op):
        cdef:
            R r
            double data
        # Make r always refer to the R instance.
        r, y = (x, y) if isinstance(x, R) else (y, x)
        data = r.data
        if op == Py_LT:
            return data < y
        elif op == Py_LE:
            return data <= y
        elif op == Py_EQ:
            return data == y
        elif op == Py_NE:
            return data != y
        elif op == Py_GT:
            return data > y
        elif op == Py_GE:
            return data >= y
        else:
            assert False

The results:

In [1]: import pyximport; pyximport.install()
Out[1]: (None, <pyximport.pyximport.PyxImporter at 0x101c7d290>)
In [2]: from special_methods import R
In [3]: r = R(10)
In [4]: r < 20 and 20 > r
Out[4]: True
In [5]: r > 20 and 20 < r
Out[5]: False
In [6]: 0 <= r <= 100
Out[6]: True
In [7]: r == 10
Out[7]: True
In [8]: r != 10
Out[8]: False
In [9]: r == 20
Out[9]: False
In [10]: 20 == r
Out[10]: False
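
The single-entry-point design can be emulated in pure Python by routing each comparison method through one helper that takes an op code. In this sketch the constant values 0 to 5 are those defined in CPython's object.h, while _richcmp and _OPS are hypothetical names; Python's reflected comparisons (for 20 > r, Python falls back to r.__lt__(20)) take care of swapped operands.

```python
import operator

# Op codes with the values CPython's object.h gives them.
Py_LT, Py_LE, Py_EQ, Py_NE, Py_GT, Py_GE = range(6)

# Hypothetical dispatch table from op code to comparison function.
_OPS = {
    Py_LT: operator.lt,
    Py_LE: operator.le,
    Py_EQ: operator.eq,
    Py_NE: operator.ne,
    Py_GT: operator.gt,
    Py_GE: operator.ge,
}


class R:
    """Pure-Python emulation: every comparison funnels into _richcmp."""

    def __init__(self, d):
        self.data = d

    def _richcmp(self, other, op):
        # self is always the R instance; Python swaps the operator itself
        # when it falls back to the reflected method (20 > r -> r < 20).
        if isinstance(other, R):
            other = other.data
        return _OPS[op](self.data, other)

    def __lt__(self, other):
        return self._richcmp(other, Py_LT)

    def __le__(self, other):
        return self._richcmp(other, Py_LE)

    def __eq__(self, other):
        return self._richcmp(other, Py_EQ)

    def __ne__(self, other):
        return self._richcmp(other, Py_NE)

    def __gt__(self, other):
        return self._richcmp(other, Py_GT)

    def __ge__(self, other):
        return self._richcmp(other, Py_GE)


r = R(10)
print(r < 20 and 20 > r)   # True
print(0 <= r <= 100)       # True
print(20 == r)             # False
```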

Iterator support

To make an extension type iterable, we define __iter__ in it, just as in pure Python; to make it an iterator, we also define the __next__ special method. For example:

cdef class I:
    cdef:
        list data
        int i
    def __init__(self):
        self.data = list(range(100))
        self.i = 0
    def __iter__(self):
        return self
    def __next__(self):
        if self.i >= len(self.data):
            raise StopIteration()
        ret = self.data[self.i]
        self.i += 1
        return ret

The special methods above are only a selection; for more, see the official Cython documentation: http://docs.cython.org/en/latest/

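Title: Chapter 5: Cython and Extension Types

Author: HatBoy

Published: 2017-08-10, 17:08:07

Last updated: 2018-08-19, 14:47:02

Original link: https://hatboy.github.io/2017/08/10/第五章-Cython和扩展类型/

License: Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0). Please keep the original link and author when reposting.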