This blog post describes some details of using CFFI arrays versus static vectors when implementing a wrapper around a foreign library. In our case the library is an LZMA (de)compressor, with the original C sources available here and my wrapper library available here. CL-LZMA supports both basic compression and decompression; for brevity, let us discuss decompression only.
The old version of my code used CFFI to convert between Lisp data and C data by means of FOREIGN-ARRAY-TO-LISP and LISP-ARRAY-TO-FOREIGN. These functions allocate fresh memory on each call and copy the contents over, which is a slow process.
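To make the pattern concrete, here is a sketch of what such a copy-heavy round trip looks like; CALL-WITH-COPIED-BUFFER is a hypothetical helper written for this post, not the actual CL-LZMA source:

```lisp
;; Sketch of the old, copy-heavy approach (assumed shape, not the
;; verbatim CL-LZMA code).  Every call crosses the Lisp/C boundary
;; by allocating fresh memory and copying the whole buffer.
(defun call-with-copied-buffer (lisp-vector c-function)
  (let* ((length (length lisp-vector))
         (foreign (cffi:foreign-alloc :uint8 :count length)))
    (unwind-protect
         (progn
           ;; First copy: Lisp vector -> freshly allocated foreign array.
           (cffi:lisp-array-to-foreign lisp-vector foreign
                                       (list :array :uint8 length))
           (funcall c-function foreign length)
           ;; Second copy: foreign array -> fresh Lisp vector.
           (cffi:foreign-array-to-lisp foreign (list :array :uint8 length)))
      (cffi:foreign-free foreign))))
```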
This code was also doubly wasteful: not only does FOREIGN-ARRAY-TO-LISP copy once by creating a new Lisp array containing the data, but COERCE makes a second copy, this time into an array of the proper element type. Back when this code was written, CFFI's FOREIGN-ARRAY-TO-LISP had no means of specifying the element type of the created array; I have since fixed that with some help from Luís Oliveira.
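To illustrate the difference, here is a small self-contained sketch; the four-byte buffer is made up for the example:

```lisp
(cffi:with-foreign-object (ptr :uint8 4)
  ;; Fill the foreign buffer with the bytes 0, 1, 2, 3.
  (dotimes (i 4)
    (setf (cffi:mem-aref ptr :uint8 i) i))
  ;; The old way: FOREIGN-ARRAY-TO-LISP returns a general vector,
  ;; and COERCE copies it a second time to get the right element type.
  (coerce (cffi:foreign-array-to-lisp ptr '(:array :uint8 4))
          '(vector (unsigned-byte 8)))
  ;; The new way: the element type is passed straight to MAKE-ARRAY,
  ;; so the data is copied only once.
  (cffi:foreign-array-to-lisp ptr '(:array :uint8 4)
                              :element-type '(unsigned-byte 8)))
```

Both expressions return a vector holding the bytes 0, 1, 2, 3; only the second does it with a single copy.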
The downside of a static vector is that it is not garbage collected, so we have to free it manually when we are done with it. The upside is that we can get a C pointer to such a vector that refers to the same memory area used by Lisp. This means that a static vector lets us store a single vector of data that can be freely operated on both by C functions (since we have an immobile pointer to its memory) and by Lisp functions (since we have a reference to the vector itself).
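In practice this is what the static-vectors library provides; here is a minimal sketch of its API (the 16-byte length and the value 42 are arbitrary):

```lisp
(let ((vector (static-vectors:make-static-vector
               16 :element-type '(unsigned-byte 8))))
  (unwind-protect
       (let ((pointer (static-vectors:static-vector-pointer vector)))
         ;; VECTOR is an ordinary Lisp vector and POINTER is the same
         ;; memory seen from C; a write through one side is visible
         ;; through the other.
         (setf (aref vector 0) 42)
         (cffi:mem-ref pointer :uint8 0))   ; => 42
    ;; Not garbage collected, so we must free it ourselves...
    (static-vectors:free-static-vector vector)))

;; ...or let WITH-STATIC-VECTOR handle the cleanup for us:
(static-vectors:with-static-vector (vector 16 :element-type '(unsigned-byte 8))
  (static-vectors:static-vector-pointer vector))
```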
In my case, the LZMA (de)compressor accepts pointers to raw memory. Using static vectors requires your library to be designed around externally supplied buffers; in other words, static vectors are useless if your C code allocates its buffers on its own. In general, it is impossible to create a Lisp vector from a pointer to raw memory without copying the whole vector, so the only way to get a static vector working with C code is to allocate it on the Lisp side and then pass its pointer to whatever C function you are using.
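As a runnable stand-in for a real foreign entry point, here is that calling pattern with libc's memset playing the role of the C function; the :unsigned-long length type assumes size_t fits in it, which holds on common 64-bit Unix platforms:

```lisp
(defun zeroed-buffer (length)
  "Allocate a LENGTH-byte buffer in Lisp, let C fill it, return a copy."
  (static-vectors:with-static-vector
      (buffer length :element-type '(unsigned-byte 8))
    ;; Pass the vector's address straight to a C function; the C side
    ;; writes into memory that Lisp can read directly, with no copying.
    (cffi:foreign-funcall "memset"
                          :pointer (static-vectors:static-vector-pointer buffer)
                          :int 0
                          :unsigned-long length
                          :pointer)
    ;; Copy only because the static vector is freed when the
    ;; WITH-STATIC-VECTOR scope ends.
    (copy-seq buffer)))
```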
As for a small speed test, let us compress and decompress one megabyte of random data on my machine.
(defun random-vector (length)
  (let ((vector (make-array length :element-type '(unsigned-byte 8)))
        (*random-state* (make-random-state t)))
    (loop for i below (length vector)
          do (setf (aref vector i) (random 256)))
    vector))

(defvar *random-data* (random-vector (expt 10 6)))
The new code:

CL-USER> (time (multiple-value-call #'cl-lzma:lzma-decompress
                 (cl-lzma:lzma-compress *random-data*)))
Evaluation took:
  0.323 seconds of real time
  0.323035 seconds of total run time (0.318949 user, 0.004086 system)
  100.00% CPU
  682,239,302 processor cycles
  0 bytes consed

The old code:

CL-USER> (time (multiple-value-call #'cl-lzma:lzma-decompress
                 (cl-lzma:lzma-compress *random-data*)))
Evaluation took:
  1.324 seconds of real time
  1.330467 seconds of total run time (1.294264 user, 0.036203 system)
  [ Run times consist of 0.083 seconds GC time, and 1.248 seconds non-GC time. ]
  100.45% CPU
  2,796,481,840 processor cycles
  1,119,702,784 bytes consed
Not only is the new code four times faster (a third of a second versus four thirds of a second), it also does not cons a single byte.
The SBCL profiler tells me that consing is really high in the old code:
CL-USER> (sb-profile:profile cffi:foreign-array-to-lisp
                             cffi:lisp-array-to-foreign
                             cl-lzma:lzma-compress
                             cl-lzma:lzma-decompress)
CL-USER> (sb-profile:report)
measuring PROFILE overhead..done
  seconds  |   gc   |    consed     | calls | sec/call |  name
-----------------------------------------------------------
    0.638  |  0.070 | 1,020,998,800 |     3 | 0.212666 | CFFI:FOREIGN-ARRAY-TO-LISP
    0.365  |  0.006 |    96,665,568 |     3 | 0.121666 | CFFI:LISP-ARRAY-TO-FOREIGN
    0.207  |  0.000 |     1,013,680 |     1 | 0.206998 | CL-LZMA:LZMA-COMPRESS
    0.063  |  0.000 |     1,000,016 |     1 | 0.062998 | CL-LZMA:LZMA-DECOMPRESS
-----------------------------------------------------------
    1.273  |  0.076 | 1,119,678,064 |     8 |          | Total
This becomes even more evident when we up the ante to one hundred megabytes.
(setf *random-data* (random-vector (expt 10 8)))
(Remember to avoid printing the resulting structure. Printing hundreds of megabytes to your SLIME session is a good way to cause yourself problems. Either avoid printing the results altogether or set the *print-length* special variable.)
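For example, using WRITE-TO-STRING to show the effect locally:

```lisp
;; Print at most sixteen elements of any sequence; the rest becomes "...".
(setf *print-length* 16)

;; The same effect, scoped to a single call:
(write-to-string (make-list 100 :initial-element 0) :length 4)
;; => "(0 0 0 0 ...)"
```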
The new code:

CL-USER> (time (multiple-value-call #'cl-lzma:lzma-decompress
                 (cl-lzma:lzma-compress *random-data*)))
Evaluation took:
  37.950 seconds of real time
  37.883646 seconds of total run time (37.687516 user, 0.196130 system)
  99.83% CPU
  80,150,688,876 processor cycles
  0 bytes consed

The old code:

CL-USER> (time (multiple-value-call #'cl-lzma:lzma-decompress
                 (cl-lzma:lzma-compress *random-data*)))
;; Error: Heap exhausted (no more space for allocation).
;; 757301248 bytes available, 810862240 requested.
;;
;; PROCEED WITH CAUTION.
;; [Condition of type SB-KERNEL::HEAP-EXHAUSTED-ERROR]
;; Backtrace:
;; (...)
;;  9: (SB-KERNEL:%MAKE-ARRAY (101357778) 217 6 :ELEMENT-TYPE NIL :INITIAL-ELEMENT NIL :INITIAL-CONTENTS NIL :ADJUSTABLE NIL :FILL-POINTER NIL :DISPLACED-TO NIL :DISPLACED-INDEX-OFFSET NIL)
;; 10: (CFFI:FOREIGN-ARRAY-TO-LISP #.(SB-SYS:INT-SAP #X7FCE5E1DB010) (:ARRAY :UINT8 101357778))
;; 11: (CL-LZMA::%LZMA-COMPRESS #.(SB-SYS:INT-SAP #X7FCE5E1DB010) #.(SB-SYS:INT-SAP #X7FCE7C002410) #.(SB-SYS:INT-SAP #X7FCE760A1010) 100000000)
;; 12: (CL-LZMA:LZMA-COMPRESS #(153 48 248 229 45 236 ...))
I do not think this last result needs any additional comments. :)