CScheme overview

Giorgio Scorzelli
CADGroup
Università Of Roma Tre, Rome (Italy)

The integration of geometric kernels in a functional language

CScheme is an application developed by CADGroup at "Università of Roma Tre" (Rome, Italy). It is based on the integration of two open-source software projects:

(a) The CAD environment named Open Cascade (OCAS) by Matra Datavision, which is a powerful geometric and graphics environment mainly written in C++.

(b) The MzScheme compiler, which is a first-class platform-independent implementation of the functional language Scheme by the Programming Language Team (PLT) of Rice University.

The need for such an integration arises from the well-known lack of specialized libraries for functional languages, mainly in the area of graphics rendering and interaction. Despite their theoretical elegance, their practical efficiency and their ability to provide high-level tools for rapid application development and deployment, functional languages often suffer from this problem. It is too often necessary to link external libraries written in some imperative language, say C++, and this is always a cumbersome task (heavy, slow and inefficient) for the application programmer.

Hence one of the main goals of the CScheme project was to reduce and possibly to eliminate such an inefficiency, by providing a mostly automatic interface between the C++ libraries of Open Cascade and the PLT Scheme environment.

Open Cascade Overview

Open Cascade is an object-oriented development environment for the implementation of CAD applications, recently released as Open Source by Matra Datavision.

The classes of Open Cascade offer the infrastructure (Rapid Application Framework) for rapid development of geometric computing applications oriented to "design" in some specific areas of interest, such as advanced CAD tools, design databases, simulation systems or graphics rendering of complex assemblies.

Although Open Cascade is a software environment specifically oriented to geometric modelling, it also provides other non-geometric services, which aim to support the application programmer throughout the whole life cycle of the application. The non-geometric services may be classified into three main subsystems:

[CDL] The Component Definition Language is the Open Cascade language for defining the interface of software components. A CDL file establishes the internal structure of C++ programs and data. For example a CDL class describes class constructors, instance methods and state variables.

[OCAF] The Open Cascade Application Framework provides modelling services, aiming to connect user-data, even non-geometric, to parametric geometric models. It also provides further functionalities for storing and editing the history of a work session, and not only its results. This approach is embedded into an automatic mechanism for document and application generation, called application template.

[WOK] The Workshop Organization Kit is the set of tools for the development of CDL-based applications. In particular it contains: a system shell for command invocation; tools for code generation through suitable program extractors; commands for automatic generation of project Makefiles and library compilation.

We are mainly interested in the Component Definition Language, since our goal is to give access to OCAF modelling functionalities from a Scheme environment.

To this end, it may be useful to briefly recall the main aspects of the development of a software component in Open Cascade, which consists of the following steps:

1. Writing of CDL files.

2. Automatic translation of CDL files into dynamic data structures. A specialized parser is in charge of CDL translation, and produces in-memory data structures that can be queried at run time.

3. Automatic generation of header files: an extractor program queries the previously generated data structures in order to generate the header files of the classes.

4. Writing of the implementation of the class methods by the application programmer. This step requires strict use of the prototypes automatically generated by the extractors described above.
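Steps 2 and 3 can be pictured with a minimal sketch, written here in Python for brevity. It is only an illustration under stated assumptions: the names ClassDecl, MethodDecl and extract_header are hypothetical, the class description is built by hand instead of being parsed from a CDL file, and the generated header layout is invented for the example rather than copied from the real Open Cascade extractors.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MethodDecl:
    """One method of a described class (hypothetical in-memory record)."""
    name: str
    params: List[Tuple[str, str]]  # (C++ type, parameter name) pairs
    returns: str = "void"


@dataclass
class ClassDecl:
    """In-memory description of a class, as a parser might produce it."""
    name: str
    methods: List[MethodDecl] = field(default_factory=list)


def extract_header(cls: ClassDecl) -> str:
    """Play the role of an extractor: query the description, emit a C++ header."""
    guard = f"{cls.name}_HeaderFile"
    lines = [f"#ifndef {guard}", f"#define {guard}", "",
             f"class {cls.name} {{", "public:"]
    for m in cls.methods:
        args = ", ".join(f"{ctype} {pname}" for ctype, pname in m.params)
        lines.append(f"  {m.returns} {m.name}({args});")
    lines += ["};", "", "#endif"]
    return "\n".join(lines)


# Stand-in for step 2: the description is written by hand, not parsed.
point = ClassDecl("Point", [
    MethodDecl("SetX", [("double", "x")]),
    MethodDecl("Distance", [("const Point&", "other")], returns="double"),
])

# Step 3: generate the prototypes that the programmer must implement in step 4.
header = extract_header(point)
print(header)
```

The point of the sketch is that the same queryable description serves more than one consumer: here it feeds a header generator, and nothing prevents another program from querying it for a different purpose.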


MzScheme overview

Scheme is a well-known dialect of Lisp. It can be considered a simplified version of Lisp (Scheme experts will probably not agree) that retains all the power of a functional language. Scheme is a relatively small language that provides the constructs at the core of Lisp, adding lexical scope and true first-class functions.

If you want to get a precise view of Scheme you should read the "Revised Report on the Algorithmic Language Scheme", a document that fully describes the syntax and semantics of the Scheme programming language in very few pages.

MzScheme is a very stable first-class platform-independent public-domain Scheme implementation from Rice University. It is available for many platforms (such as Windows, Linux and Mac) and it offers some very useful (non-standard) additional services, such as XML parsing and an object system, which I am using in my project.

The problem of integration

I now turn to the main problem of integrating two or more programming languages. It is probably not necessary to explain the differences between imperative (or procedural) languages and functional (or applicative) languages, but I give a brief summary.
Imperative languages use concepts and primitives closely tied to the hardware of the computer: for example, a variable can be considered a sort of abstraction of a memory cell, and an assignment a transfer of some data value. Every program, simple or complex, is an ordered collection of instructions to execute sequentially, and the progress of the computation is "represented" by changes in the state of memory. Functional languages, on the contrary, define concepts at a higher level: a program consists of the evaluation of some function which returns a value, and there is no need to modify the value of a variable at all. The kernel of a functional language is composed of the function application mechanism, recursion and the conditional construct as tools for execution control, and the list as primitive data type.
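The contrast can be seen in a tiny example, written in Python only because it allows both styles side by side; the two function names are made up for the illustration. The first version updates a memory cell step by step, the second uses nothing but function application, recursion and a conditional over a list.

```python
def total_imperative(values):
    """Imperative style: a memory cell is updated instruction by instruction."""
    total = 0
    for v in values:
        total = total + v  # each assignment changes the state of 'total'
    return total


def total_functional(values):
    """Functional style: no variable is ever modified after it is bound."""
    if not values:                          # conditional construct
        return 0
    head, *tail = values                    # a list seen as first element and rest
    return head + total_functional(tail)    # function application and recursion


print(total_imperative([1, 2, 3, 4]))
print(total_functional([1, 2, 3, 4]))
```

Both calls print the same sum; only the way the result is reached differs.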

It is widely known that functional languages offer many very simple services which facilitate application development and which can give the application itself a very good internal structure in terms of modularization and readability of the source code. At the same time, it is equally well known that functional languages are not used (or even taken into consideration) to develop commercial or large applications, where programmers usually prefer imperative languages and imperative tools (say C++, Java, Delphi and so on).

The main reasons for this situation are summarized in Philip Wadler's article "Why no one uses functional languages", published in "SIGPLAN Notices" (I quote only the arguments that are most important from the CScheme point of view):

[Compatibility] Computing has matured to the point where systems are often assembled from components rather than built from scratch. Many of these components are written in C or C++, so a foreign function interface to C is essential, and interfaces to other languages can be useful. The isolationist nature of functional languages is beginning to give way to a spirit of open interchange. Serious implementations now routinely provide interfaces to C, and sometimes other languages.

[Libraries] The fashionable idea of software reuse has been around for ages in the form of software libraries. A good library can make or break a language. Considerable effort has been expended on developing graphic user interface libraries for functional languages.

[Portability] There are numerous projects where C won out over a functional language, not because C runs faster (although often it does), but because the hegemony of C guarantees that it is widely portable.

[Availability] Even when a functional language has been ported to the machine and operating system at hand, it may not be easy to use. [...] An additional problem arises because functional languages are often under active development, creating tension between the needs of stability and research.

[Tools] To be usable, a language system must be accompanied by a debugger and a profiler. Just as with interlanguage working, designing such tools is straightforward for strict languages, but trickier for lazy languages.

My CScheme application focuses on the "compatibility" and "libraries" problems by implementing some new software components (grouped under the name of Foreign Function Interface) which alleviate the "isolationist nature of functional languages" through the automatic integration of C++ libraries under a Scheme shell.

I am confident that the other problems ("portability", "availability", "tools") are solved by the PLT MzScheme implementation, which offers a very complete Integrated Development Environment (IDE) in which the whole development process is as productive as with any imperative-based tool.

The foreign function interface (FFI)

To fully understand the operations that an FFI should perform, it is necessary to explain some basic notions about C++ compilation. The life cycle of C++ software is the following:

[1] Writing of the source code: header files that describe the software interface (focusing only on the "exposed" services) and the actual implementation files.

[2] Object file generation by a C++ compiler.

[3] Executable (or library) generation by a C++ linker through object file composition.

The most important work is performed by the compiler in step 2: it is the compiler which arranges the allocation and deallocation of memory regions; it is the compiler which checks the correctness of assignments, converting actual parameters where necessary (through four kinds of conversion, tried in order: exact match, promotion, standard conversion and user-defined conversion); finally, it is the compiler which resolves overloaded calls by finding the method whose formal arguments best match the actual arguments.
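The ranking of conversions can be sketched as follows. This is a simplified model written in Python, not the real C++ rules: types are plain strings, the three conversion tables are hand-written assumptions that list only a few cases, and Angle is a hypothetical class with a constructor taking a double. The selection rule follows the spirit of C++: a candidate wins only if it is at least as good as every other candidate on each argument and different from it.

```python
EXACT, PROMOTION, STANDARD, USER = 0, 1, 2, 3

# Simplified, hand-written conversion tables (an assumption, not the full C++ rules).
PROMOTIONS = {("char", "int"), ("short", "int"), ("bool", "int"), ("float", "double")}
STANDARD_CONVERSIONS = {("int", "double"), ("double", "int"), ("int", "long"),
                        ("short", "double"), ("float", "int")}
USER_CONVERSIONS = {("double", "Angle")}  # hypothetical Angle(double) constructor


def conversion_rank(actual, formal):
    """Return the rank of converting 'actual' to 'formal', or None if impossible."""
    if actual == formal:
        return EXACT
    if (actual, formal) in PROMOTIONS:
        return PROMOTION
    if (actual, formal) in STANDARD_CONVERSIONS:
        return STANDARD
    if (actual, formal) in USER_CONVERSIONS:
        return USER
    return None


def better(ranks_a, ranks_b):
    """True if ranks_a is no worse on every argument and not identical."""
    return all(x <= y for x, y in zip(ranks_a, ranks_b)) and ranks_a != ranks_b


def resolve(overloads, actual_types):
    """Pick the overload whose formal types best match the actual types."""
    viable = []
    for formals in overloads:
        if len(formals) != len(actual_types):
            continue
        ranks = [conversion_rank(a, f) for a, f in zip(actual_types, formals)]
        if None not in ranks:
            viable.append((ranks, formals))
    for i, (ranks, formals) in enumerate(viable):
        if all(better(ranks, other)
               for j, (other, _) in enumerate(viable) if j != i):
            return formals
    raise TypeError("no matching overload, or the call is ambiguous")


overloads = [("int",), ("double",), ("Angle",)]
print(resolve(overloads, ["short"]))   # promotion to int beats conversion to double
print(resolve(overloads, ["float"]))   # promotion to double beats conversion to int
```

With two candidates such as ("int", "double") and ("double", "int") and two int arguments, neither candidate is better than the other, so the sketch raises an error, mirroring the ambiguity a C++ compiler would report.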

It is important to note that all these operations are possible thanks to the knowledge of the C++ data structures obtained by parsing the header files. All this knowledge is irremediably lost once the linking process (step 3) is complete. In fact, it is not possible to introspect the internal composition of C++ data types at execution time (as we can do, for example, in the Java language).

In my CScheme software, all C++ methods are invoked and executed inside the functional interpreter by applying them to "dynamic" arguments that are "constructed" by the functional user. Any user can make a call to a C++ foreign function and, for this type of call, no static object file is generated.

The interpreter (through the FFI) invokes the functions: it loads the arguments onto a sort of simulated stack; it communicates the returned value to the functional shell; it verifies that actual and formal parameters match; and finally it resolves overloaded calls. It can be considered a sort of "dynamic compiler".
Whereas the "standard" compiler obtains its knowledge about data types from header files, my dynamic compiler obtains the same knowledge by parsing the Open Cascade CDL files: the real difference is that the latter uses it at execution time.
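The idea of such a "dynamic compiler" can be sketched in a few lines. The sketch is in Python and is not the CScheme implementation: the INTERFACE table is a hypothetical stand-in for the knowledge obtained from parsed interface descriptions, the implementations are ordinary Python functions instead of compiled C++ methods, and parameters are matched on exact run-time type only.

```python
import math

# Hypothetical interface table: method name -> list of (formal types, callable).
# It stands in for what parsing an interface description would provide.
INTERFACE = {
    "distance": [
        ((float, float), lambda x, y: math.hypot(x, y)),
        ((float, float, float),
         lambda x, y, z: math.sqrt(x * x + y * y + z * z)),
    ],
    "describe": [
        ((int,), lambda n: f"integer {n}"),
        ((str,), lambda s: f"string {s!r}"),
    ],
}


def invoke(name, *actuals):
    """Resolve and call a foreign function at run time."""
    stack = []
    for value in actuals:       # load the arguments onto a simulated stack
        stack.append(value)

    for formals, implementation in INTERFACE.get(name, []):
        # verify that actual and formal parameters match (exact types only)
        same_arity = len(formals) == len(stack)
        if same_arity and all(type(a) is f for a, f in zip(stack, formals)):
            return implementation(*stack)   # hand the result back to the shell

    type_names = ", ".join(type(a).__name__ for a in stack)
    raise TypeError(f"no overload of {name!r} accepts ({type_names})")


print(invoke("distance", 3.0, 4.0))
print(invoke("describe", 7))
print(invoke("describe", "seven"))
```

The choice between the two "describe" overloads is made when the call is evaluated, from the run-time types of the arguments, which is exactly the work that a static compiler would have done once and for all from the header files.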

This kind of dynamic service is not limited to the Open Cascade C++ libraries; the approach can be generalized (and CScheme actually does so) to any C++ library that has a CDL description.

In the following HTML pages I will focus on each of the most important components of the CScheme architecture and I will detail some implementation aspects.