Object Oriented

Submitted by Peter on Mon, 2005-10-10 00:00

Drupal's design is considered Object Oriented by some Drupal developers, while most programmers interpret the lack of classes as a sign that Drupal is not Object Oriented. Is Drupal's current design good and could it be improved with Object Oriented code?

It's weird how minor technical issues make people decide whether PHP's Object Oriented support is "true OO" or not. Drupal users interpret Drupal's coding conventions as OO or FO (Function Oriented) according to their biases about OO, their biases about PHP, and random observations of Web site performance. One Drupal user's description of OO in Drupal is at drupaldocs.org/api/head/file/contributions/docs/developer/topics/oop.html and a discussion of that description is at drupal.org/node/19964.

OO is useful and updating Drupal to OO will be a "great leap forward" one day. A random performance failure can make people jump back to old coding habits, which means any change to Drupal has to be thoroughly tested, with good documentation of what works.

I wrote OO style code in PHP 3 using techniques similar to what Drupal uses now. When PHP 4 introduced classes, my code changed to using classes and the code structure changed very little because the code was already organised the way you would organise code for OO. I can see why Drupal could survive using the current emulation of classes based on function name prefixes and other coding techniques.

Drupal 6

Update 2008: Drupal 6 is based on PHP 5 and uses some classes. Nodes are passed as objects, as are users and some other data.

Drupal 7

Update 2009: Drupal 7 uses more classes and is aimed at PHP 6.

Performance

The performance of a Web site is based on the operating system, file system, database design, database software, and a dozen hardware items. Over the years people have told me that PHP 4 is slower than PHP 3 or that OO code is slower than FO code. When I investigate I find that in most cases the OO system is built completely differently from the FO system. There were real problems when trying to compare Apple computers with Orange brand computers and there are equal problems comparing Web sites when there are changes as small as the change from PHP 4.1.0 to PHP 4.2.0.

PHP based Web sites are helped by code compilation cache software, which is commonly called accelerator software. One brand of accelerator software is named PHP Accelerator and Zend sell an optimiser. When the accelerators began supporting PHP 4 classes, I switched to using classes. On most systems the performance of OO code was exactly the same as the equivalent FO code. The equality of the performance was helped by the fact that the code was designed the same way. When other people change from FO to OO they often make the change during a complete rewrite of a system, which means the code cannot be compared for performance.

Sun Solaris systems produced awful performance results for my OO code but those same Solaris systems also had real problems with regular FO code because of poor file level caching. Those file level caching problems destroyed the performance of Web sites built by including lots of files, whether those files contained functions or classes.

A 12000 dollar Sun Sparc server running Solaris delivered Web pages five times slower than Linux or NT on a 2000 dollar AMD based server. The Solaris system was constantly tuned by a full time Solaris professional. The Linux system was a default Linux install, not tuned. The NT system was the slower workstation version, not the faster server version, and was not tuned. The Web server software was the old Apache version 1.3, which does not use any of the performance features of NT. Nobody could explain the speed difference.

The Solaris expert said the problem was caused by PHP's OO code until we saw the dreadful results with FO code. The Linux person said the results proved Linux is superior but of course we were testing Linux on only one processor, at a time when Linux had problems with multiple processors. The Windows bigots said it was the superiority of Windows but the tests used NT, not Windows. The PHP code accelerators reduced code compilation overhead by a massive amount that was equal across operating systems. The speed difference was proportional to the number of files opened, which might suggest a problem with the file system, but at that time both Solaris and Linux used the same file system.

I moved my Web sites off Solaris and never had a problem again. Both NT and Linux produced the same performance for systems with equivalent code structure. Performance of class oriented PHP with one class per file appears to be the same as function oriented code where functions are grouped into files containing small sets of functions. People who can run lots of parallel tests with similar systems usually report good results with OO systems if PHP, Apache, and the accelerator software are up to date. The people who get one shot with a new OO Web site on an old system are the ones who most often experience a shock and never get to find the real cause of the performance problem.

DrOOpal performance should be exactly the same as the current Drupal. The DrOOpal guidelines could minimise code fragmentation and help align the code divisions with the way code is used. The guidelines would work equally for FO and OO code. I developed one improvement for one of my modules by having the database reads and common code in a FO file that was loaded for all pages. The database updates and administration code were placed in a separate FO file that is loaded only in the administration section of a Drupal based site. The common FO code was later changed to OO while the FO administration module was left as FO.

The performance of complex Drupal sites could be improved if larger modules were split so that module xxxx becomes module xxxx plus module xxxx_admin. The bulk of the code in module xxxx_admin would be loaded only occasionally. This change of design can be applied equally to FO and OO code.
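The split can be sketched as follows. This is a minimal illustration written in Python rather than PHP so it can be run stand-alone, and every name in it (load_module, handle_request, the two source strings) is hypothetical, not Drupal API. It shows only the shape of the idea: the common half loads for every page and the admin half loads on demand.

```python
# Hypothetical sketch: load the admin half of a module only on admin pages.
loaded = []  # records which code units were loaded, in order

# Stand-ins for the two source files; a real site would read files on disk.
MODULE_SOURCES = {
    "xxxx": "def view(node_id):\n    return 'node %d' % node_id\n",
    "xxxx_admin": "def settings_form():\n    return 'settings form'\n",
}

_namespaces = {}


def load_module(name):
    """Compile and run a module's source once, then reuse the result."""
    if name not in _namespaces:
        namespace = {}
        exec(MODULE_SOURCES[name], namespace)
        _namespaces[name] = namespace
        loaded.append(name)
    return _namespaces[name]


def handle_request(path):
    """Always load the common code; load the admin code only under admin/."""
    common = load_module("xxxx")
    if path.startswith("admin/"):
        admin = load_module("xxxx_admin")
        return admin["settings_form"]()
    return common["view"](1)
```

An ordinary page request never pays for the admin code, and the same arrangement works whether the loaded units contain functions or classes.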

Web page delivery will slow down if all code is included for every page even when not used. That is true of Function Oriented design as well as OO. The "all includes will be at the top of the code" approach is enforced equally in FO and OO systems. I delivered one OO system with conditional inclusion of code. The recipient demanded that all includes be unconditional and performed at the start of the code. Performance went downhill faster than Harry Egger.

After I moved on to another project the recipient started replacing classes with raw functions and each change broke the world record for the slowest code. I had a similar experience at another site where politics overrode good development practices. Conditional code inclusion and results caching have benefits that have nothing to do with the decision to use FO or OO.

Classy Code

In my 2001 Code Odyssey workshops I taught people to use functions first and classes second. By 2003 I was teaching classes first because PHP 4 was working as well with classes as with functions, or better. The organisational thought behind classes made the code safer, which reduced development time and redundant code, which in turn reduced compilation time per page. With a very small tweak to some classes, the classes worked the same in PHP 4 and PHP 5. I cannot find a design reason to not use classes in Drupal alongside the current stuff.

When teaching the use of functions back in the 1980s (I was the kid they hired last week to train the kids they hired this week) I taught separation of code and data plus a whole lot of other things proven to make systems reliable, available, and serviceable. The first attempt at writing OO code usually mixes data with code. The next attempt at OO code results in experiments with a mixture of data oriented objects and code oriented objects. It is funny how OO code is supposed to cure the problems of earlier code but then starts to copy the best design practices of the older code because those practices reduce the maintenance headaches you get with OO code.

Good design practices are worth more than OO code but OO code makes some bad design practices more obvious. Drupal has some good design practices around code but data access is less strict. Good design practices include the idea of someone owning a database table and having all maintenance through common code, either FO or OO code. That type of structure would be good for Drupal and is more obvious in OO code.

The Drupal database is held up as an example of good design achieving much of what OO achieves. The Drupal database is modular but the lack of transactions creates update errors. If the data was accessed through OO code then the data could be owned by objects and the objects could apply integrity rules including transactions. While the data exists outside of objects the data is open to uncontrolled updates that preclude transactions. Adding transactions to Drupal will be a nightmare and a conversion from FO to OO will make the addition of transactions easier.
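A minimal sketch of data owned by an object that applies an integrity rule inside a transaction. It is written in Python with the standard sqlite3 module so it runs stand-alone; the TermStore class, the term table, and the rename_pair method are hypothetical names for illustration and are not part of Drupal.

```python
import sqlite3


class TermStore:
    """Hypothetical owner of the term table: all writes go through here."""

    def __init__(self, connection):
        self.db = connection
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS term "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def add(self, name):
        # The connection used as a context manager commits on success
        # and rolls back if an exception escapes the block.
        with self.db:
            self.db.execute("INSERT INTO term (name) VALUES (?)", (name,))

    def rename_pair(self, first, second):
        """Rename two terms as one unit: both updates apply or neither."""
        with self.db:
            for term_id, name in (first, second):
                if not name:  # integrity rule enforced by the owner
                    raise ValueError("term name must not be empty")
                self.db.execute(
                    "UPDATE term SET name = ? WHERE id = ?", (name, term_id)
                )

    def names(self):
        rows = self.db.execute("SELECT name FROM term ORDER BY id")
        return [row[0] for row in rows]
```

If the second rename is rejected, the first one is rolled back with it. Code that updates the table directly with its own SQL gets neither the rule nor the rollback, which is the point about data living outside objects.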

PHP 4 Compatibility

One of my Drupal sites uses OO code. The OO part fits in Drupal without conflict. The code is currently limited to PHP 5 because the code uses one PHP extension that is only in PHP 5. The equivalent PHP 4 extension underwent a major change at about PHP 4.2.0 which means that people using that functionality have to write two different code sets just for PHP 4. When will Drupal set PHP 4.2.0 or 4.3.0 as a minimum? Will MySQL 4.0 or 4.1 become a minimum with the subsequent requirement for mysqli instead of mysql? The decision to set minimum standards for software means that you end up with design considerations to help you cope with multiple versions of the supported code before you decide on FO or OO. OO is just a simpler way to handle the multiple code sets.

Modular, Not OO

Drupal modules and themes are held up as part of the OO philosophy in Drupal. Modules fit that description but themes do not. Drupal lets you use a number of theme engines with each theme engine implementing a different way of applying themes. Some of the theme engines are as object oriented as the litter at the bottom of a rat's nest.

You could build an OO theme engine and add an XSLT based theme using the OO XSL feature of PHP 5 and still not get an OO result. XSL lets you write XSLT that uses an OO approach and lets you write XSLT that is FO or even 1843 style spaghetti code. (1843 was when Lady Ada Lovelace invented computer programming.)

GTK adds OO to the C language by expanding functions with a control protocol that controls both the functions and the data. OO is possible without classes. The Drupal modular approach combined with a structured naming system for functions does add an OO flavour to Drupal's FO approach but the lack of controls on data prevents Drupal from being OO.

Abstraction

Drupal has a system of function calls named hooks and some might say that the hooks provide a level of abstraction. There is code to test if a hook exists but that is no different to making your code test if a method exists in an object. OO arrives when you access all data through methods so that invalid data does not pass from one part of the code to another. Drupal is not OO because it has too many examples of data accessed by bypassing Drupal's functions. To make the current Drupal OO you would have to replace most of the SQL with calls to functions that provide data.
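The similarity between the two existence tests can be sketched side by side. This is an illustration in Python, not Drupal's actual hook code; invoke_all, invoke_all_objects, and the blog, forum, and poll names are hypothetical.

```python
# Hypothetical sketch: prefix-named hook functions next to object methods.

def blog_help():
    return "help for blog"


def forum_help():
    return "help for forum"


def invoke_all(hook, modules, namespace=None):
    """Call <module>_<hook>() for every module that defines it."""
    if namespace is None:
        namespace = globals()
    results = []
    for module in modules:
        function = namespace.get(module + "_" + hook)
        if callable(function):  # the "does this hook exist" test
            results.append(function())
    return results


class Blog:
    def help(self):
        return "help for blog"


class Poll:
    pass  # implements no hooks


def invoke_all_objects(hook, objects):
    """Call obj.<hook>() for every object that has such a method."""
    results = []
    for obj in objects:
        method = getattr(obj, hook, None)
        if callable(method):  # the "does this method exist" test
            results.append(method())
    return results
```

Both loops do the same job. Neither one controls how the data behind the hook is reached, which is where the OO question is actually decided.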

Use Drupal's taxonomy module as an example. If the taxonomy module owns the taxonomy tables then all access to the taxonomy tables should be through functions provided by the taxonomy module. The person maintaining the taxonomy module would have to provide functions to satisfy all access to the taxonomy tables. Other code would have to replace existing SQL queries with taxonomy functions. One of my modules joins a taxonomy table to one of my tables. Joining two tables across two Drupal modules is not defined anywhere I can see. In OO code the equivalent is equally hard to build but it is easier to spot the problem.

Encapsulation

Encapsulation is a way of dividing up a system into chunks of code and data. Drupal has a way to divide up code into chunks and a way to divide up data into matching chunks. The lack of data access functions in the code chunks leads to people crossing the data borders with generic SQL queries. OO makes the problems more observable. To make the division complete would require a complete ban on the use of database queries in themes and lots of work to make the modules provide the functions to provide the data. When data is required from multiple modules, the modules would have to talk to each other instead of the current system of joining tables at the theme generation level.
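A minimal sketch of two modules talking instead of joining tables. It is written in Python for illustration; the Taxonomy and Articles classes and their methods are hypothetical and stand in for modules that each own their data.

```python
class Taxonomy:
    """Hypothetical owner of term data; nothing else reads its storage."""

    def __init__(self, terms):
        self._terms = dict(terms)  # term id -> term name

    def names_for(self, term_ids):
        """Return names for the requested ids, skipping unknown ids."""
        return {tid: self._terms[tid] for tid in term_ids if tid in self._terms}


class Articles:
    """Hypothetical module whose rows refer to taxonomy term ids."""

    def __init__(self, rows):
        self._rows = list(rows)  # (article title, term id)

    def with_terms(self, taxonomy):
        """Combine own rows with term names by asking the owner."""
        wanted = {term_id for _, term_id in self._rows}
        names = taxonomy.names_for(wanted)
        return [(title, names.get(term_id)) for title, term_id in self._rows]
```

The cost is one extra call between modules in place of a join. The gain is that the taxonomy owner can change its tables without breaking every query that used to reach into them.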

My Pet system is working toward encapsulation and eventually transactions. The Pet system provides data to themes, not SQL access. PHP does not provide public read only access to object attributes so I provide all access through methods. The code becomes a little longer when used in a theme but is far shorter and safer than a raw SQL query.

Polymorphism

Polymorphism means your code can do one thing when fed fish and something different when fed grass. PHP code can convert any type of input to any other type but the results are often unpredictable, incorrect, or both. Polymorphism requires stronger controls on the type of data we use so that code can then react based on both value and type.
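The fish and grass example can be sketched with type based dispatch. This uses Python's standard functools.singledispatch purely as an illustration; the Fish and Grass classes and the feed function are hypothetical.

```python
from functools import singledispatch


class Fish:
    pass


class Grass:
    pass


@singledispatch
def feed(food):
    """Fallback: refuse input of an unknown type instead of guessing."""
    raise TypeError("cannot feed on " + type(food).__name__)


@feed.register(Fish)
def _(food):
    return "grill it"


@feed.register(Grass)
def _(food):
    return "graze on it"
```

Passing the string "fish" raises an error instead of being quietly converted, which is the stronger control on type that the behaviour depends on.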

Drupal functions can react to data in ways that OO code might react. Few Drupal functions inspect data for accuracy or type; most depend on PHP's automatic conversion of data type. The system is not sufficiently accurate to avoid the errors constantly reported in Drupal forums. If the current Drupal functions simply became methods in objects, you would still not achieve polymorphism.

PHP has an error reporting system that lets you suppress important notices with a setting of ~E_NOTICE. Drupal 4.6.0 is finally stamping out the use of ~E_NOTICE and fixing the errors created by the dangerous assumptions that go with the use of ~E_NOTICE. That is one step toward polymorphism. The next step is to implement a controlled use of boolean values so that the automatic and inaccurate conversion of non boolean values is replaced with a clearly defined use. Functions should test for true and false before risking a conversion so that people can switch to passing boolean values. The minimum standard for PHP 4 should be at least the release that delivered true boolean values and tests.
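A sketch of testing for true and false before risking a conversion. It is written in Python, where the identity tests `is True` and `is False` play the role of PHP's strict `=== true` and `=== false` comparisons. The is_enabled function and its list of accepted legacy spellings are assumptions for illustration only.

```python
def is_enabled(value):
    """Accept real booleans first, then fall back to a documented conversion."""
    # Strict tests first, so callers passing real booleans never hit
    # the conversion rules below.
    if value is True:
        return True
    if value is False:
        return False
    # Legacy path: a clearly defined conversion for older callers.
    if isinstance(value, str):
        return value.strip().lower() in ("1", "true", "yes", "on")
    if isinstance(value, int):
        return value != 0
    raise TypeError("cannot interpret %r as a boolean" % (value,))
```

Callers can move to passing booleans one at a time, and anything the conversion rules do not cover fails loudly instead of being guessed.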

You then need to test input data for objects so that data is requested from methods when the input is an object. If you are expecting a string and receive an object then request the string from the object through a standard defined method. People can then gradually switch to passing objects around in Drupal without fear of incorrect processing by Drupal functions.
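A minimal sketch of that check, in Python for illustration. The method name to_text is a hypothetical "standard defined method"; the Title class and the heading function are likewise invented for the example.

```python
def as_text(value):
    """Return a string, asking objects for it through one agreed method."""
    if isinstance(value, str):
        return value
    to_text = getattr(value, "to_text", None)
    if callable(to_text):
        return to_text()
    raise TypeError("expected a string or an object with to_text()")


class Title:
    """Hypothetical object that older code used to receive as a string."""

    def __init__(self, text):
        self._text = text

    def to_text(self):
        return self._text.strip()


def heading(title):
    """Existing function that now accepts either a string or an object."""
    return as_text(title).upper()
```

Old callers that pass strings keep working, and new callers can pass objects without the function misreading them.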

Design Patterns

Design Patterns is a programming fashion that infects recent OO talk. The earliest OO design patterns I can find were implemented in the 1960s using the assembly languages that preceded C and Cobol. Subsequent work depended on the functionality of the underlying files or database. Some of the current design pattern usage is an attempt to replace the work of a good relational database with code that produces the same result with far less optimisation.

I prefer to design code using design principles that were proved when the major goal was reliability and service. I associate the current design pattern fashion with incredibly slow Java based applications built by college students after reading an obtuse textbook. After those students have a few years of programming experience they are happy to remove a layer of data abstraction and make use of the best features a database can offer.

Good OO design lets you use and remove data abstraction layers according to the features offered by your data store. Drupal should use transactions but does not. If Drupal's database abstraction layer was more like ADOdb then Drupal could instantly work with PostgreSQL and Oracle, a feature far more useful than implementing a fashion from a textbook.

We should be thinking of Drupal in design layers instead of design patterns and working out how to make the rest of Drupal work with the database layer so that the database layer can change the database access code for a new database. The change would remove SQL from themes and would require all the layers in between to increase the variety of data they deliver.
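The layering can be sketched with a theme function that asks a data layer for what it needs and contains no SQL. This is an illustration in Python using the standard sqlite3 module; the MemoryNodes and SqliteNodes classes, the node table, and render_heading are hypothetical names, not Drupal's.

```python
import sqlite3


class MemoryNodes:
    """Data layer backed by a plain dictionary."""

    def __init__(self, titles):
        self._titles = dict(titles)  # node id -> title

    def title(self, node_id):
        return self._titles[node_id]


class SqliteNodes:
    """Data layer backed by a database; the only place SQL appears."""

    def __init__(self, connection):
        self._db = connection

    def title(self, node_id):
        row = self._db.execute(
            "SELECT title FROM node WHERE id = ?", (node_id,)
        ).fetchone()
        if row is None:
            raise KeyError(node_id)
        return row[0]


def render_heading(nodes, node_id):
    """Theme layer: asks the data layer for the title, no SQL here."""
    return "<h1>" + nodes.title(node_id) + "</h1>"
```

Swapping the data store means writing another class with the same title method. The theme code does not change.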