How to make PHP based applications scale forever

Submitted by Peter on Mon, 2012-06-11 21:03

PHP based applications scale out to huge sizes. When there is a problem, it is rarely the fault of PHP; it is usually a design or configuration limit. People with software to sell are quick to kick PHP because they cannot make money out of PHP. Dollar for dollar, you get far more from PHP than from any other language, and PHP has more options for scaling upward than many of its competitors.

There are a lot of articles written about the scalability of applications, and the best do the right thing: they focus on everything working together. Some articles on PHP scalability are written by people who like PHP but focus on their first major problem as if it is the bottleneck everyone crashes into. Then there are the articles written by people pushing Java who try to discredit PHP because PHP is not exactly the same as Java.

It's not Assembler

The fastest possible programming language is Assembler because it is processor specific. Using different processors created a problem many years ago when there were several different processors in use. Then Sparc died out. Apple dumped whatever they were using, which was whatever they inflicted on their customers after dumping whatever they were using before that, which was... Today Apple use exactly the same Intel processors as Dell and all the other cheap brands. Assembler could make them all sing.

C is the second fastest programming language now that everyone uses the same processor and Intel has optimised a C compiler for the Intel processor. You rarely see a difference between C and Assembler except in some places deep within an operating system. Modern C compilers let you insert Assembler in C where it makes a difference.

The truth is the processor in most computers is far faster than the disks. Web sites and most other applications are slowed down by memory, disk, or the network, not the processor. PHP, with one standard configuration setting, is almost as fast as Assembler. PHP uses more processing than Assembler and C but not enough to slow down the typical application.

It's not Java

Let's kill the "It's not Java" talk. Both Java and PHP started as interpretive languages. PHP has almost always had an in memory compiler to make PHP fast. Java refused to use anything as modern as the PHP approach until Microsoft released a Java clone many times faster than Java. Finally Java caved in and moved into this century.

Actually not this century. In memory compilers were first used in the 1970s to make mainframe interpretive languages faster. PHP adopted the approach as a natural progression. Java was held back by the ugly politics and fashion addiction behind Java.

Java eventually chose to use an old-fashioned static compiler to get some slight speed advantage in some areas, but that limits development speed and makes it really hard to teach Java, leading to Java developers costing twice as much as PHP developers and taking more than twice as long to develop anything. The lack of native data types in Java makes it slow for processing some common types of data. Some of the design fashions in Java make Java even slower.

Given all the problems with Java, you can understand why Web site owners switch to PHP and why developers of other types of applications flock to Python and any other alternative to Java.

Compiler cache

The in memory compilation of PHP is standard. Caching the compiled files for reuse is almost standard, but some people choose not to use a cache and some Web site hosting companies choose not to make a cache available. If your Web site host does not set up a PHP cache, change hosting companies because your current one is seriously deranged.

You can choose between APC and a few others. There is very little difference between the caches. Use the one supported by your hosting company.

APC is the standard the others are compared to. APC was planned for inclusion as standard in PHP 6, but PHP 6 was never released, and PHP 5.5 and later bundle the OPcache extension instead. eAccelerator is offered with cPanel and is almost the same. They all provide a big jump in performance. Some provide a small number of extra options that may be of use in some Web sites.
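
A quick way to find out whether your host has set up a cache is to ask PHP which cache extensions are loaded. The following is a minimal sketch, not a definitive check. The function name loadedOpcodeCaches() is made up for this example, and the list of extension names is an assumption you should compare against the output of php -m on your own server.

```php
<?php
// Report which opcode cache extensions are loaded in this PHP build.
// The names below are assumptions about how each cache registers itself;
// compare them with the output of `php -m` on your own server.
function loadedOpcodeCaches(): array
{
    $candidates = ['Zend OPcache', 'apc', 'eaccelerator', 'xcache'];
    return array_values(array_filter($candidates, 'extension_loaded'));
}

$caches = loadedOpcodeCaches();
echo $caches
    ? 'Opcode cache available: ' . implode(', ', $caches) . PHP_EOL
    : 'No opcode cache loaded; ask your host to enable one.' . PHP_EOL;
```

Run the script through the Web server, not just the command line, because hosts often configure the two differently.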

The environment

Some articles complain about the PHP runtime environment. That is a Java view of the world. Java requires a runtime environment that is really limited and painfully incompatible. PHP is flexible in the way you run it and has far better compatibility.

One article says the problem with PHP is the default environment but PHP does not have a default environment. You can use PHP in many operating systems, there are several installation options for each operating system, plus more across all the Linux distributions, and, for Web sites, each hosting company creates a different environment.

SQL or NoSQL?

NoSQL is one of many alternatives to SQL based databases. The non SQL data storage approaches use less of some resources and more of other resources. The end result depends entirely on your data, the way you use the data, and the cost of each resource.

Disks used to be small and expensive. Databases developed many techniques for squeezing more data into a disk. Today disks are huge and cheap. You store data in different ways when disks are huge and cheap. Some databases offer data storage choices at the column level, the table level, and the database level. You can do almost anything you want in the database, then resort to changing things in code if you think you can do something better than the database.

By comparison, the NoSQL style approach usually offers limited storage options and you are forced to do everything in code. You then have to build a big team to go through all the data and code, something you could also do for an SQL based system. Tuning an SQL based system is often far more effective.

Some NoSQL style systems are so painful to use that people are developing SQL interfaces to patch over their NoSQL systems. The truth is the best implementations of NoSQL style systems are ones where just a small number of large database tables are devolved into NoSQL. The bulk of the database tables are left in SQL databases for flexibility. A few highly specialised tables are converted to something else and gain their speed by limiting the range of access. In a Web site with 500 database tables, the developers might convert two or three to NoSQL.

Document oriented systems

Document oriented storage systems were around before SQL. Their two limitations were the difficulty of organising the data and the difficulty of accessing the data for the most common accesses. Placing an SQL database in front of the documents solved the most common problems without blocking any of the other approaches. People were working on document and structured data oriented systems in the 1960s. SQL exploded in the 1970s because it solved so many problems.

The Web presents the same problems as the document oriented experiments of the 1960s. An SQL front end offers the same solutions today as it did in the 1970s. MongoDB is one of the most popular NoSQL approaches and MongoDB is quietly adding SQL features to solve the same problems SQL solved in the 1970s.

PHP works exactly the same with documents, document oriented systems, and structured data as it works with SQL. Development of the PHP code is quick and easy because everything is built in. If you have a problem, it is usually a lack of understanding of the data, and PHP offers some of the best and easiest ways to discover what is in the data. PHP scripts do not have a slow horrible compilation phase, or an even more confusing make speed hump to overcome, giving you instant access and understanding.
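
As one illustration of discovering what is in the data, the sketch below counts how often each field appears across a handful of JSON documents. The function name fieldFrequency() and the sample documents are made up for this example; real documents would come from whatever store you use.

```php
<?php
// Count how often each top-level field appears across a set of JSON documents.
// fieldFrequency() and the sample data are hypothetical, for illustration only.
function fieldFrequency(array $jsonDocuments): array
{
    $counts = [];
    foreach ($jsonDocuments as $json) {
        $doc = json_decode($json, true);
        if (!is_array($doc)) {
            continue; // skip anything that does not decode to an object or array
        }
        foreach (array_keys($doc) as $field) {
            $counts[$field] = ($counts[$field] ?? 0) + 1;
        }
    }
    arsort($counts); // most common fields first
    return $counts;
}

$documents = [
    '{"name": "Ann", "email": "ann@example.com"}',
    '{"name": "Bob", "phone": "555-0100"}',
    'not json',
];
print_r(fieldFrequency($documents));
```

A few lines like these, run straight from the command line, tell you which fields are common and which are rare before you design anything around them.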

How many users?

Popular Web sites are based on content management systems, or equivalents, built on frameworks, both public and bespoke. The choice is flexible or fast. The fastest mode is the delivery of fixed content to anonymous users through cached content. PHP can do that and so can a standard proxy server with almost zero processing overhead.
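
Delivering fixed content from a cache can be sketched in a few lines of PHP. This is a minimal file based example under stated assumptions, not how any particular content management system does it. The function name cachedPage(), the cache directory, and the 300 second lifetime are all choices made for this illustration.

```php
<?php
// Serve a cached copy of a page while it is fresh, otherwise render and store it.
// cachedPage(), the cache directory, and the lifetime are illustrative choices.
function cachedPage(string $key, int $ttlSeconds, callable $render, string $cacheDir): string
{
    $file = $cacheDir . '/' . sha1($key) . '.html';
    if (is_file($file) && time() - filemtime($file) < $ttlSeconds) {
        return file_get_contents($file); // cache hit: no rendering, no database
    }
    $html = $render();
    file_put_contents($file, $html, LOCK_EX);
    return $html;
}

// Count how many times the expensive rendering step actually runs.
$renders = 0;
$render = function () use (&$renders): string {
    $renders++;
    return '<h1>Front page</h1>';
};

$dir = sys_get_temp_dir();
$key = 'front-page-' . uniqid();
echo cachedPage($key, 300, $render, $dir), PHP_EOL; // rendered and stored
echo cachedPage($key, 300, $render, $dir), PHP_EOL; // served from the cache
echo 'Rendered ', $renders, " time(s)\n";
```

The second request never reaches the rendering code, which is where the database work would normally happen.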

When you deliver content to people who are not logged in and, as a result, use almost zero database activity, almost everything works and most options work fast.

The real problem is when people log in to gain current information, extra information, premium content, points for their contributions, and stars against their name in the popularity ratings contests. All that information has to be stored, updated, and accessed fresh for every page. It does not matter which software you use, performance will be a problem. PHP gives you the most choices and is the easiest to change, which are the main two reasons PHP is the first choice of Web site owners when they commission their second Web site.

One article says PHP is unacceptable because PHP can handle millions of users accessing a Web site but not more. All systems have problems with millions of users if the millions are actually doing anything. All systems handling millions of users require extra consideration and work. It is not a PHP problem.

The database bottleneck

The database is usually the bottleneck when users do things requiring complex reads or simple updates. The NoSQL approach forces you to make reads simple and to throw away safeguards for updates. Whatever approach you use, the tradeoffs are simplicity of code, speed, and reliability of updates.

When you handle money, reliability of updates is the first priority and you need immediate writes, something unrelated to PHP because it is one or two layers of software away from PHP. What PHP gives you is the freedom to choose any one of the many data storage systems, SQL and NoSQL, plus use several of them in parallel for the optimum result.
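
The reliability of a money update comes from the database transaction, and PHP only asks for it. The following is a minimal sketch using PDO with an in memory SQLite database. The accounts table, the transfer() function, and the choice of SQLite are assumptions made for this illustration; any database with transactions works the same way from the PHP side.

```php
<?php
// Move money between two accounts: both updates commit or neither does.
// The accounts table, transfer(), and SQLite are illustrative assumptions.
function transfer(PDO $db, int $from, int $to, int $cents): bool
{
    $db->beginTransaction();
    try {
        // Debit only if the account exists and holds enough money.
        $debit = $db->prepare(
            'UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?'
        );
        $debit->execute([$cents, $from, $cents]);
        if ($debit->rowCount() !== 1) {
            $db->rollBack();
            return false;
        }
        $credit = $db->prepare('UPDATE accounts SET balance = balance + ? WHERE id = ?');
        $credit->execute([$cents, $to]);
        if ($credit->rowCount() !== 1) {
            $db->rollBack(); // unknown destination: undo the debit as well
            return false;
        }
        $db->commit();
        return true;
    } catch (Throwable $e) {
        if ($db->inTransaction()) {
            $db->rollBack();
        }
        throw $e;
    }
}

$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)');
$db->exec('INSERT INTO accounts (id, balance) VALUES (1, 1000), (2, 0)');

var_dump(transfer($db, 1, 2, 400));  // enough money: the transfer commits
var_dump(transfer($db, 1, 2, 9999)); // not enough money: nothing changes
```

Swapping SQLite for another database changes the connection string, not the PHP logic.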

PHP gives you the freedom to choose anything to store data. Criticising PHP based on examples where someone chose the wrong data storage is stupid, sort of like someone blaming Boeing because a travel agent booked them into the wrong city and the aeroplane just happened to be built by Boeing.

The Web server

There are articles proposing a change of Web server to make PHP based Web sites faster. There are no articles showing significant speed increases over what you can achieve with the industry standard Apache. There are articles showing measurable speed increases if you choose to remove flexibility, and many of those changes can be achieved in Apache.

Changing Web servers is usually number eleven, or lower, on the top ten list of improvements you should test.

Web server configuration

This is a big consideration. It does not matter which Web server you use. PHP gives you a heap of configuration choices to make the best use of the Web server you choose. If your choice does not work out, you have other choices.

There is not a lot of difference between the different choices when you test with a light load. If you have heaps of memory, there is not a lot of difference at heavy loads. With a few years of experience, your choice is more likely to be based on security. PHP is compatible with all the security options.

Security

Some people criticise PHP security because the PHP developers are so quick to spot security mistakes and to broadcast the security updates. Which do you prefer? The PHP approach of fixing everything quickly then telling everyone about the update or the opposition's approach of leaving security holes in place for months, or years?

The most common security holes in Web sites are created by programming styles and a lack of testing, not the programming language. PHP offers the largest choice of well tested frameworks and content management systems to get you started with a secure Web site.

Multiple servers

The first step in scaling up big time is to switch from one Web server to many. The change creates problems for databases, front end routers, and for content management systems but not for PHP. PHP can handle all the approaches you use when splitting your workload across multiple Web servers.

Multiple cities

When you have a lot of servers, you put some in every city to serve your customers locally. PHP works exactly the same in every city. The problem is replicating the database, not the PHP processing. The replication is performed in the database software, not PHP.

You could, of course, switch the replication to a Web services approach and then PHP would process all the data faster than the network could carry it.

Divide and spread

Divide your Web site into sections and spread the sections over multiple locations. The most obvious split is by country. You place servers in every country next to the people using the servers. People register for a specific country and use the closest server for the maximum speed.

If someone is in Australia and accesses example.com, their country code is au and their server could be au.example.com or they could access example.com/au/. In the first case, the Internet will send the request to the right server. In the second case, your router has to split the workload off to the right server. au.example.com generates the least international traffic and is the fastest. You could also register the domain example.com.au and split the workload that way.
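
Choosing the country host takes very little code. The following is a minimal sketch. The function name countryHost() and the list of served countries are made up for this example, and how you obtain the visitor's country code (registration, in this article) is left outside the sketch.

```php
<?php
// Build the per-country host for a visitor, falling back to the main site
// when there is no server for that country.
// countryHost() and the $served list are hypothetical, for illustration only.
function countryHost(string $countryCode, array $served, string $domain = 'example.com'): string
{
    $code = strtolower(trim($countryCode));
    return in_array($code, $served, true) ? $code . '.' . $domain : $domain;
}

$served = ['au', 'nz', 'uk'];
echo countryHost('AU', $served), PHP_EOL; // au.example.com
echo countryHost('zz', $served), PHP_EOL; // example.com
```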

Language is another split. If you have a small number of visitors from South America, you might split South American customers into two servers, Spanish and Portuguese, instead of one server per country.

Content is another split. A site with a lot of images might place the images in img.example.com to take some workload off example.com. Content delivery networks spread the content over many servers in many countries and change the URLs to a form like au.img.example.com.

Content delivery networks require URL rewriting and that breaks the caching of prepared Web pages. The best solution is to split your users and images into the same groups. If the user is accessing au.example.com then their images will be from img.au.example.com and the URL can be formed before recording the page in the cache, instead of rewriting the page every time you retrieve the page from the cache.
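
Forming the image URL before the page goes into the cache might look like the sketch below. The function names imageUrl() and renderPage(), the https scheme, and the logo.png path are assumptions made for this illustration. The host layout follows the img.au.example.com form described above.

```php
<?php
// Build image URLs for the visitor's own group while rendering the page,
// so the HTML stored in the cache never needs rewriting on the way out.
// imageUrl(), renderPage(), and the https scheme are illustrative assumptions.
function imageUrl(string $siteHost, string $path): string
{
    return 'https://img.' . $siteHost . '/' . ltrim($path, '/');
}

function renderPage(string $siteHost): string
{
    return '<img src="' . htmlspecialchars(imageUrl($siteHost, '/logo.png')) . '">';
}

// This is the string you would record in the page cache for au.example.com.
echo renderPage('au.example.com'), PHP_EOL;
// <img src="https://img.au.example.com/logo.png">
```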

example.com.au lets you use img.example.com.au. The important part is grouping everything so the content can be created in the correct form the first time. You might end up with some servers doing almost nothing because they have very few visitors. If you try to be really tricky to save a few servers, you end up with your main servers jumping through hoops while working through the tricky code and all that extra work will be repeated for every page on your busiest servers.

This divide and spread approach is a Web site organisation issue and has nothing to do with PHP. PHP just provides very easy ways of doing anything that is needed in the Web site code.

Clean code

Code errors slow down everything in every way. Processing an error requires far more resources than performing the right work the first time. You need to clean out errors and the code used to track down errors.

When everything is clean, you can start weeding out unused code and duplication. Duplicate code in memory is a tiny problem. Code that duplicates the reading of a file or database is a big problem. Clean out duplicate input/output operations first then look at the remaining code.
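
Duplicate input/output is usually easy to remove once you find it. The following is a minimal sketch that reads a settings file at most once per request, no matter how many parts of the code ask for it. The function name loadSettings(), the read counter, and the sample file are made up for this example.

```php
<?php
// Read each settings file at most once per request, however many callers ask.
// loadSettings() and the $reads counter are hypothetical, for illustration only.
function loadSettings(string $file, int &$reads): array
{
    static $cache = [];
    if (!array_key_exists($file, $cache)) {
        $reads++; // count real disk reads for the demonstration
        $cache[$file] = parse_ini_file($file) ?: [];
    }
    return $cache[$file];
}

// Create a small throwaway settings file for the demonstration.
$file = tempnam(sys_get_temp_dir(), 'cfg');
file_put_contents($file, "site_name = Example\n");

$reads = 0;
loadSettings($file, $reads);
loadSettings($file, $reads);
$settings = loadSettings($file, $reads);
echo $settings['site_name'], ' read from disk ', $reads, " time(s)\n";
unlink($file);
```

The same pattern applies to database lookups that several parts of a page repeat.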

This type of cleanup is required in every language. PHP is easier to read and the readability makes PHP easier to clean.

Database review

People design databases then forget them. Databases need review for the same reasons you spring clean your food cupboard. Things are pushed to the back, forgotten, then you purchase something you already have. Databases suffer the same fate.

Clean, compare, reorganise, and put the spring clean in your schedule for every spring. Every byte you save is a byte not replicated across all your servers.

Only replicate what needs replication

Big sites have lots of servers and lots of data replicated everywhere. Really big sites have people dedicated to auditing the replication and removing waste. Small to medium sites do not have the luxury of tight control over replication. You start by replicating everything except a few obvious tables. You run out of time to optimise the process further.

Reviewing the replication is another spring cleaning task to schedule every time you plan a major expansion, plus every spring.

Also look at staged replication where each replication outwards replicates only part of your database to serve part of your Web site. Each review of your replication will combine your existing knowledge with your latest research into current usage. The problems become smaller as the data grows larger.

Conclusion

Scaling Web sites up and up forever has nothing to do with PHP. When you do need to change the code in your Web site for a specific scaling option, PHP provides the quickest and easiest way to change the code, giving you the fastest way to make any changes needed for scaling upwards.