
Processes or threads, which approach is best?

Submitted by Peter on Fri, 2012-06-22 08:43

Most operating systems split work into processes. Some operating systems split the work further into threads. People using operating systems without threads may tell you threads are bad. Threads are like cars: you can use a car to rush a person to hospital and you can use a car to run over your neighbour's cat. The result depends on the use, not the technology.

Back in the oldest old days, when the scientists at Bletchley Park invented electronic computers, there were no processes or threads, just a single stream of code ripping through the machine. The Americans built their own early computers and their version did the same thing. The computers ran at about the same speed as the data going in and out.

Eventually people invented disks to store data and faster computers to run more than one program at a time. They needed operating systems and the operating system needed processes to divide up the workload into chunks the operating system could run in parallel.

Within the operating system, the work was further divided into threads to let the computer work on several storage devices in parallel.

When computers appeared with multiple processor chips, and later with multiple cores in a chip, threads could be spread over multiple processors. The ultimate test of threads is to put them on a multiple core processor and see if they run together in parallel or stay on one core.
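The test described above can be tried with a few lines of code. The following is a minimal sketch, not a proper benchmark: it runs the same CPU-heavy job several times in a row, then runs the same jobs in threads, and times both. The names `burn` and `compare` are made up for this illustration. If the threaded time is close to the serial time, the threads stayed on one core. Standard CPython builds will usually show exactly that for pure Python work, because the global interpreter lock lets only one thread execute Python bytecode at a time.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def burn(n: int) -> int:
    """CPU-bound busy work: the sum of the squares below n."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def compare(jobs: int = 4, n: int = 200_000):
    """Run the same jobs one after another, then in threads, timing both."""
    start = time.perf_counter()
    serial = [burn(n) for _ in range(jobs)]
    serial_time = time.perf_counter() - start

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        threaded = list(pool.map(burn, [n] * jobs))
    threaded_time = time.perf_counter() - start

    return serial, threaded, serial_time, threaded_time


if __name__ == "__main__":
    serial, threaded, serial_time, threaded_time = compare()
    print(f"cores reported: {os.cpu_count()}")
    print(f"serial:   {serial_time:.3f}s")
    print(f"threaded: {threaded_time:.3f}s")
```

A threaded time well below the serial time means the threads really did spread over the cores. The same comparison written in a language with free-running native threads should show the difference more clearly.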

| Process | Thread |
| --- | --- |
| Hardware protection (perhaps, depending on the type of processor) | None |
| Software protection (almost useless) | None |
| High startup overhead | Almost none |
| Medium switching overhead | Almost none |
| High memory usage for small processes | Almost none |
| One process can crash without stopping anything else (most of the time) | If one thread crashes in a process, all threads crash in that process |
| Application per process | The same |
| Application may use multiple processes | Thread has to be in one process |
| Every process can be different | All threads in one process should be the same |

Hardware protection

Some processor chips provide a way to let operating systems isolate the memory in one process from the memory in another process. Your operating system might support that type of hardware or just ignore it. If you have a good operating system on good hardware, it should be impossible for one process to see what another process is doing and impossible to see the data in the memory of the other process.

Software protection is never as safe as hardware protection.

Keep similar threads together

You get the most efficiency from threads when all the threads of a specific type are in the same process. They can share code and memory and split the processing up evenly between themselves because they all use the same resources. If there is a need for a queue or a lock on a resource, all the threads waiting on that resource are in the same process and all other processes can continue without interruption.
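As a rough illustration of similar threads sharing one queue inside one process, here is a minimal sketch using Python's standard `queue` and `threading` modules. The function name `run_pool` and the squaring "work" are placeholders invented for this example. Every worker runs the same code, pulls from the same queue, and writes to the same list, with a lock around the shared list.

```python
import queue
import threading


def run_pool(items, worker_count=4):
    """Process items with identical worker threads sharing one queue."""
    tasks = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            if item is None:          # sentinel: no more work for this worker
                tasks.task_done()
                break
            squared = item * item     # placeholder for the real work
            with results_lock:        # the shared list is guarded by one lock
                results.append(squared)
            tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for item in items:
        tasks.put(item)
    for _ in threads:
        tasks.put(None)               # one sentinel per worker
    tasks.join()                      # wait until every task is marked done
    for thread in threads:
        thread.join()
    return sorted(results)


if __name__ == "__main__":
    print(run_pool(range(10)))
```

The queue and the lock live entirely inside this process, so any waiting happens among these workers and nothing outside the process is held up.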

Keep dissimilar threads apart

Threads from different applications can stay in different processes because they would have nothing to share if they ran as threads in one process. The one exception is the Web site containing several applications. To the end user, they are different applications. The database sees them as separate applications with different usage. To the Web server, they might all look the same because they use the same scripting language, the same type of database, and similar resources. Usually you look at splitting up Web sites when they start to fill the server and, at that point, your first split might be by application if the applications can be used independently.

Web servers

Some Web servers use only processes. Some use processes and threads. Threads work best when the underlying operating system supports threads. One of the reasons the first release of Apache version 2 worked so well on Windows was the Windows native support for threads.

Thread by name only

There are so many different software structures called threads that a definitive guide to all threads is not possible. A thread is really a thread if it can run independently on a separate core in a multiple core processor. Threads have to share data within their process but, once started, they should run to the end without waiting on similar threads. The process should first gather all the data required by the thread, then start the thread. When the thread ends, the process collects the data from the thread and merges it back into the overall collection of data in the process.
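The gather, start, collect and merge steps can be sketched as follows. This is only an illustration under simple assumptions: the word-counting job and the names `count_words` and `word_totals` are invented for the example. Each thread gets its own chunk of input up front, shares nothing while it runs, and hands back a private result that the process merges at the end.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk: list[str]) -> dict[str, int]:
    """Work done by one thread: count words in its own chunk only."""
    counts: dict[str, int] = {}
    for line in chunk:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def word_totals(lines: list[str], workers: int = 4) -> dict[str, int]:
    # 1. Gather: split the input into independent chunks, one per thread.
    size = max(1, -(-len(lines) // workers))  # ceiling division
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]

    # 2. Start the threads and 3. collect each thread's private result.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, chunks))

    # 4. Merge the partial results back into the process's own data.
    totals: dict[str, int] = {}
    for partial in partials:
        for word, n in partial.items():
            totals[word] = totals.get(word, 0) + n
    return totals


if __name__ == "__main__":
    print(word_totals(["a b a", "b c", "a"], workers=2))
```

Because no thread touches another thread's chunk or result, there are no locks and no thread waits on a similar thread.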

As an example, you might have to update ten rows in a table in a database and might start ten threads with each updating a different row. This works when the database supports independent row updates. This approach does not work when the first thread locks the whole database, or the whole table, and all the other update threads wait in a line for the same table. The first version, with independent updates, will use all the cores in an eight core processor chip while the second version will not use more than one core.
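The difference between the two versions can be simulated without a real database. In this sketch, which is an assumption-laden stand-in and not how any particular database works, a list plays the table, in-memory locks play the row and table locks, and `time.sleep` stands in for the time the database spends on each update. The name `update_rows` is made up for the example.

```python
import threading
import time


def update_rows(row_count=10, per_row_lock=True, work_seconds=0.02):
    """Update every row in its own thread; return the rows and elapsed time."""
    rows = [0] * row_count
    table_lock = threading.Lock()
    row_locks = [threading.Lock() for _ in range(row_count)]

    def update(index):
        # Independent updates lock one row; the other version locks the table.
        lock = row_locks[index] if per_row_lock else table_lock
        with lock:
            time.sleep(work_seconds)  # stand-in for the database doing the work
            rows[index] += 1

    threads = [threading.Thread(target=update, args=(i,))
               for i in range(row_count)]
    start = time.perf_counter()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return rows, time.perf_counter() - start


if __name__ == "__main__":
    _, row_time = update_rows(per_row_lock=True)
    _, table_time = update_rows(per_row_lock=False)
    print(f"row locks:  {row_time:.3f}s")
    print(f"table lock: {table_time:.3f}s")
```

With row locks the ten updates overlap. With the single table lock they queue up one behind the other, so the elapsed time grows with the number of rows.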

Nginx web server

The Nginx web server software claims to be many times faster than the Apache Web server because Nginx is event driven instead of using processes and threads. Then you read the detailed description and you find it uses a small number of processes, in effect a small number of large threads. Then you notice that each event is handled by what is called a thread in other software. Effectively Nginx brings the threaded approach of Microsoft NT to Unix and Linux Web serving. (Microsoft copied the idea from the old DEC VAX, which copied the idea from IBM mainframes, 30 years before Nginx.)

You can build threads into your application

Think of an ordinary Web page with everything delivered at once. Now think of a Web page where some parts are delivered by Ajax. The Ajax version can deliver the main page in one big rush and fill in parts after the initial delivery. The first big delivery could be considered a process. When the little bits are filled in through Ajax, the little bits could be considered threads. The deciding factor is the ability of the Ajax parts to run in parallel. If the parts all wait on the same resources, they are not threads. If the parts can run independently, they are threads.

As an example, you might set up a page for a shop. On the page you might have a block of specials picked up through Ajax. You might also have a weather report picked up from the weather bureau. A bank might supply exchange rates. The exchange rates and weather are from separate servers and run independently from the shop specials, making them threads.
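The shop page can be mimicked on the server side to show the same idea. This is a hypothetical sketch: the three fetch functions, their delays, and the data they return are all placeholders invented for the example, and no real weather bureau or bank is contacted. Each part is fetched in its own thread, so the page is ready in roughly the time of the slowest part, not the sum of all three.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def shop_specials():
    time.sleep(0.05)                  # pretend call to the shop's own server
    return ["tea", "biscuits"]        # placeholder data


def weather_report():
    time.sleep(0.05)                  # pretend call to the weather bureau
    return "sunny"                    # placeholder data


def exchange_rates():
    time.sleep(0.05)                  # pretend call to the bank
    return {"EUR": 1.0}               # placeholder data, not a real rate


def build_page(sources):
    """Fetch every page part in its own thread and time the whole page."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {name: pool.submit(fetch) for name, fetch in sources.items()}
        page = {name: future.result() for name, future in futures.items()}
    return page, time.perf_counter() - start


if __name__ == "__main__":
    page, elapsed = build_page({
        "specials": shop_specials,
        "weather": weather_report,
        "rates": exchange_rates,
    })
    print(page)
    print(f"page ready in {elapsed:.3f}s")
```

If all three parts had to queue for the same resource, the elapsed time would climb toward the sum of the delays, and by the rule above they would no longer count as threads.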

Best fit?

The thread model would have little use in a word processor. You could edit in one process and print a previous document in another process. There are almost no examples of thread processing in traditional word processing.

If you were setting up a brand new word processing system with brand new documents, you might choose to create all illustrations using SVG files. Each SVG file could be rendered in a separate thread. You open the document and start browsing. The page renderer allocates space for each SVG image, then starts a thread to render each SVG illustration into the relevant space.

Threads fit the Web world and many other uses. There are also many uses where the old process approach is still the best fit.

Conclusion

Which is better? Using threads is better than not using threads. Not all processing fits the thread model. Web servers are among the best examples of where you would use threads.