Operating systems are not only regarded as a fascinating part of information technology, but are also the subject of controversial discussion among a wide public. Linux has played a major role in this development. Whereas just 10 years ago a strict distinction was made between relatively simple academic systems available in source code and commercial variants with varying performance capabilities whose sources were a well-guarded secret, nowadays anybody can download the sources of Linux (or of any other free system) from the Internet in order to study them.
Linux is now installed on millions of systems and is used by home users and professionals alike for a wide range of tasks. From miniature embedded systems in wristwatches to massively parallel mainframes, there are countless ways of exploiting Linux productively. And this makes the sources so interesting. A sound, well-established concept (Unix) melded with powerful innovations and a strong penchant for dealing with problems that do not arise in academic teaching systems – this is what makes Linux so fascinating.
This book describes the central functions of the kernel, explains its underlying structures, and examines its implementation. Because complex subjects are discussed, I assume that the reader already has some experience in operating systems and systems programming in C (it goes without saying that I assume some familiarity with using Linux systems). I touch briefly on several general concepts relevant to common operating system problems, but my prime focus is on the implementation of the Linux kernel. Readers unfamiliar with a particular topic will find explanations of the relevant basics in one of the many general texts on operating systems; for example, in Tanenbaum’s outstanding introductions ([TW06] and [Tan07]). A solid foundation in C programming is required. Because the kernel makes use of many advanced techniques of C and, above all, of many special features of the GNU C compiler, Appendix C discusses the finer points of C with which even good programmers may not be familiar. A basic knowledge of computer structures will be useful as Linux necessarily interacts very directly with system hardware – particularly with the CPU. There are also a large number of introductory works dealing with this subject; some are listed in the reference section. When I deal with CPUs in greater depth (in most cases I take the IA-32 or AMD64 architecture as an example because Linux is used predominantly on these system architectures), I explain the relevant hardware details. When I discuss mechanisms that are not ubiquitous in daily life, I will explain the general concept behind them, but expect that readers will also consult the quoted manual pages for more advice on how a particular feature is used from userspace.
The present chapter is designed to provide an overview of the various areas of the kernel and to illustrate their fundamental relationships before moving on to lengthier descriptions of the subsystems in the following chapters.
Since the kernel evolves quickly, one question that naturally comes to mind is which version is covered in this book. I have chosen kernel 2.6.24 as the basis, which was released at the end of January 2008. The dynamic nature of kernel development implies that a new kernel version will be available by the time you read this, and naturally, some details will have changed – this is unavoidable. If it were not the case, Linux would be a dead and boring system, and chances are that you would not want to read the book in this case. While some of the details will have changed, concepts will not have varied essentially. This is particularly true because 2.6.24 has seen some very fundamental changes as compared to earlier versions. Developers do not rip out such things overnight, naturally.
All modern operating systems are able to run several processes at the same time – at least, this is the impression users get. If a system has only one processor, only one program can run on it at a given time. In multiprocessor systems, the number of processes that can truly run in parallel is determined by the number of physical CPUs.
The kernel and the processor create the illusion of multitasking – the ability to perform several operations in parallel – by switching repeatedly between the different applications running on the system at very rapid intervals. Because the switching intervals are so short, users do not notice the intervening brief periods of inactivity and gain the impression that the computer is actually doing several things at once.
This kind of system management gives rise to several issues that the kernel must resolve, the most important of which are listed below.
Applications must not interfere with each other unless this is expressly desired. For example, an error in application A must not be propagated to application B. Because Linux is a multiuser system, it must also be ensured that programs are not able to read or modify the memory contents of other programs – otherwise, it would be extremely easy to access the private data of other users.
CPU time must be shared as fairly as possible between the various applications, whereby some programs are regarded as more important than others.
I deal with the first requirement – memory protection – in Chapter 3. In the present chapter, I focus my attention on the methods employed by the kernel to share CPU time and to switch between processes. This twofold task is split into two parts that are performed relatively independently of each other.
The kernel must decide how much time to devote to each process and when to switch to the next process. This raises the question as to which process is actually the next. Decisions of this kind are not platform-dependent.
When the kernel switches from process A to process B, it must ensure that the execution environment of B is exactly the same as when it last withdrew processor resources. For example, the contents of the processor registers and the structure of virtual address space must be identical. This latter task is extremely dependent on processor type. It cannot be implemented in C alone, but requires the help of pure assembler portions.
Both tasks are the responsibility of a kernel subsystem referred to as the scheduler. How CPU time is allocated is determined by the scheduler policy, which is totally separate from the task switching mechanism needed to switch between processes.
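The separation between policy and mechanism described above can be illustrated with a toy userspace model. The sketch below is not kernel code: the names `Task`, `pick_next`, `switch_to`, and `save_from` are hypothetical, the policy is a plain round-robin chosen only for simplicity, and a dictionary stands in for the processor registers.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    remaining: int                                 # time units of work still to do
    context: dict = field(default_factory=dict)    # stand-in for saved registers


def pick_next(run_queue):
    """Policy: decide which task runs next (here: plain round-robin)."""
    return run_queue.popleft()


def switch_to(cpu, task):
    """Mechanism: restore the execution environment the task had when it last ran."""
    cpu.clear()
    cpu.update(task.context)


def save_from(cpu, task):
    """Mechanism: save the execution environment before the task loses the CPU."""
    task.context = dict(cpu)


def run(tasks, timeslice=2):
    """Run all tasks to completion and return the order in which they were scheduled."""
    cpu = {}                                       # hypothetical register file
    run_queue = deque(tasks)
    order = []
    while run_queue:
        task = pick_next(run_queue)
        switch_to(cpu, task)
        work = min(timeslice, task.remaining)
        cpu["pc"] = cpu.get("pc", 0) + work        # "execute" for one timeslice
        task.remaining -= work
        order.append(task.name)
        save_from(cpu, task)
        if task.remaining > 0:
            run_queue.append(task)                 # not finished: back to the end
    return order
```

Replacing `pick_next` with a different selection rule changes the policy without touching `switch_to` or `save_from`, which is the point of keeping the two parts independent.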
Memory management is one of the most complex and at the same time most important parts of the kernel. It is characterized by the strong need for cooperation between the processor and the kernel because the tasks to be performed require them to collaborate very closely. Chapter 1 provided a brief overview of the various techniques and abstractions used by the kernel in the implementation of memory management. This chapter examines the technical aspects of implementation in detail.
The virtual address space of userland processes is an important abstraction of Linux: It allows the same view of the system to each running process, and this makes it possible for multiple processes to run simultaneously without interfering with the memory contents of the others. Additionally, it allows various advanced programming techniques like memory mappings. In this chapter, I will discuss how these concepts are realized in the kernel. This also requires an examination of the connection between page frames of the available physical RAM and pages in all virtual process address spaces: The reverse mapping technique helps to track which virtual pages are backed by which physical page, and page fault handling allows filling the virtual address space with data from block devices on demand.
As a multitasking system, Linux is able to run several processes at the same time. Normally, the individual processes must be kept as separate as possible so that they do not interfere with each other. This is essential to protect data and to ensure system stability. However, there are situations in which applications must communicate with each other; for example,
when data generated by one process are transferred to another.
when data are shared.
when processes are forced to wait for each other.
when resource usage needs to be coordinated.
These situations are handled using several classic techniques that were introduced in System V and have since proven their worth, so much so that they are now part and parcel of Linux. Because not only userspace applications but also the kernel itself are faced with such situations – particularly on multiprocessor systems – various kernel-internal mechanisms are in place to handle them.
If several processes share a resource, they can easily interfere with each other – and this must be prevented. The kernel therefore provides mechanisms not only for sharing data but also for coordinating access to data. Again, the kernel employs mechanisms adopted from System V.
Resources need to be protected not only in userspace applications but especially in the kernel itself. On SMP systems, the individual CPUs may be in kernel mode at the same time and, theoretically, may want to manipulate all existing data structures. To prevent the CPUs from getting into each other’s way, it is necessary to protect some kernel areas by means of locks; these ensure that access is restricted to one CPU at a time.
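The idea of restricting access to one party at a time can be sketched in userspace. The example below is only an analogy: threads stand in for CPUs, Python's `threading.Lock` stands in for a kernel lock, and `SharedCounter` and `run_workers` are hypothetical names introduced for illustration.

```python
import threading


class SharedCounter:
    """A data structure that several threads update concurrently."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may execute the read-modify-write below.
        with self._lock:
            current = self.value
            self.value = current + 1


def run_workers(counter, n_threads=4, n_increments=10_000):
    """Let several threads increment the same counter and return the final value."""

    def worker():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value
```

Because every update happens inside the locked region, no increment can be lost, whatever the interleaving of the threads.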
Device drivers are a key area of the kernel as many users judge operating system performance primarily by the number of peripherals for which drivers are available and how effectively they are supported. Consequently, large parts of the kernel sources are devoted to the implementation of device drivers.
Device drivers build on many different mechanisms provided by the central kernel (this is why drivers are sometimes referred to as kernel “applications”). The immense number of drivers in the Linux kernel means that it is impossible to discuss all (or even a few) in detail. Fortunately, this is not necessary. The structures of the drivers are generally very similar – regardless of device – so that in this chapter we need only discuss a few key aspects common to all drivers. Since the objective of this book is to cover all important parts of the kernel, this chapter omits some of the more specific points of driver writing, which would require a book of its own. However, two books that focus solely on driver writing are currently available. The classic text in this area is Linux Device Drivers by Corbet et al. [CRKH05]. We can recommend it wholeheartedly to anyone interested in or charged with writing a device driver. A recent addition to kernel hackers’ bookshelves is Essential Linux Device Drivers by Venkateswaran [Ven08]. Developers who are able to read German will certainly also enjoy Linux Gerätetreiber by Quade and Kunst [QK06]. The quoted references are complementary to this book. Here, we document how the kernel sets up and manages data structures and generic infrastructure for device drivers. Also, we discuss routines that are provided to support device drivers. Device driver books, on the other hand, focus on how to use these routines to actually create new drivers, but are less concerned with how the underlying foundations are implemented.
Modules are an efficient way of adding device drivers, filesystems and other components dynamically into the Linux kernel without having to build a new kernel or reboot the system. They remove many of the restrictions constantly raised as arguments against monolithic architectures by, above all, micro-kernel proponents. These arguments concern primarily the lack of dynamic extensibility. In this chapter, we examine how the kernel interacts with the modules; in other words, how they are loaded and unloaded and how the kernel detects the interdependencies between various modules. It is therefore necessary to deal in some detail with the structure of module binary files (and their ELF structure).
Typically, a full Linux system consists of somewhere between several thousand and a few million files that store programs, data, and all kinds of information. Hierarchical directory structures are used to catalog and group files together. Various approaches are adopted to permanently store the required structures and data.
Every operating system has at least one “standard filesystem” that features functions, some good, some less so, to carry out required tasks reliably and efficiently. The Second/Third Extended Filesystem that comes with Linux is a kind of standard filesystem that has proved itself to be very robust and suitable for everyday use over the past few years. Nevertheless, there are other filesystems written for or ported to Linux, all of which are acceptable alternatives to the Ext2 standard. Of course, this does not mean that programmers must apply different file access methods for each filesystem they use – this would run totally counter to the concept of an operating system as an abstraction mechanism.
To support various native filesystems and, at the same time, to allow access to files of other operating systems, the Linux kernel includes a layer between user processes (or the standard library) and the filesystem implementation. This layer is known as the Virtual File System, or VFS for short.
The task of VFS is not a simple one. On the one hand, it is intended to provide uniform ways of manipulating files, directories, and other objects. On the other, it must be able to come to terms with the concrete implementations of the various approaches, which differ in part not only in specific details but also in their overall design. However, the rewards are high because VFS adds substantial flexibility to the Linux kernel.
The kernel supports more than 40 filesystems of various origins – ranging from the FAT filesystem from the MS-DOS era through UFS (Berkeley Unix) and iso9660 for CD-ROMs to network filesystems such as coda and NFS and virtual versions such as procfs.
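The role of such an intermediate layer can be sketched with a small model in which one uniform `read` call is dispatched to whichever implementation is mounted at the path in question. Everything here is hypothetical and greatly simplified: `FileOps`, `RamFs`, `GeneratedFs`, and `Vfs` are illustrative names, not kernel structures, and the real VFS handles far more than reads.

```python
class FileOps:
    """Hypothetical operation table that every filesystem type must fill in."""

    def read(self, path):
        raise NotImplementedError


class RamFs(FileOps):
    """Stores file contents directly in memory."""

    def __init__(self):
        self.files = {}

    def read(self, path):
        return self.files[path]


class GeneratedFs(FileOps):
    """Produces file contents on demand instead of storing them."""

    def __init__(self, generators):
        self.generators = generators

    def read(self, path):
        return self.generators[path]()


class Vfs:
    """Uniform entry point: callers never see which implementation is used."""

    def __init__(self):
        self.mounts = {}

    def mount(self, mountpoint, fs):
        self.mounts[mountpoint] = fs

    def _resolve(self, path):
        # Pick the longest mount point that is a prefix of the path.
        candidates = [
            m for m in self.mounts
            if path == m or path.startswith(m.rstrip("/") + "/")
        ]
        best = max(candidates, key=len)
        return self.mounts[best], path[len(best.rstrip("/")):]

    def read(self, path):
        fs, relative = self._resolve(path)
        return fs.read(relative)
```

A caller uses `vfs.read("/etc/motd")` and `vfs.read("/proc/version")` in exactly the same way, even though the two paths are served by implementations with entirely different designs.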
The structure and layout of the interfaces and data structures of the Virtual Filesystem discussed in Chapter 8 define a framework within which filesystem implementations must operate. However, this does not dictate that the same ideas, approaches, and concepts must be adopted by every filesystem when files are organized on block devices to store their contents permanently. Quite the opposite: Linux supports a wide variety of concepts including those that are easy to implement and understand but are not particularly powerful (e.g., the Minix filesystem); the proven Ext2 filesystem, which is used by millions; specific versions designed to support RAM- and ROM-based approaches; highly available cluster filesystems; and modern, tree-based filesystems with rapid restoration of consistency by means of transaction journals. No other operating system offers this versatility.
The techniques used differ considerably even though they can all be addressed – from both the user and kernel sides – via an identical interface, thanks to the virtual filesystem. Because of the large number of filesystems supported, every single implementation cannot be discussed here – not even briefly. Instead, this chapter focuses on the extended filesystem family, that is, the Ext2 and Ext3 filesystems. They illustrate the key concepts underlying the development of filesystems.
Traditionally, filesystems are used to store data persistently on block devices. However, it is also possible to use filesystems to organize, present, and exchange information that is not stored on block devices, but dynamically generated by the kernel. This chapter examines some of them:
The proc filesystem enables the kernel to generate information on the state and configuration of the system. This information can be read from normal files by users and system programs without the need for special tools for communication with the kernel; in some cases, a simple cat is sufficient. Data can not only be read from the kernel, but also sent to it by writing character strings to a file of the proc filesystem: echo "value" > /proc/file – there’s no easier way of transferring information from userspace to the kernel. This approach makes use of a virtual filesystem that generates file information “on the fly,” in other words, only when requested to do so by read operations. A dedicated hard disk partition or some other block storage device is not needed with filesystems of this type. In addition to the proc filesystem, the kernel provides many other virtual filesystems for various purposes, for example, for the management of all devices and system resources cataloged in the form of files in hierarchically structured directories. Even device drivers can make status information available in virtual filesystems, the USB subsystem being one such example.
Sysfs is one particularly important example of another virtual filesystem that serves a similar purpose to procfs on the one hand, but is rather different from it on the other. Sysfs is, by convention, always mounted at /sys, but there is nothing that would prevent mounting it in other places. It was designed to export information from the kernel into userland at a highly structured level. In contrast to procfs, it was not designed for direct human use because the information is deeply and hierarchically nested. Additionally, the files do not always contain information in ASCII text form, but may well use unreadable binary strings. The filesystem is, however, very useful for tools that want to gather detailed information about the hardware present in a system and the topological connection between the devices. It is also possible to create sysfs entries for kernel objects that use kobjects (see Chapter 1 for more information) with little effort. This gives userland easy access to important core kernel data structures.
Small filesystems that serve a specific purpose can be constructed from standard functions supplied by the kernel. The in-kernel library that provides the required functions is called libfs. Additionally, the kernel provides means to implement sequential files with ease. Both techniques are put together in the debugging filesystem debugfs, which allows kernel developers to quickly export values to and import values from userland without the hassle of having to create custom interfaces or special-purpose filesystems.
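Because files of this kind are read like any other text file, a program needs nothing beyond ordinary file operations to consume them. The following is a minimal sketch under that assumption; `read_key_value_file` is a hypothetical helper that parses files laid out as one `Key: value` pair per line, which is the layout used by, for example, `/proc/self/status` on Linux.

```python
def read_key_value_file(path):
    """Parse a text file of 'Key: value' lines into a dictionary.

    Lines without a colon are ignored.
    """
    fields = {}
    with open(path, "r", encoding="ascii", errors="replace") as f:
        for line in f:
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
    return fields
```

On a Linux system, `read_key_value_file("/proc/self/status")["Name"]` would return the name of the calling process; the helper itself knows nothing about procfs and works on any file with the same layout.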
Many filesystems provide features that extend the standard functionality offered by the VFS layer. It is impossible for the virtual filesystem to provide specific data structures for every feature that can be imagined – fortunately, there’s lots of room in our imagination, and developers are not exactly short of new ideas. Additional features that go beyond the standard Unix file model often require an extended set of attributes associated with every filesystem object. What the kernel can provide, however, is a framework that allows filesystem-specific extensions. Extended attributes (xattrs) are (more or less) arbitrary attributes that can be associated with a file. Since every file will usually possess only a subset of all possible extended attributes, the attributes are stored outside the regular inode data structure to avoid increasing its size in memory and wasting disk space. This allows a really generic set of attributes without any significant impact on filesystem performance or disk space requirements.
One use of extended attributes is the implementation of access control lists that extend the Unix-style permission model: They allow implementation of finer-grained access rights by not only using the concept of the classes user, group, and others, but also by associating an explicit list of users and their allowed operations on the file. Such lists fit naturally into the extended attribute model. Another use of extended attributes is to provide labeling information for SELinux.
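From userspace, extended attributes appear as name/value pairs attached to a file. The sketch below uses Python's `os.setxattr` and `os.getxattr`, which are available on Linux only; `xattr_roundtrip` is a hypothetical helper, and the attribute name `user.comment` is an arbitrary example. Whether the call succeeds depends on the filesystem holding the file, so the helper returns `None` rather than failing when attributes are unsupported.

```python
import os


def xattr_roundtrip(path, name=b"user.comment", value=b"example"):
    """Attach an extended attribute to path and read it back.

    Returns the stored value, or None when the platform or the
    filesystem does not support user extended attributes.
    """
    if not hasattr(os, "setxattr"):       # the os.*xattr calls exist on Linux only
        return None
    try:
        os.setxattr(path, name, value)
        return os.getxattr(path, name)
    except OSError:                       # e.g. filesystem without xattr support
        return None
```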
That Linux is a child of the Internet is beyond contention. Thanks, above all, to Internet communication, the development of Linux has demonstrated the absurdity of the widely held opinion that project management by globally dispersed groups of programmers is not possible. Since the first kernel sources were made available on an ftp server more than a decade ago, networks have always been the central backbone for data exchange, for the development of concepts and code, and for the elimination of kernel errors. The kernel mailing list is a living example that nothing has changed. Everybody is able to read the latest contributions and add their own opinions to promote Linux development – assuming, of course, that the opinions expressed are reasonable.
Linux has a very cozy relationship with networks of all kinds – understandably as it came of age with the Internet. Computers running Linux account for a large proportion of the servers that build the Internet. Unsurprisingly, network implementation is a key kernel component to which more and more attention is being paid. In fact, there are very few network options that are not supported by Linux.
Implementation of network functionality is one of the most complex and extensive parts of the kernel. In addition to classic Internet protocols such as TCP, UDP, and the associated IP transport mechanism, Linux also supports many other interconnection options so that all conceivable types of computers and operating systems are able to interoperate. The work of the kernel is not made any simpler by the fact that Linux also supports a gigantic hardware spectrum dedicated to data transfer – ranging from Ethernet cards and token ring adapters to ISDN cards and modems.
Nevertheless, Linux developers have been able to come up with a surprisingly well-structured model to unify very different approaches. Even though this chapter is one of the longest in the book, it makes no claim to cover every detail of network implementation. Even an outline description of all drivers and protocols is beyond the scope of a single book – many would be needed owing to the volume of information. Not counting device drivers for network cards, the C implementation of the network layer occupies 15 MiB in the kernel sources, and this equates to more than 6,000 printed pages of code. The sheer number of header files that relate to networking has motivated the kernel developers to store them not in the standard location include/linux, but to devote the special directory include/net to them. Embedded in this code are many concepts that form the logical backbone of the network subsystem, and it is these that interest us in this chapter. Our discussion is restricted mainly to the TCP/IP implementation because it is by far the most widely used network protocol.
Of course, development of the network layer did not start with a clean sheet. Standards and conventions for exchanging data between computers had already existed for decades and were well known and well established. Linux also implements these standards to link to other computers.
In the view of user programs, the kernel is a transparent system layer – it is always present but never really noticed. Processes don’t know whether the kernel is running or not. Neither do they know which virtual memory contents are currently in RAM or which contents have been swapped out or perhaps not even read in. Nevertheless, processes are engaged in permanent interaction with the kernel to request system resources, access peripherals, communicate with other processes, read in files, and much more. For these purposes, they use standard library routines that, in turn, invoke kernel functions – ultimately, the kernel is responsible for sharing resources and services fairly and, above all, smoothly between requesting processes.
Applications therefore see the kernel as a large collection of routines that perform a wide variety of system functions. The standard library is an intermediate layer to standardize and simplify the management of kernel routines across different architectures and systems.
In the view of the kernel, the situation is, of course, a bit more complicated especially as there are several major differences between user and kernel mode, some of which were discussed in earlier chapters. Of particular note are the different virtual address spaces of the two modes and the different ways of exploiting various processor features. Also of interest is how control is transferred backward and forward between applications and the kernel, and how parameters and return values are passed. This chapter discusses such questions.
As described in previous chapters, system calls are used to invoke kernel routines from within user applications in order to exploit the special capabilities of the kernel. We have already examined the implementation of a number of system calls from a wide range of kernel subsystems.
First, let’s take a brief look at system programming to distinguish clearly between library routines of the standard library and the corresponding system calls. We then closely examine the kernel sources in order to describe the mechanism for switching from userspace to kernel space. The infrastructure used to implement system calls is described, and special implementation features are discussed.
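The difference between a library routine and the system call beneath it can be observed from userspace. The sketch below assumes a POSIX system and is only an illustration: `readable` and `demo` are hypothetical names. A buffered stream keeps written bytes in userspace until it is flushed, whereas `os.read` is a thin wrapper around the corresponding system call.

```python
import os
import select


def readable(fd):
    """Return True if the kernel already holds data for fd (non-blocking check)."""
    ready, _, _ = select.select([fd], [], [], 0)
    return bool(ready)


def demo():
    """Write through a buffered stream and observe when the data reaches the kernel."""
    r, w = os.pipe()
    observations = []

    # Library routine: the buffered writer keeps the bytes in userspace.
    stream = os.fdopen(w, "wb")
    stream.write(b"hello")
    observations.append(readable(r))      # checked before any flush

    # flush() hands the buffered bytes to the kernel via a write system call.
    stream.flush()
    observations.append(readable(r))      # checked after the flush

    data = os.read(r, 5)                  # thin wrapper around read(2)
    stream.close()
    os.close(r)
    return observations, data
```

The first observation is taken while the data still sits in the library's buffer, the second after it has been handed to the kernel, so comparing the two shows where the boundary between library and system call lies.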
Chapter 13 demonstrated that system execution time can be split into two large and different parts: kernel mode and user mode. In this chapter, we investigate the various kernel activities and reach the conclusion that a finer-grained differentiation is required.
System calls are not the only way of switching between user and system mode. As is evident from the preceding chapters, all platforms supported by Linux employ the concept of interrupts to introduce periodic interruptions for a variety of reasons. Two types of interrupt are distinguished:
Hardware Interrupts: These are produced automatically by the system and connected peripherals. They support more efficient implementation of device drivers, but are also needed by the processor itself to draw attention to exceptions or errors that require interaction with the kernel code.
SoftIRQs: These are used to effectively implement deferred activities in the kernel itself.
In contrast to other parts of the kernel, the code for handling interrupts and system call-specific segments contains very strong interleaving between assembly language and C code to resolve several subtle problems that C could not reasonably handle on its own. This is not a Linux-specific problem. Regardless of their individual approach, most operating system developers try to hide the low-level handling of such points as deeply as possible in the kernel sources to make them invisible to the remaining code. Because of technical circumstances, this is not always possible, but the interrupt handling layer has evolved over time to a state where high-level code and low-level hardware interaction are separated as well and cleanly as possible.
Frequently, the kernel needs mechanisms to defer activities until a certain time in the future or to place them in a queue for later processing when time is available. You have come across a number of uses for such mechanisms in earlier chapters. In this section, we take a closer look at their implementation.
All the methods of deferring work to a future point in time discussed in this book so far do not cover one specific area – the time-based deferral of tasks. The different variants that have been discussed do, of course, give some indication of when a deferred task will be executed (e.g., tasklets when handling softIRQs), but it is not possible to specify an exact time or a time interval after which a deferred activity will be performed by the kernel. The simplest kind of usage in this respect is obviously the implementation of time-outs where the kernel on behalf of a userland process waits a specific period of time for the arrival of an event – for example, 10 seconds for a user to press a key as a last opportunity to cancel before an important operation is carried out. Other usages are widespread in user applications.
The kernel itself also uses timers for various tasks, for example, when devices communicate with associated hardware, often using protocols with chronologically defined sequences. A large number of timers are used to specify wait timeouts in TCP implementation.
Depending on the job that needs to be performed, timers need to provide different characteristics, especially with respect to the maximal possible resolution. This chapter discusses the alternatives provided by the Linux kernel.
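The basic idea of time-based deferral, and the way resolution limits it, can be shown with a toy model. The sketch below is an assumption-laden simplification, not the kernel's implementation: `TimerQueue` is a hypothetical class that keeps pending timers in a heap and runs them whenever a simulated clock is advanced.

```python
import heapq
import itertools


class TimerQueue:
    """Toy timer queue driven by a simulated clock."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()     # tie-breaker for equal expiry times
        self.now = 0

    def add_timer(self, delay, callback):
        """Schedule callback to run once the clock has advanced by delay ticks."""
        heapq.heappush(self._heap, (self.now + delay, next(self._seq), callback))

    def advance(self, ticks):
        """Move the clock forward and run every timer that has expired by then."""
        self.now += ticks
        while self._heap and self._heap[0][0] <= self.now:
            _, _, callback = heapq.heappop(self._heap)
            callback()
```

Expired timers are noticed only when `advance` is called, so the step size of the clock bounds the achievable resolution: a timer due at tick 3 runs at tick 5 if the clock moves in steps of five.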
Performance and efficiency are two factors to which great importance is attached during kernel development. The kernel relies not only on a sophisticated overall concept of interaction between its individual components, but also on an extensive framework of buffers and caches designed to boost system speed.
Buffering and caching make use of parts of system RAM to ensure that the most important and the most frequently used data of block devices can be manipulated not on the slow devices themselves but in main memory. RAM memory is also used to store the data read in from block devices so that the data can be subsequently accessed directly in fast RAM when it is needed again rather than fetching it from external devices.
Of course, this is done transparently so that the applications do not and cannot notice any difference as to from where the data originate.
Data are not written back after each change but after a specific interval whose length depends on a variety of factors such as the free RAM capacity, the frequency of usage of the data held in RAM, and so on. Individual write requests are bundled and collectively take less time to perform. Consequently, delaying write operations improves system performance as a whole.
However, caching has its downside and must be employed judiciously by the kernel:
Usually there is far less RAM capacity than block device capacity so that only carefully selected data may be cached.
The memory areas used for caching are taken from the same RAM that would otherwise hold “normal” application data. This reduces the RAM capacity that is effectively available to applications.
If the system crashes (e.g., owing to a power outage), the caches may contain data that have not been written back to the underlying block device. Such data are irretrievably lost.
However, the advantages of caching outweigh the disadvantages to such an extent that caches are permanently integrated into the kernel structures.
Caching is a kind of “reverse” swapping or paging operation (the latter are discussed in Chapter 18). Whereas fast RAM is sacrificed for caching (so that there is no need for slow operations on block devices), swapping virtually replaces RAM with slow block devices. The kernel must therefore do its best to cater for both mechanisms to ensure that the advantages of the one method are not canceled out by the disadvantages of the other – no easy feat.
Previous chapters discussed some of the means provided by the kernel for caching specific structures. The slab cache is a memory-to-memory cache whose purpose is not to accelerate operations on slower devices but to make simpler and more effective use of existing resources. The dentry cache is also used to dispense with the need to access slow block devices but cannot be put to general use since it is specialized to handle a single data type.
The kernel features two general caching options for block devices:
The page cache is intended for all operations in units of a page – and takes into account the page size on the specific architecture. A prime example is the memory-mapping technique discussed in many chapters. As other types of file access are also implemented on the basis of this technique in the kernel, the page cache is responsible for most caching work for block devices.
The buffer cache operates with blocks. When I/O operations are performed, the access units used are the individual blocks of a device and not whole pages. Whereas the page size is the same with all filesystems, the block size varies depending on the particular filesystem or its settings. The buffer cache must therefore be able to handle blocks of different sizes. While buffers used to be the traditional method of performing I/O operations with block devices, they are nowadays used for this purpose only for very small read operations where the more advanced methods are too bulky. The standard data structure used for block transfers has become struct bio, which is discussed in Chapter 6. It is much more efficient to perform block transfers this way because it allows subsequent blocks in a request to be merged, which speeds things up. Nevertheless, buffers are still the method of choice to represent I/O operations on individual blocks, even if the underlying I/O is performed with bios. Filesystems in particular often have to read metadata blockwise, and buffers are much easier to handle for this task than other more powerful structures. All in all, buffers still have their own identity and are not around solely for compatibility reasons.
In many scenarios, page and buffer caches are used in combination. For example, a cached page is divided into various buffers during write operations so that the modified parts of the page can be tracked at a finer granularity. This has advantages when the data are written back because only the modified part of the page and not the whole page need be transferred back to the underlying block device.
RAM and hard disk space are mutually interchangeable to a good extent. If a large amount of RAM is free, the kernel uses part of it to buffer block device data. Conversely, disk space is used to swap data out from memory if too little RAM is available. Both have one thing in common: data are always manipulated in RAM before being written back (or flushed) to disk at some later time to make changes persistent. In this context, block storage devices are often referred to as RAM backing store.
Linux provides a variety of caching methods as discussed extensively in Chapter 16. However, what was not discussed in that chapter is how data are written back from the caches. Again, the kernel provides several options that are grouped into two categories:
1. Background threads repeatedly check the state of system memory and write data back at periodic intervals.
2. Explicit flushing is performed when there are too many dirty pages in system caches and the kernel needs clean pages.
This chapter discusses these techniques.
The available RAM in a computer is never enough to meet user needs or to always satisfy memory-intensive applications. The kernel therefore enables seldom-used parts of memory to be swapped out to block devices, effectively providing more main memory. This mechanism, which is referred to as swapping or paging, is implemented transparently by the kernel for application processes that automatically profit from it. Swapping, however, is not the only mechanism to evict pages from memory. If a seldom-used page is backed by a block device (e.g., memory mappings of files), then modified pages need not be swapped out but can be synchronized directly with the block device. The page frame can be reused, and if the data are required again, they can be reconstructed from the source. If a page is backed by a file but cannot be modified in memory (e.g., binary executable data), then it can be discarded if it is currently not required. All three techniques, together with the selection of policy for pages that experience little activity, go by the name of page reclaim. Notice that pages allocated for the core kernel (i.e., not for caches) cannot be reclaimed because this would complicate things more than it would benefit them.
Page reclaim is the cornerstone of one of the kernel’s fundamental decisions with respect to caching. The size of caches is never fixed, and they can grow as necessary. The rationale behind this is simple: RAM that is not used for something is simply wasted, so it should always be used to cache something. If, however, some important task requires memory that is filled by the caches, the kernel can reclaim memory to support these needs. This chapter describes how swapping and page reclaim are implemented.
Developers working on the kernel often have a natural interest in watching and inspecting what is going on inside the code. But they are not the only ones who would like to know what the kernel does. System administrators, for instance, might want to observe which decisions the kernel has taken and which actions were performed. This can be beneficial for a number of reasons, ranging from increased security to postmortem forensic investigation of things that went wrong. It could, for instance, be very interesting not only to observe that a wrong security decision caused by some misconfiguration was made by the kernel, but also to know which processes or users took advantage of this. This chapter describes the methods provided by the kernel for this purpose.
One of the key benefits of the kernel is the fact that it is mostly architecture-independent. Because the majority of the sources are written in C, the implemented algorithms are not tied to a particular CPU or computer family but can, in principle, be ported to any platform with modest effort, assuming that a suitable C compiler is available. Inevitably, the kernel must provide interfaces to the underlying hardware, perform various system-specific tasks that involve countless details, and exploit the special functions of the processors used. These must generally be written in an assembly language. However, there are also some architecture-specific data structures that are defined in C, so architecture-specific does not necessarily equate to assembler-specific. This appendix describes some hardware-specific aspects of important Linux ports.
Over the years, Linux has grown from a minor hacker project to a gigantic system that effortlessly competes with the largest and most complex software systems. As a result, developers must deal with more than just the technical problems relating to how the kernel functions. The organization and structure of the sources are also key issues whose importance should not be underestimated. This appendix addresses the two most interesting questions in this context. How can the kernel be configured so that the corresponding parts of the source can be selected not only for a given architecture but also for a specific computer configuration? And how is the compilation process controlled? The second question is of particular importance when the kernel is repeatedly compiled for different configurations. Parts not involved in a configuration change obviously need not be recompiled, and this can save a great deal of time.
Everyone concerned with the kernel sources is impressed by their sheer size. Because the prime purpose of this book is to promote an understanding of the sources, this appendix examines various methods that are best suited to browsing and analyzing the source code. These include predominantly hypertext systems. This appendix also describes the options available to debug running kernels and to provide an insight into their structures; both are useful aids to understanding. The appendix delves into User-Mode Linux (UML), a kernel port that runs as a user process on a Linux system and was incorporated into the official sources during the development of version 2.5. It also discusses the debugging facilities available to analyze a kernel running on a real system with all the benefits of modern debuggers, including single-stepping through assembler statements.
For more than 25 years, C has been the preferred programming language for implementing operating systems of all kinds, including Linux. The major part of the kernel, with the exception of a few assembly language segments, is programmed in C. Therefore, it is not possible to understand the kernel without a mastery of C. This book assumes that you have already gained sufficient experience with C in userspace programming. This appendix discusses little-used and very specific aspects of C in kernel programming.
The kernel sources are especially designed for compilation with the GNU C compiler. This compiler is available for many architectures (far more than are supported by the kernel) and also features numerous enhancements used by the kernel, as discussed in this appendix.
Like any other program, the kernel goes through a load and initialization phase before performing its normal tasks. Although this phase is not particularly interesting in the case of normal applications, the kernel as the central system layer has to address a number of specific problems. The boot phase is split into the following three parts:
Kernel loading into RAM and the creation of a minimal runtime environment.
Branching to the (platform-dependent) machine code of the kernel and system-specific initialization of the elementary system functions written in assembly language.
Branching to the (platform-independent) part of the initialization code written in C, and complete initialization of all subsystems with a subsequent switch to normal operation.
As usual, a boot loader is responsible for the first phase. Its tasks depend largely on what the particular architecture is required to do. Because in-depth knowledge of specific processor features and problems is needed to understand all details of the first phase, the architecture-specific reference manual is a good source of information. The second phase is also very hardware-dependent. Consequently, this appendix describes only some key areas of the IA-32 architecture.
In the third, system-independent phase, the kernel is already resident in memory and (on some architectures) the processor has switched from boot mode to execution mode in which the kernel then runs. On IA-32 machines, it is necessary to switch the processor from 8086 emulation, which is immediately active at boot time, to protected mode to make the system 32-bit capable. Setup work is also required on other architectures; for instance, it is often necessary to activate paging explicitly, and central system components must be placed in a defined initial state so that work can begin. All these tasks must be coded in assembly language and therefore are not the most inviting parts of the kernel.
Concentrating on the third phase of startup allows for dispensing with many architecture-specific trifles and has the added advantage that, generally speaking, the remaining sequence of operations is independent of the particular platform on which the kernel runs.
ELF stands for Executable and Linkable Format. It is the file format used for executable files, object files, and libraries. It has long established itself as the standard format under Linux and has replaced the a.out format of the early years. The particular benefit of ELF is that the same file format can be used on practically all architectures supported by the kernel. This simplifies not only the creation of userspace tools, but also programming of the kernel itself, for example, when it is necessary to generate load routines for executable files. However, the fact that the file format is the same does not mean that binary compatibility exists between the programs of different systems, between FreeBSD and Linux, for instance, both of which use ELF as their binary format. Although both organize the data in their files in the same way, there are still differences in the system call mechanism and in the semantics of the system calls. This is the reason why FreeBSD programs cannot run under Linux without an intermediate emulation layer (the reverse is naturally also true). Understandably, binary programs cannot be swapped between different architectures (for example, Linux binaries compiled for Alpha CPUs cannot execute on Sparc Linux), because the underlying architectures are totally different. However, thanks to ELF, the way in which information on programs and their components is coded in the binary file is the same in all cases.
Linux employs ELF not only for userspace applications and libraries, but also to build modules. The kernel itself is also generated in ELF.
ELF is an open format whose specification is freely available (also on the Web site associated with this book). This appendix is structured in the same way as the specification and summarizes information that is relevant.
This book has given you lots of information about concepts, algorithms, data structures, and code. Clearly these form the very core of Linux development, and that is what the kernel is all about. But there’s another side of Linux that should not pass by unnoticed: the community that develops the kernel, the way it works, and how people interact. This aspect is interesting because the kernel is one of the largest and most complex open source projects in existence, and it’s a role model for distributed, decentralized development on a gigantic scale. The purpose of this appendix is to provide an overview of numerous technical and social aspects of kernel development. Additionally, it talks about the relationship between the Linux kernel and academia.