BenOS: An x86 Operating System

Last Updated on the 21st September, 2021

BenOS, if you haven't noticed already, is short for "Ben's Operating System". It's a bare-bones general-purpose OS catered towards the x86 CPU architecture. It features all the low-level capabilities of a modern OS, such as VGA output, keyboard input, interrupt routines, preemptive scheduling, paging, and an MMU-protected virtual heap and stack. For more information on these features, take a look at the tabbed section below, and for an even more detailed breakdown, check out my dissertation. If you are curious you can even download the source code and test out BenOS' features yourself. Both the soure code link and dissertation can be found in the links box below project information.


Emulation & Build Environment

QEMU was the selected emulation tool for this project as it provided swift and suitable emulation for testing minimal OS operations. The build chain began with a GCC cross-compiler for generic target i686-elf. A GNU MakeFile was developed to automate the build process by compiling objects, linking them and building a bootable .iso image via the grub-mkrescue program.

Project Information
Created:

Friday, 20th March 2020

Tools:
Languages:
  • C
  • Assembly
  • MakeFile

Take a more detailed look at BenOS' features!

To demonstrate BenOS' interupt handling capabilities, a software thrown interrupt can be used for testing __asm__ volatile(“int $0x08). Adding this instruction to the Kernel.c file after interrupts have been initialised will trigger a double fault exception. As you can see in the figure below, the fault is handled and useful register information is dumped to screen for debugging.

Now for a little background. On x86 architectures, there are 256 possible interrupts and the first 32 are system critical. Consequently, these need to be mapped to interrupt handler macros. Assembly functions can then be used to push the interrupt data onto the stack and hooked with a corresponding ISR assembly handler. This will save the processor state before calling a higher-level C handler function. Once the interrupt has been handled, the stack frame; which holds the register structure shown below; can be restored.

x86 operating system register state struct

To test paging on BenOS, a variable with a pointer can be assigned to an illegal address space, for instance 0xAAAAAAAA that has no page assigned. Now, when the variable pointing to that address space is used, it will trigger a page fault exception. BenOS then outputs the caught data to terminal and stops system execution to prevent further harm.

One of the principal features of an operating system is to provide a way to dynamically allocate and deallocate memory. This includes providing a way to reference virtual memory space that can be translated to a physical one. Paging allows physical address space to be non-contiguous while additionally avoiding external fragmentation. Implementing paging requires physical memory to be divided into fixed-sized contiguous chunks called frames and virtual memory to be divided into fixed-sized non-contiguous chunks called pages. This requires the use of three fundamental structures: a page directory, a page table, and a page, shown in the diagram below.


BenOS x86 paging model
Logical address translation of a 4KB page using a paging model of logical and physical memory.

Physical to Virtual Address Translation

To enable paging on the x86 architecture, the physical address of a page directory must be loaded into the CR3 register. Before enabling, it is important to identity map physical addresses with virtual addresses so that when paging is enabled there are pages that can be accessed without triggering a page fault. The example code below loads the CR3 register in higher-level C before calling an assembly function to set the PG bit in the CR0 register.


Bitmap Physical Page Frame Allocation

A constructive way to allocate frames is to use a large bitmap that identifies which frames in memory are being used. It's an effective and compact storage technique for holding arbitrary bits. Better than holding an array of 1's and 0's at least. Additionally, bitmaps have the ability to exploit bit-level parallelism and limit memory access. On the downside, accessing individual bits is more expensive and complicated depending on the search algorithm used. There are plenty of other physical memory allocator algorithms that could be implemented, however, this bitmap technique is effective for BenOS' level of implementation.

BenOS x86 bitmap

The heap data structure helps to keep track of memory. It holds static data and a large free region known as the hole, which can dynamically grow and shrink. When the heap's available space is used, the OS can request to increase the heap size by moving the heap boundary up, closer to the stack. The request to increase size comes from the malloc() function and to decrease the size and free up memory we can use the free() function.

This variable-partition technique involves maintaining a table to indicate which parts of memory are free or occupied. The algorithm explained below uses this technique and is similar to Doug Lea’s memory allocator algorithm used in the GNU C library. It is relatively simple to implement, which makes it a great choice for the implementation of a small operating system like BenOS. To set the picture, the framework of this algorithm must first be explained. Firstly, blocks are contiguous areas of memory that contain data. Holes, on the other hand, are like blocks but their memory is not currently in use. As you can see in the ordered heap table diagram, for every hole there is a related descriptor in an index table ordered by hole size. Each of these holes has its own header and footer that hold metadata. The header contains information about the hole and the footer contains a pointer back to the header; allowing the unification of holes during the freeing process. For error checking, both the header and footer have a magic number field in order for them to stand out and be easily located.

BenOS x86 heap algorithm diagram
Ordered heap table pointing to memory holes and blocks.

BenOS malloc function
BenOS' Memory Allocate Virtual Function

BenOS' Heap Implementation

The heap structure itself contains information about size, role, read/write access, and a table for managing its blocks and holes. The meta_header and meta_footer are used to locate the head and foot of a block or hole; depending on the bit set for the header structs free attribute. The heap table is ordered by comparing the values of the hole or block headers size and arranging them in ascending order. It is insertion sorted and remains in a sorted state between calls. The unknown_t type is used to store anything that can be cast to void*, like a uint32_t or any pointer. The ordering behaviour is encapsulated within the order_tfunction pointer to compare the size of holes or blocks:

Because we are using paging to allocate space for the heap, pages must be allocated before initialisation. Once the pages have been mapped to their physical heap address frames, the interrupt handler for page faults can be registered and the page directory loaded. After all this, the heap can finally be initialised with paging enabled and no page faults exceptions should be thrown. Before the heap is initialised, memory can still be allocated using memory allocation by placement address. This is optimal for time and space allocations but does not combat the issue of deallocation. This implementation uses two variations of the well known C library malloc() function: malloc_virt();to allocate virtual memory, and malloc_phys();to (as you guessed) allocate physical memory. In malloc_virt() the size requested is simply allocated in the kernel heap and then an address pointer is returned. The malloc_phys() function, on the other hand, will assign a page to a page frame using a parsed physical address parameter, before returning the address pointer. All these priniciples apply to the virtual memory allocator function for BenOS shown above .


Growing & Shrinking the Heap

The allocate() function for the heap works by firstly, searching the heap table to find the smallest hole that can fit the size requested. If there is no hole large enough, then the heap is instructed to expand; bounded by a max size, before recursing and trying to find a hole again. If the smallest hole has a lot of free space, then it is split in half, and a new hole is written to the heap table. At the end of all this resizing and hole formatting, the address plus the size of the meta_header is returned to the user. The deallocate() function, is a little more complicated. It involves a unifying algorithm that takes two unallocated adjacent holes and fuses them back into one larger hole. When deallocating a block, look at what is left of the block, if it’s a footer, then follow the footer pointer to the header. If the headers free attribute is not set then the size attribute can be modified by adding the block being merged. The headers pointer can then be changed to point to the previous blocks footer instead. Unifying from the right is slightly different but follows the same principle.

One of the final hurdles of creating a simplistic modern OS is providing it with the ability to run multiple tasks seemingly at once. For interactive operating systems, users need a responsive environment. algorithms can be forked into two divisions with respect to how they handle timer interrupts; preemptive and non-preemptive. A non-preemptive scheduler picks a process,and that process holds the CPU until it voluntarily releases. On the contrary, preemptive scheduling picks a process and allows it to hold the CPU for a set interval. If the process is still executing after that interval has transpired, the process is suspended, and another is picked for execution. Pre-emption is key to stopping process from monopolizing the CPU and denying its services to others. Moreover, it can prevent processes from blocking others out indefinitely, either from continuously running or through program errors.

BenOS x86 scheduler
Pre-emptive scheduler controlling the task execution sequence.

How to Schedule Processes

When a process is interrupted, an interrupt service procedure starts by saving the register contents. The data is pushed onto the stack by the interrupt and stored; the stack pointer is then changed to point to a processes temporary stack. This must be written in an assembly language procedure as high-level languages like C are incapable of such actions. A routine is then called in C to handle the interrupt type. Afterwards, another assembly procedure should start up the new process by loading the registers and memory map, with paging enabled, CR3 should be loaded with the page directory. Having the process queued in a round-robin is easy to implement, and starvation-free; in that, it will not perpetually deny processes from accessing the CPU. A simple yet dynamic function can be implemented to add tasks to the end of a queue or list. The list can be iterated over and modified during the iteration, to point each task to the next, every time a new task is added. In the end, the tail can be pointed back to the head of the queue to recurse.


BenOS' Multitasking Implementation

As previous mentioned, a C task struct is needed to hold all of a processes attributes. In BenOS' case, the reg_state *r pointer should contain the saved state of the processor before a task interrupt; because paging is enabled, the physical address of whatever directory the task is in should be saved in CR3 . To implement the round-robin scheduling policy later, the task must contain a pointer to the next task. I use a function pointer to simply pass an operation, which in this case is to display a stored task string message.

Conveniently, there already exists a mechanism to retrieve the register state when a task needs to switch. The common C interrupt handler created for interrupt handling, already pushes the required contents onto the stack. Therefore, when an IRQ0 is triggered by the hardware timer, the task state can be saved and switched. With a frequency of 100hz set, an interrupt will occur every 0.01 second, so when the remainder of the number of ticks divided by the set timeslice constant is equal to 0, a task switch is triggered. This means that if the timeslice is set to 100, a task switch will be triggered every second.

BenOS x86 structures

BenOS x86 task scheduler switch
Pre-emptive scheduler controlling the task execution sequence.

The switch_tasks() function can take the saved register structure as an argument when called by the IRQ0 handler. It operates by firstly, copying the saved register state from the interrupt event to the task structs reg_state *r, using the copy_memory() function. It will then check if the current task is set and if so, sets it to execute and sets the current task equal to the next queued task; for when the function recurs. After this, it copies the current processes register state back to the stack and switches the page directory to the directory stored in the current task, demonstrated in the switch_tasks() code.

BenOS x86 scheduling example

As a demonstration of the BenOS' ability to switch between processes, a large time slice; two seconds, was set in order to visualise the process switching clearly. The process id and starting address of the tasks dedicated address spaces are outputted to the terminal for context. Each task is allocated a 0x1000 sized chunk of address space and is placed in a dynamic round-robin queue for execution.