Unit Four Objectives, CSCI 423, Introduction to Operating Systems Tobin Maginnis Updated: 8-Nov-07 The Linux Kernel Primer by Rodriguez, et. al. Chapter 4 - Memory Management 1)Describe the implementation and operation of a paged MMU. a)The Memory Management Unit resides in the CPU and translates logical or virtual addresses to physical page-relative addresses. b)The MMU can be visualized as a list of physical block numbers whose index, or offset into the list, represents the logical address of that physical block. c)Some computer architectures employ two or more page tables which form a linked list in translating virtual to physical addresses. d)For each page table employed, the virtual address has two sub-sections indicating the i)logical page number and the ii)page offset in bytes. e)The smaller the page size (e.g., 512 bytes) the larger and more unwieldy MMU processing becomes. The larger the page size, the larger last page fragmentation becomes. On average, half of the last page will never be allocated. 2)Describe the performance issues with paged MMUs. a)Each process executes inside its own memory context. Thus, the MMU contents are saved and restored for each process context switch. b)If each process used all of its three GBs of user address space (and assuming Linux's 4096 byte page size), then each context switch must save/restore more than seven hundred thousand page table entries. (3,000,000,000/4096 = 732,421) c)Due to their large size, page table entries reside in main memory. And the CPU must access memory twice to fetch the contents of a variable; once to get its physical page address and again to get its contents. d)Modern CPUs speedup memory access by holding the last 128 or 256 page table lookups in a Translation Lookaside Buffer (TLB) located in the CPU. Also, if the page table happens to be in the cache, then the first memory access is eliminated. e)In spite of paged MMU designs being slow compared to indexed segmented MMU designs, the simplicity of programming in a large, sparse, address space has made paged MMUs the most popular choice among system programmers. 3)Explain the need for memory zones and describe the three zones employed in Intel X86 architecture family. a)As computer architectures evolve, their ability to access memory improves, so zones are really windows into the various ways the CPU may access memory. b)ZONE_DMA - 1st 16MiB (2^24) of memory which could be accessed by the original motherboard DMA circuit. ISA device drivers are restricted to transferring data to/from this area. So called "bounce buffers" or staging buffers are used move data to/from higher memory areas. c)ZONE_NORMAL - 16MiB to 4GiB (2^32) memory area that can be accessed with a virtual address. PCI device controllers can access this memory area with their on-board DMA circuits. d)ZONE_HIGHMEMORY - Memory above above the 4GiB (2^32) boundary. Intel architectures have an older segmentation MMU in addition to the paging MMU. The paging MMU is subservient to the segmentation MMU. This means that all pages for given table(s) must reside within a defined segment. Typically, just one 4GiB segment is declared and the paging MMU has unfettered access to physical memory. However, the segmentation MMU has a physical address of (2^48?). Thus, the kernel creates additional high memory bounce buffers, but instead of moving data, the values of segment registers for the kernel are modified so that its high pages access a window of memory above the 4GiB limit. 4)Contrast page (buddy) versus cache (slab) memory allocation and give the design tradeoff. a)Linux employs a two-level memory allocation scheme. At the gross level, (the buddy algorithm) one or more 4096 byte contiguous pages are allocated based upon a binary search of free physical memory. That is, free contiguous memory chunks are searched from largest to smallest until the minimal size chunk is located. b)At a more fine-grain level, (less than a page of memory) the cache or slab allocator sub-divides a page into the required object size. Generally, one slab is allocated for a cluster of similar objects (records) which begin at less than a page of memory, but may grow to include more pages. The slab allocator obtains its non-contiguous pages from the buddy algorithm. c)The two-level memory allocation system limits fragmentation when the buddy algorithm searches from the largest to smallest contiguous free memory areas. But the caching slab allocator allows for sub-page size allocations yet limiting fragmentation to an average 1/2 page for similar data types. 5)Describe the role of the task_struct->mm, mm_struct->mmap, and vm_area_struct->vm_mm. a)These three structures describe chunks of contiguous memory that has been allocated to a process. b)The task_struct field "mm" points to the linked list head mm_struct. The mm_struct field "pgd" points to the saved MMU page table for this process and mm_struct also forms a doubly linked list allowing the kernel to search all allocated memory independent of PIDs. c)The mm_struct field "mmap" points to the start of another linked linked list, vm_area_struct, where each list element describes a contiguous memory area such as code, data, heap, & stack of the process. d)Since the vm_area_struct list describe the process' logical (virtual) view of memory, more vm_area_struct list elements are added to point to the code, data, & heap of the ld-linux ELF driver, as well as, the code, data, & heap of each shared library. 6)Describe three ways that Linux automatically manages data movement among memory hierarchy components. a)About 99% of files are access sequentially, thus Unix file managers historically have a built-in read-ahead service where the kernel prefetches the next sequential file block with the expectation that it will be needed in the future. b)Unix OSs have always employed a "data cache" which holds blocks recently accessed from I/O devices. Each block in the data cache has an associated header structure which forms an element in a cloud of multiple linked lists. Multiple paths through the cloud allows the kernel to search the data cache by per process, per device, per usage, or exhaustively. Hash tables are also used to eliminate sequential searching except, of course, for collisions. c)Using the data cache as a service, Linux abstracts an additional "file cache" which holds the most recent working set of files accessed by an application. Thus, if the user cycles among an editor, compiler, temporary files, and output files, then all of these files will be held in the file cache. d)The size of these optional caches are dynamic and depend upon the amount of unused memory. After boot-up, most of main memory is unallocated. When user processes begin to access I/O devices, and subsequently whole files, the two caches grow to fill all of available memory. However, user processes have higher priority, and as they increase in size and number, the file cache shrinks, then the data cache shrinks to its low-water mark, but not any further. e)These memory management features have been introduced into Windows Vista and referred to as "SuperFetch." 7)Describe the implementation of page faults. a)Even though a paged MMU allows a large address space, not all logical addresses are mapped to physical addresses since there is usually less physical than logical memory and there may be many processes requesting access to more logical memory than there is physical memory. b)As more and more physical memory is allocated to processes, the Memory Manager searches for pages that have not been recently accessed. These old, stale, least recently used pages are moved out to the swapping disk, the corresponding page table entry is cleared, and the page-not-present bit is set in the page descriptor for that page table entry. c)When a process accesses a logical address whose associated page table entry is not valid, or whose page-not-present bit is set, then the MMU asserts a page table exception trap and the Memory Manager is entered as a slow, page-fault ISR. The page-fault ISR suspends the current process and inspects the offending memory reference. d)If the process made an illegal memory reference (page table entry not valid) from its code space, then the process is terminated by the Memory Manager with the message: "Segmentation fault - Core dumped." e)If the process accessed a swapped page (page-not-present), then another page is selected for swap out, the old page is swapped back to main memory, and the process is restarted. f)If the process made an illegal memory reference while pushing an item on its stack, then the page fault ISR instructs the Memory Manager to add another page to the process' stack size and restart the process. g)Some times CPU register values change while attempting to access illegal memory. For example, pushing a variable onto an illegal stack address would leave the SP decremented. If the process is to be restarted, then the last executed instruction is examined by the MMU hardware. If a register did change value, then the old value is restored before the process is restarted. In our example, the SP would have been incremented before restarting the program. The Linux Kernel Primer by Rodriguez, et. al. Chapter 5 - Input/Output 1)Describe the traditional Unix LUTs, explain how they are mapped onto the file system, and the subsequent limitation. Also explain how your authors see "devfs" as an improvement on the old design. a)The Logical Unit Table is a jump table where rows contain addresses of public entry points into device drivers. Columns of the LUT represents a common function among all device drivers such as open, close, read, write, rewind, ioctl, validate, media check, and so on. These columns define a common set of commands for device drivers. In this fashion, if a given driver does not support a command type, it simply returns to caller or possibly returns an error code rather than hang up the OS. b)The file manager recognizes the mode field in the inode as special file types, it uses the major number (row number) from the inode to access the LUT type "c" or "b" for sequential and random access drivers. c)As the number of devices and drivers have grown combined with the ability to dynamically load device drivers at run time, the eight bit major & minor numbers are becoming scarce and jumbled. Furthermore, Unix requires an unique "pseudo tty" for each virtual terminal network connection. d)Thus, the devfs algorithm dynamically creates device inodes and directory entries for each loadable device driver and virtual terminal network connection. 2)Describe how "udev" attempts to solve the scarce-LUT-number and hotplug problems. a)Since Linux version 2.6, the devfs algorithm was dropped in favor of a user-level background program called "udevd" or User Device Daemon and device numbers were expanded to 16 bits. b)When media changes (e.g., eject the CD) or when media is removed (e.g., unplug the USB memory stick) the device driver sends a signal which udevd catches. c)In response to the signal, udevd may invalidate any cached data for that device, re-validate and cache the media contents, remove or create the /dev/xxx entry for that device, or mount or unmount a default file name for that device in the file system hierarchy (e.g., /media/usbdisk/). Omit)Define PMS notation and give the four roles in a typical bus protocol. Omit)Draw a PMS diagram of a minimal computer system, contrast Arbitrator, Master, Slave switch protocols (bus cycles) and give the four steps of the Interrupt bus protocol. Omit)Contrast "device," "controller," and "host interface" and explain how they relate to the device driver. 4)Using PMS notation, contrast PATA, ATAPI, SCSI, SATA, and ATA hybrid. a)Parallel Advanced Technology Attachment (PATA), in spite of it name, is a simplistic host bus adapter that relays bus signals to disk drives. One conductor on the I/O cable is used to select one of two drives. PATA controllers are located on each drive. b)With the advent of more complex removable media devices such as CD-ROM and Zip drives, version four of ATA protocol introduced the AT Attachment Packet Interface. Instead of asserting commands with I/O bus wires, the data itself (a packet) described the status/action of the drive. (Thus, requiring more firmware in each device and a more sophisticated device driver.) c)Like PATA, the Small Computer Systems Interface (SCSI) has been around since the 1980s and it was designed to be a high performance (and expensive) alternative to PATA. SCSI employs a three-level controller hierarchy with the first on the host adapter, the second on the I/O bus and the third (LUN) controller on the drive. d)With version seven of the ATA protocol, a serial interface was created. Since serial interfaces are point-to-point, only one SATA drive can be connected to each SATA interface. e)Version eight of the ATA protocol introduced large amounts of flash memory on the disk drive so that time sensitive files can be read 130 times faster than reading the file off the disk itself. Microsoft Windows Vista will call this hybrid "ReadyDrive." http://en.wikipedia.org/wiki/AT_Attachment http://www.ata-atapi.com/ http://en.wikipedia.org/wiki/Scsi http://en.wikipedia.org/wiki/Serial_ATA http://www.serialata.org http://www.pclaunches.com/hard_drive/seagate_momentus_5400_psd_laptop_hybrid_drives_announced.php 5)Explain the debate around I/O bus speeds and measured transfer speeds. a)Manufacturers sell I/O controllers based upon theoretical transfer speeds such as PATA UltraDMA 133 MB/s, SATA 150 MB/s, and SCSI 400MB/s. b)But disk transfer speed is limited by how long it takes the read/ write head to reach the correct track number, the time it takes the desired disk block to come around to the read/write head, and how quickly the disk is spinning. c)Typical disks read around 15MB/s, faster drives may read out at 26MB/s, while really fast drives may reach 50MB/s. Some high end disk drives may read as fast as 80MB/s d)Thus, any of the above I/O hard disk controllers can support the fastest disk drives and one should probably select a drive based upon reliability and cost before selecting one based upon theoretical transfer speed. http://en.wikipedia.org/wiki/AT_Attachment 6)Contrast a traditional bus implementation versus serial line implementation and give the tradeoff. a)For each additional conductor in a parallel bus, the bus can increase its bits/second bandwidth by a power of two. Assuming the same data rate among parallel buses, a 16-bit bus runs twice as fast as an eight-bit bus. A 32-bit runs four times as fast and so on. b)But long parallel buses act like antennas and need to suppress induced voltage (EMF) artifacts. One way is to use a voltage divider circuit to hold the bus lines at a given voltage. Another way is to assert a signal by pulling (shorting) the voltage to ground. In this way, an induced voltage would be treated as no-signal. c)Regardless of how well the strobe signal commands data onto the bus, there will always be some bits that appear before others (skew). Bus speed is slowed while the receiver waits a de-skew interval before latching the bus data into its local buffer. d)With ICs embedding packet protocols in hardware, serial lines have the ability to communicate at very high (GHz) speeds over short distances (1-2 feet). The packet protocol is required to multiplex signals while twisted pairs and shielding eliminate induced voltage (EMF) problems. 7)Contrast Infiniband and PCI-Express (PCIe). a)These are two 2.5GHz Baud serial bus protocols with 8B/10B encoding or 20% error detection and correction overhead leaving a data rate of 2 GB/s. b)PCIe is usually run over copper wire (< 2 feet) while Infiniband usually employs fiber optic cables in the data center (10-100 feet). c)PCIe X4 and PCIe X16 employ multi-lane serial lines and interleave among the lanes. Also, PCIe will move packets among lanes to prevent repeating patterns of 1s & 0s from causing EMF artifacts and subsequent cross-talk on the multiple serial lines. d)Unlike PCIe, Infiniband has header information in the packet to allow routing which slightly reduces its data rate compared to PCIe. e)PCIe is a hot-pluggable device protocol designed for external media. http://en.wikipedia.org/wiki/Infiniband http://en.wikipedia.org/wiki/PCI_Express 8)Describe the FCFS (torrid disk dance) problem and how SSF helps solve it. a)The longest aspect of disk access is the time it takes to move the R/W head track to track (cylinder to cylinder). And given that a fast CPU is 20 thousand times faster than a typical hard disk, a series of disk block requests will always accumulate as the drive plods along from block to block as it reads and writes data. b)Disks blocks can be read in a First Come, First Served (FCFS) fashion, but if a request for a block on track one is followed by a request for a block on track 10,000 follow by a request for a block on track 2, then twice is much time is wasted moving the R/W head out to track 10,000 and back to track 2 instead of just picking up blocks on tracks 1,2, and then 10,000. c)Shortest Seek-time First (SSF) corrects this problem by track requests in ascending order to minimize overall R/W head movement. 9)Explain the limitation of SSF, explain how the elevator algorithm solves this limitation. a)On average, a disk R/W head will tend to be more towards the middle of the platter simply because it can move in either direction. If the R/W head was at the extreme edge of the platter, it could only move towards the center. Thus, Shortest Seek-time First (SSF) tends to favor mid-range block numbers (where the R/W head happens to be) and delays access to extreme low or high block numbers. b)If, like an elevator, SSF must go all the way from one extreme to the other before re-ordering block requests, then a compromise between minimal R/W head movement and access time to individual disk blocks would be achieved. 10)Describe Linux I/O hard drive scheduling, contrast No-op, Deadline, AS, and CFQ scheduling. Explain the effects of single user PCs and the new generation of flash/rotating memory hybrid hard disks. a)Linux allows the system administrator to select from a set of elevator algorithms. The selected algorithm applies to all disk controllers and attached all disks on the system. b)The No-op scheduler implements the basic elevator algorithm. It is assumed that if more optimization is required, an intelligent disk controller will perform those operations. c)The Deadline elevator algorithm minimizes I/O latency for a given I/O request. Deadline uses five I/O queues and a round robin policy among queues to be fair to all and avoid process starvation, but Deadline will re-order requests to improve I/O performance for one process while delaying other processes. d)The Anticipatory Scheduling (AS) elevator delays (plugs) incoming requests before dispatching the I/O (unplugging) in an attempt to aggregate and re-order the largest possible number of sequential requests thereby reducing total number of disk seeks. The AS scheduler may cause higher I/O latency for the plugged processes, but it improves overall system throughput and is designed for desk top or single-user computer systems. AS was the default Linux elevator until kernel version 2.6.18. e)Complete Fair Queuing (CFQ) places requests into per-process queues and allocates timeslices for each queue. Higher I/O priority processes requests are placed in queues with longer time slices and fewer total requests. CFQ seems to work better than AS in data centers with I/O bound processes. CFQ is the default Linux elevator after 2.6.18. http://en.wikipedia.org/wiki/CFQ http://www.redhat.com/magazine/008jun05/features/schedulers/ 11)Describe the programmer's view of the four generic disk controller registers and describe the implementation of DMA. a)I/O devices are accessed via read/write set of flip-flops called registers. Bits in the registers either direct the behavior of the device or show its status. b)I/O registers may appear as a block of memory locations in a reserved memory address range (memory mapped I/O page) or the registers may be accessed via reserved I/O instructions which require a I/O bus address and the data to be read/written to the register. c)Disk drives have four basic registers: status, disk, and data buffer address registers, and a disk block count register. d)The device driver begins a Direct Memory Access (DMA) transfer by setting disk and data address registers as well as a count of the bytes to be transferred in the count register. e)The status register also has command (R/W) bits and a "go" bit that is used start the disk controller. The disk controller then waits for the drive I/O bus to become available, transfers data, and decrements the count register by the number of bytes transferred. f)When the count reaches zero, the disk controller asserts an interrupt to indicated the transfer is complete. Lecture - I/O devices 1)Describe the role, topology, and data rates of USB. a)The Universal Serial Bus was designed to unify all serial and parallel device drivers into one class of drivers. b)Each USB port on the personal computer acts as a host controller that attaches to USB devices or USB hubs. Devices may have USB-out ports to allow daisy chains. Hubs allow a tree structured branching topology, but branching is limited to five levels per controller. Using daisy chains and hubs, up to 127 USB devices may be logically connected to one controller. c)Each USB device description is based upon a hierarchy that has configuration descriptor at the root, followed by interface descriptors, which hold interface settings, and which in turn hold the USB device endpoints. d)Interface descriptors are sub-divided into classes which define the behavior of a device so that the same device driver may be used for any device which is a member of the class. If an operating system implements generic drivers for all defined device classes it should be able to handle most any USB device. e)USB controller and devices exchange data packets including language, manufacture, and product specific strings. Moreover, the controller provides 0.5 Amps at 5V to power USB Devices. Because of the limited power, only a handful of bus-powered USB devices may be connected to one controller. http://en.wikipedia.org/wiki/Usb 2)Contrast USB and Firewire serial protocols. a)Firewire (IEEE1394) predates USB (1993 vs 1995) and was created by Apple Computer as a high speed serial interconnect. b)Apple promoted Firewire as a IEEE standard, but also demanded a $1 per port royalty payment. In response, Intel created USB 2.0 and the two standards have been rivals since. c)USB 2.0 (480Mb/s or 60 MB/s) is faster than Firewire (400Mb/s or 50MB/s) but real world measurements of typical PC configurations show transfer speeds of both protocols to be about 50% of theoretical speeds (~25-30 MB/s). d)Firewire is loosing market share to USB, but still used among its first adaptors, visual arts professions. http://en.wikipedia.org/wiki/Firewire 3)Describe the theory of operation of flash memory. a)Flash memory is made from Electrically Erasable Programmable Read Only Memory or EEPROM. b)EEPROMs bits are fixed physically through hot-electron injection of a per bit MOSFET floating gate in the integrated circuit. Once an area or block of EEPROM is reset, it can be written to byte by byte. c)Re-writing to an EEPROM requires that an area or block be read into an off chip buffer and the block erased. Next the new data is merged with the saved buffer and the whole block is written back to EEPROM. d)Unfortunately, when performing repetitive writes to the same EEPROM block the erasing processes takes longer and longer until it fails. But, most EEPROMs are designed to perform 100s of thousands to a million erase cycles. e)This read/erase/write sequence takes about 70 micro seconds and, therefore EEPROM write latency is slow compared to CPU speeds. But when compared to typical hard disks, the write latency is about 130 times shorter (faster access). http://en.wikipedia.org/wiki/Flash_memory 4)Describe the USB drive and its design tradeoffs. a)Application Specific Integrated Circuits (ASICs) combine Electrically Erasable Programmable Read Only Memory (EEPROM) and Universal Serial Bus (USB) protocol circuits into very small packages which are also powered by the USB. b)Because of the mass production needs of the iPod, EEPROMs prices dropped significantly after 2005-2006 and we are witnessing a cost/benefit shift in the memory hierarchy. Now for 10s of dollars, the OS and its utilities can be stored on a GiB device that is roughly 130 times faster than a hard disk! For a lot more dollars, there are now 64 GiB USB drives in development. c)Even though the ASIC must buffer and reformat USB serial data so that 128 bit blocks can be written per cycle, the USB 2.0 transfer speed (~30 MB/s) exceed maximum EEPROM write speed. d)Thus, USB drives have the ability to offer 3-10 second boot up times for an OS. Also, there are "live USB disks" which allows one to move the entire OS from computer to computer. f)Bad News: Even though individual bytes may be read from an EEPROM, write cycles operate on blocks of data. Thus, EEPROM drivers buffer individual bytes as much as possible before a write cycle and frequently written blocks are noted and moved to spread write-cycle wear and tear equally among all blocks. g)Good News: If used as a read-only device the USB drive provides higher performance and lasts longer. If used as a read/write device, the USB drive ASIC firmware contains the wear-leveling algorithm. Also, the ASIC firmware presents the old Microsoft File Allocation Table (FAT) format to the USB device driver. Every OS has a "FAT backend" for its file manager. Thus, if a USB drive is loaded with an operating system kernel and many of its read only binary programs such as the shell, libc, /bin and /sbin utilities, then USB drives should offer significant OS speed up for the life of the computer (five or so years). h)Microsoft Windows Vista calls this ReadyBoost. http://www.simonf.com/usb/ http://en.wikipedia.org/wiki/Asic http://en.wikipedia.org/wiki/USB_flash_drive http://www.pclaunches.com/hard_drive/seagate_momentus_5400_psd_laptop_hybrid_drives_announced.php 5)Explain the difference between rated transfer (write) speeds for an USB flash drive and observed performance. a)Unlike a traditional random access device, individual data blocks cannot be accessed on a USB flash drive. Rather, they only accept file names and file data. b)The on-board USB ASIC uses the old Microsoft file allocation table (FAT) file manager mapping method which employs a sequential list of block numbers where the list index represents the block number of the currently accessed file and the list element is the index number of the next block number in the file. A list element of -1 indicates the last block of the file. c)USB transfer speeds are quoted in the range of 27-32MB/s and when transferring large files, transfer rates approach the quoted values. However, when transferring thousands of small files transfer rates can drop to 15KB/s or 15 minutes to copy 10MB of small files to a USB stick. d)The reason is, for each of the thousands of files copied, the FAT must be searched for an unused (zero) entry. Then the directory must be updated with the FAT index and file name. If there are sub-directories, then more directory lookups and FAT traversals are required. e)All of this FAT and directory activity create a hot spot that has to be remapped from logical volume offsets to physical blocks for load balancing to spread flash writes more evenly across the whole memory range. f)So, the difference observed in write speed (20 MB/s versus 15KB/s) is the FAT and directory gymnastics time. Even though a flash data block can be written in 70 microseconds, the writes add up as the memory stick has to execute back-end file manager code. g)A compromise is to gather many small files into a large archive before the transfer to the USB drive. http://en.wikipedia.org/wiki/File_Allocation_Table 6)Describe the general steps required to boot Linux. a)In a Unix box, graphics are done in user space and, therefore, to show graphics at boot time the kernel must have access to a file system and the X Window System. b)There is also a Linux utility, initialize RAM disk or initrd, which will create a compressed image of a minimal file system. c)The PC BIOS reads the boot disk master boot record from the 1st 512 byte sector of the hard disk (e.g., /dev/hda) which loads the boot manager (GRUB) from the 1st 512 byte sector of the hard disk partition (e.g., /dev/hda1). d)GRUB reads its second image so that it can now access the file system, them reads the initrd image from the /boot directory, which creates a minimal file system in memory so that the X Window System can be used to display graphic bitmaps during the remaining kernel boot process. e)After the kernel is read from the hard disk, and upon completion of the boot process, the kernel switches to the hard disk file system and reads the /etc/inittab file. It contains the desired run level and points to selected init scripts (which run the various network and background daemons) and setup getty so users can login. Omit)Describe the steps required to configure a Linux USB boot drive. a)Insure the computer's BIOS supports booting from a USB drive (recent BIOS' have that functionality). b)Install Linux on the USB drive. c)Build a new initrd kernel image with options like: INITRD_MODULES="ehci-hcd ohci-hcd uhci-hcd usb-storage sd_mod" mount /dev/usbdisk /boot mkinitrd d)Configure the boot manager (GRUB) with a USB boot option or make it the default boot image with: kernel (usbdisk,0)/boot/vmlinuz root=/dev/usbdisk initrd (usbdisk,0)/boot/initrd e)Setup the hard disk partitions with /tmp, /var, /home, and any other directory that is frequently written. f)Modify /etc/fstab so that the hard disk directories are mounted over the USB drive directories. /dev/usbdisk / fat 0 0 /dev/hda1 /tmp ext3 acl,user_xattr 0 0 /dev/hda5 /var ext3 acl,user_xattr 0 0 /dev/hda6 /home ext3 acl,user_xattr 0 0 /dev/hda7 swap swap defaults 0 0 proc /proc proc defaults 0 0 sysfs /sys sysfs noauto 0 0 debugfs /sys/kernel/debug debugfs noauto 0 0 usbfs /proc/bus/usb usbfs noauto 0 0 devpts /dev/pts devpts mode=0620,gid=5 0 0 /dev/fd0 /media/floppy auto noauto,user,sync 0 0 g)Duct tape the USB drive to the back of the computer or hide it inside the case. Omit)Define "interleaving" versus "skew" and explain the role of these concepts. Omit)Contrast a sequential file system, a fixed size index (FAT-like) file system, and a fully-indexed file system. Give the design tradeoffs. 9)Describe the inefficiency of a constant angular velocity (CAV) disks, explain how "bit-zone" recordings improves recording efficency, and explain how this impacts data density and I/O speed. a)In a CAV drive, disk speed is constant, but disk diameter changes. Thus, the inside track next to the hub (track N) has the highest density and the outside track (track 1) has the lowest bit density. b)Even though bit density decreases towards the outer part of the platter, the outer edge spins faster than the inner edge and therefore the bit rate under the R/W head is constant regardless of track position. c)In new disk designs the limiting factor is not bit rate, but bit density. Thus, if bit density was held constant, then more bits would fit on a track the farther out the track was from the center. The outside track (track 1) would hold the most bits. d)A track is divided into physical records call a sector (usually 512 bytes plus metadata). Bit-zones divide the platter into areas increasing sectors per track where the inner tracks have the fewest sectors and the outer tracks have the most sectors. e)Thus, bit-zone designs create a disk that may transfer data twice or three times faster to/from the outer edge (track 1 or partition 0) than it can to/from the inner edge (track N or partition 4). f)To exploit this hidden feature of disk drives, create two or more partitions and configure the most used (root) partition at the front of the drive and the least used partition (swap) at the end of the drive. 10)Describe the rationale and explain how Linux utilities fdisk, parted, hdparm, mkfs, dumpe2fs, tune2fs, and mount are used to fully exploit a bit-zone hard disk. a)fdisk - Add/delete modify disk partitions. b)parted - Re-sequence (de-frag) files and resize partitions. c)hdparm - Configure disk controller and test I/O bandwidth of disk. d)mkfs - Create the file system map (ext2, ext3, vfat, etc). d)dumpe2fs - View relative placement of files within file system. e)tune2fs - Move files relative to start-of-volume based upon inodes. f)mount - Associate partitions with directories in file system space. g)Use fdisk to create partitions or parted to resize partitions; use hdparm to test performance of each partition. Use dumpe2fs to locate frequency access files (e.g., libc) and tune2fs to move its inode close to the top of the volume or front of partition. 11)Describe the general evolution of Unix file formats inode, BSD FFS, and journaling file systems and give their design tradeoff. a)Early Unix file systems used four discrete areas. From low to high block addresses they were the Super Block, inodes, allocated blocks, and free blocks. Allocated blocks were a mix of regular and directory file types. b)Keeping all the inodes in one disk zone and file blocks in another led to many extra seek operations between the two zones. c)The BSD Fast File System (FFS) improved on the original Unix design by creating a series of 10s to 1000s of mini-disk images within a disk partition or file system. Thus, for most file access, the R/W head only had to move between inodes and file blocks within the mini-disk. d)Unix file systems led the trend of keeping new files or modified files in the data cache to improve I/O speed. To prevent data loss, the data cache was flushed every 30 seconds. e)The Linux Ext3 file system format employs the BSD FFS improvements and a journaling feature which saves the recently modified disk blocks in a scratch pad area along with a record of the sequence of operations on the file that contained the new blocks. In this way, if the OS crashed, then the file could be rebuilt by repeating the series of recorded actions on the file. f)Although these changes reduced R/W head movement and improved data integrity they also introduced more indexing overhead of tracking mini-disks and more disk writes saving modified disk blocks. At least one study indicated a 22% increase in overhead resulting from these design "benefits." http://www.namesys.com/benchmarks.html Omit)Support or refute the idea of using ext2 for performance or ext3 for a sense of security. 12)Describe the forthcoming changes in hard disks and the effects it will have on OS file system design for reading and writing files. a) Parameter 2006 2009 2013 Improvement Capacity (GiB) 500 2000 8000 16x Bandwidth (Mb/s) 1000 2000 5000 5x Read seek time (ms) 8 7.2 6.5 1.2x b)It takes an hour or more to sequentially read today's 500 GiB drives. c)In 2013, it will take about 3.5 hours, an increase of 3 times. d)Random I/O will take significantly more time since seek times are nearly flat for the foreseeable future. For example, database random I/O accessing 10% of the disk will take much longer on an 8TiB 2013-era disk drive than it on today's 500GiB disk drives. e)I/O errors will be more frequent even though the total per-bit error rate will drop. 13)Describe the problem with consistency checking future disk drives. a)Journaling file systems are designed to stop synchronization errors with the OS, but not low-level media errors. Today's file managers assume low frequency I/O error storage devices. b)Increasing number of I/O errors in the future mean that fsck will be run more often. c)Fsck on multi-terabinarybyte file systems today can easily take two days. d)Recently, the Linux kernel source server suffered file system corruption. It took over a week for fsck to repair the large ext3 file system, when it would have taken far less time to restore from backup. 14)Explain the last-file-block problem, the impact this will have on future capacity disk drives, and give a possible way to mitigate the problem. a)Linux uses 4KiB pages, 4KiB logical OS blocks, and files are stored in 4KiB blocks (even though the physical record of a disk is 512 bytes). b)Regardless of file size, the last file block may be full or contain just one byte. The left over space goes unused. c)Thus, the average file system leaks (number of files * 2KiB) bytes due to last-file-block fragmentation. Assuming 60K files per 70GiB, thats 120MiB or about one percent (0.0017). For today's 500GiB hard disks that would be about 850MiB of fragmentation. d)As disk capacity grows, so will last-file-block fragmentation. An 8TiB Drive will have about 1.3GiB of last-file-block fragmentation. e)Approximately half of the files stored on a file system are 2KiB or less. A typical distribution of disk file sizes look like this: File size % total <= 1 KB 25-30% <= 2 KB 40-45% <= 3 KB 50-60% <= 4 KB 55-65% f)Thus, if these small files could be stored, or combined with, the file's meta data area, then the average file system block leak could be reduced by 25% or recapture 25MiB with today's disk drives or recapture 325MiB in a 8TiB drive of the future. g)Another possibility is to create a "sequential" file type that is allocated block from the back of the free list. In this way sequential files would grow towards indexed file for maximal disk usage. (But it would also require special "sequential" command arguments to the copy utility as well as a re-sequencer utility to re-shuffle sequential files when another is deleted. 15)Contrast the theory of operation between hard disk and CD drives and explain how this difference effects device driver design. a)HD - constant angular velocity - disk speed constant - bit density varies. (Disk is used for digital formats.) CD - constant linear velocity - disc speed varies - bit density constant. (Disc is used for analog formats.) b)HD - light weight 10ms integrated circuit R/W head slider in contact with medium. CD - heavy 150ms optical R/W head suspended above medium. c)HD - random access to any 512 byte physical record. CD - "music tracks" broken into 2KB blocks and sequentially searched - CDs with random (indexed) I/O applications are extremely slow. d)HD - 1st track on outside edge - disk may fill in random order CD - 1st track on hub edge - disc grows sequentially to outside edge in sequence - allows smaller credit card size CDs and mini CDs. e)HD - OS builds files from logical OS blocks which are built from 512 byte records. CD - OS reads fixed directory with fixed files. 16)Describe the theory of operation of a CD-RW drive and the design alternatives for file manager and device driver. (e.g., DirectCD) a)HD - data granularity is the 512 byte physical record. CD - write files at proper speed (1X - 40X) in one or more tracks (sessions) which have a 25MB preamble lead-in and and 25MB trailer lead-out per session. Plus a new table of contents (directory) is written at end of each session. b)CD-RWs - appear to users as a hard disk, but writing a few files at a time quickly consumes available disc space. Omit)Support or refute the idea of using general multi-tasking multi-user algorithms on tablet, laptop, desktop, or embedded computers. Lecture - OS synchronization concepts 1)Define "deadlock" and describe the four conditions for deadlock to occur. a)Mutual Exclusion - Two or more processes compete for exclusive control of two or more resources. b)Hold and Wait - Each process has at least one resource and is waiting for access to another. c)No Preemption - Processes cannot be stopped or reset. d)Circular Wait - Processes A will wait for process B to release its resource while process B is waiting for process A and so on. 2)Define "semaphore" and explain how it is implemented, and give an example. a)Semaphores serialize parallel (concurrent) access to a resource via a un-interruptible two-step algorithm. b)When activated, the semaphore routine checks a global variable to see if others have access to the shared resource. c)If not, it sets the global variable to let other processes know that it is now has access to the resource. d)To prevent interruption, these two steps are implemented in a single, test-and-set assembly language instruction. e)Producer-consumer problem, Multi-CPU access to process descriptor list, multi-user access to in-memory inode data structure, etc.