Google Technology Review

Topix Weblog: The Secret Source of Google's Power

1) Define Web page snippet and describe the Google technology required to offer this service.
2) Describe the general engineering scaling problem and how it relates to "Gmail."
3) In terms of software and hardware maintenance, explain how system administration differs in a Wal-Mart versus a Google data center.
4) Explain the usual tension between data center operations and programmers, and explain how the Google choice flipped this relationship.
5) Describe the distributed, yet centralized, aspect of the Google system architecture and how this relates to the provided services.

The Google File System, SOSP, October 2003

1) Describe the four key design constraints that guided the development of GFS.
2) Define the following GFS components: Master, Chunkserver, chunk, client, replicas, metadata, orphaned chunk, and HeartBeat message.
3) Explain why Google eliminated the Linux "cache-cache" or file cache.
4) Describe the sequence of operations and the data versus control flow in GFS.
5) Give the GFS chunk size and three benefits of a large chunk, then explain how chunks can lead to hot spots and how to resolve the problem.
6) Describe the three metadata types and the in-memory design choice.
7) The master is unique in that it employs in-memory data structures. This design choice offers three advantages, but one distinct disadvantage. Describe each.
8) Explain how the master is able to know and control the slaves, and explain why this design evolved over the life of the project.
9) Explain how the operation log in combination with checkpoints greatly simplified the reliability of the master. Explain how this method is immune to a crash that may occur during the backup process.
10) Contrast and compare the GFS terms "consistent" and "defined" and describe the two key steps in defining a region.
11) In what four ways is a Google application (bot crawler) adapted to deal with the "relaxed" GFS consistency model?
12) Contrast the GFS terms: primary, secondary, lease, and lease expiration.
13) Describe the seven control and data flow steps in a GFS write operation.
14) Explain how GFS write operations are atomic, yet unlocked.
15) Compare the behavior of applications whose writes fail in a traditional file system versus the GFS.
16) Describe how the GFS implementation of "namei" improves efficiency and allows more functionality than in a traditional Unix file manager.
17) Describe the three-level Google latency to file data and the "willing" tradeoff they make with regard to scalability, reliability, and availability.
18) Explain how one metric allows the GFS to balance its entire distributed system for file creation, chunk replacement, and balancing (five parts).
19) Give four conditions that lead to the need for chunk copying and the four special cases for selection of chunks to copy.
20) Describe the five-step file deletion procedure in GFS and the application program behavior that confounds the procedure.
21) Explain how the chunk version number (CVN) is shared among the GFS agents and describe its role. (In other words, explain the behavior of a 300 GB chunkserver holding parts of 300 k files after it has been offline for several hours and comes back online.)
22) Explain what happens if the master fails and explain the role of "shadow masters."
23) Describe how chunkservers maintain data integrity (three parts) and how writing over existing data is a three-step process.
24) Describe the network protocol employed in GFS, the two types of logging, and the design tradeoff.