Threaded Binary Tree and Morris Traversal

Motivation

For nodes whose left or right child pointers are null, that pointer memory is wasted, so in a TBT we reuse these fields to store useful addresses (threads).

Transform a normal binary tree to TBT

  1. The leftmost and the rightmost null pointers are left unchanged
  2. Every other null pointer is changed:
    • Null left ptr = inorder predecessor (the exception is the first node of the inorder traversal, which has no inorder predecessor)

    • Null right ptr = inorder successor (the exception is the last node of the inorder traversal, which has no inorder successor)
      


We can add two flags to each node to indicate whether the left/right pointer points to an ancestor (a thread) or to a child.
Finally, the dangling left pointer of the inorder start node and the dangling right pointer of the inorder end node are connected to a dummy node, with the flag set to ancestor.

Morris traversal uses the threaded-binary-tree idea to implement an iterative inorder traversal of a tree without using a stack: time O(N), space O(1).
Pseudocode

Initialize current as root
While current is not NULL
  If current does not have a left child
    a) Visit current's data
    b) Go right: current = current->right
  Else
    a) Make current the right child of the rightmost
       node in current's left subtree
    b) Set current's left pointer to NULL and go to that left child:
       current = current->left

LeetCode 94: use the method above to implement inorder traversal (note this simplified variant flattens the tree as it visits it)

class Solution {
    public List<Integer> inorderTraversal(TreeNode node) {
        List<Integer> list = new ArrayList<>();

        while (node != null) {
            if (node.left == null) {
                list.add(node.val);
                node = node.right;
            } else {
                TreeNode nextNode = node.left;
                TreeNode p = node.left;
                while (p.right != null) p = p.right; // p: rightmost node in the left subtree
                p.right = node;                      // thread p back to node
                node.left = null;                    // detach the left subtree (destroys the tree)
                node = nextNode;
            }
        }
        return list;
    }
}
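The solution above returns the right answer but destroys the tree as it goes (node.left is set to null). A minimal sketch of the standard, thread-restoring Morris traversal, assuming the same LeetCode TreeNode class and imports as above; the only change is that the predecessor scan stops at an existing thread, which is then removed on the second visit:

class MorrisSolution {
    public List<Integer> inorderTraversal(TreeNode node) {
        List<Integer> list = new ArrayList<>();
        while (node != null) {
            if (node.left == null) {
                list.add(node.val);
                node = node.right;
            } else {
                // find the inorder predecessor: rightmost node of the left subtree,
                // stopping early if the thread back to node already exists
                TreeNode p = node.left;
                while (p.right != null && p.right != node) p = p.right;
                if (p.right == null) {
                    p.right = node;      // first visit: create the thread
                    node = node.left;
                } else {
                    p.right = null;      // second visit: remove the thread, restoring the tree
                    list.add(node.val);  // left subtree done, visit node
                    node = node.right;
                }
            }
        }
        return list;
    }
}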

Database Management System - Relational Design Theory

CS564 - DBMS - Relational Design Theory

SQL

Relational algebra and Calculus

Keys

  • A superkey is a set of one or more attributes that, taken collectively, allow us to identify a tuple in the relation uniquely
  • Such minimal superkeys are called candidate keys.
  • We shall use the term primary key to denote a candidate key that is chosen by the database designer as the principal means of identifying tuples within a relation.
  • Foreign Key is a set of attributes in a referencing relation, such that for each tuple in the referencing relation, the values of the foreign key attributes are guaranteed to occur as the primary key value of a tuple in the referenced relation.

Join

  • natural join combines two tables based on identical columns
  • Cartesian product operation combines tuples from two relations, but unlike the join operation, its result contains all pairs of tuples from the two relations

SQL Language

  • SQL allows us to use the keyword all to specify explicitly that duplicates are not removed
  • varchar(n): A variable-length character string with user-specified maximum length n. The full form, character varying, is equivalent.
  • numeric(p, d): A fixed-point number with user-specified precision. The number consists of p digits (plus a sign), and d of the p digits are to the right of the decimal point.
  • Percent (%): The % character matches any substring.
  • Underscore (_): The _ character matches any single character.
  • The set operations automatically eliminate duplicates

Database Design (ER Model)

Terms

  • Entity
  • Attribute
    • Domain
    • Key
    • Primary Key cannot be null
  • Relationship
    • Descriptive attribute: an attribute attached to the relationship between two entities
    • A relationship should be uniquely identified
    • An instance of a relationship set is a set of relationships
    • Sometimes a relationship can involve two entities from the same entity set
  • Constraints
    • Participation constraint: total, partial
    • weak entity set
    • ISA inheritance relationship set
      • specialization, generalization
      • overlap, covering constraints

The Relational Model

  • Integrity Constraint
    • Foreign key constraint

Relational Design Theory

Functional Dependency

  • Consider the following university database schema

    Replacing the tables instructor and department with the result of their natural join is not a good choice,

    because the department's budget information is repeated for every instructor of that department, and we must make sure every update keeps all copies of the budget in sync. Another drawback is that the natural (inner) join drops tuples with no matching tuple on the other side, so a department with no instructors cannot be represented in the joined table.
    But if we had started with the inst_dept table, how would we know the design is bad and should be decomposed into instructor and department?
    We would need to discover that each department must have exactly one building and exactly one budget.
    Not every ER design can be precise enough to avoid such problems, so we need to allow designers to specify rules such as "each specific value of dept_name corresponds to at most one budget" even in cases where dept_name is not the primary key for the schema.
    In this case, we need to write a rule "if there were a schema (dept_name, budget), then dept_name would be able to serve as its primary key." This rule is specified as a functional dependency:

    dept_name -> budget

    This gives sufficient information to recognize the problem of the inst_dept schema. A functional dependency therefore says which relational instances are allowed under certain constraints.
    Functional dependency generalizes the notion of a key in a relation. It describes relationships between attributes.

  • Definition: A relational instance satisfies the functional dependency A->B if whenever two tuples agree on the attributes of A, they also agree on the attributes of B; i.e., A uniquely determines B

  • A trivial functional dependency is the one which will always hold in a relation. A->B and B is a subset of A.

  • Non-Trivial functional dependency may or may not hold in a relation. A->B and B is NOT a subset of A.

  • Properties of functional dependencies
    "Armstrong's Axioms":
    • Reflexivity: if B ⊆ A, then A -> B
    • Augmentation: if A -> B, then AC -> BC
    • Transitivity: if A -> B and B -> C, then A -> C

    Other derived rules: union (A -> B and A -> C imply A -> BC), decomposition, and pseudotransitivity.

  • "A set of functional dependencies" F is a set of FD constraints on legal relational instances in a relation.

  • "The closure of FDs set F" F+ is a set of all FDs that can be inferred from given the set F (Note that this includes all FDs in F itself).

  • Attribute Closure: Attribute closure of an attribute set A is a set of all attributes that can be functionally determined from elements of set A (Note that this includes attributes in A itself).
    Example: for F = {A -> B, B -> C}, the attribute closure {A}+ = {A, B, C} (a code sketch follows this list).

  • Determining equivalence of functional dependencies
    Check whether 2 FD sets are subset of each other’s closure.

  • Computing a minimal cover
    A minimal cover of an FD set F is a smallest set of FDs that is equivalent to F.

  • Other notes
    Every non-prime attribute is fully functionally dependent on every candidate key.
    Every prime attribute is fully functionally dependent on every candidate key that does not contain it.
    No attribute is fully functionally dependent on any set of attributes that is not a candidate key.
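A minimal sketch of the attribute-closure algorithm above: repeatedly apply any FD whose left side is already contained in the result, until a fixpoint is reached. The FD class and the single-character attribute encoding are hypothetical conveniences, not part of the notes.

import java.util.*;

class ClosureDemo {
    static class FD {
        final Set<Character> lhs, rhs;
        FD(Set<Character> lhs, Set<Character> rhs) { this.lhs = lhs; this.rhs = rhs; }
    }

    // Attribute closure A+ of the attribute set attrs under the FD set fds
    static Set<Character> closure(Set<Character> attrs, List<FD> fds) {
        Set<Character> result = new HashSet<>(attrs);
        boolean changed = true;
        while (changed) {            // repeat until a fixpoint is reached
            changed = false;
            for (FD fd : fds) {
                // if lhs is contained in result, add rhs; addAll reports whether anything was new
                if (result.containsAll(fd.lhs) && result.addAll(fd.rhs)) {
                    changed = true;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<FD> fds = List.of(
            new FD(Set.of('A'), Set.of('B')),
            new FD(Set.of('B'), Set.of('C')));
        // {A}+ = {A, B, C}, so A is a key of R(A, B, C) under these FDs
        System.out.println(closure(Set.of('A'), fds));
    }
}

The same routine supports the equivalence check above: two FD sets F and G are equivalent iff every FD in each is implied by the other, which reduces to closure computations.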

Data Anomalies

The student-course relation below has the functional dependency Sno, Cname -> Sname, Sdept, Mname, Grade, with key {Sno, Cname}. That is, once the student and the course are fixed, all other attributes are determined.

| Sno | Sname | Sdept | Mname | Cname | Grade |
| --- | --- | --- | --- | --- | --- |
| 1 | Student-1 | College-1 | Dean-1 | Course-1 | 90 |
| 2 | Student-2 | College-2 | Dean-2 | Course-2 | 80 |
| 2 | Student-2 | College-2 | Dean-2 | Course-1 | 100 |
| 3 | Student-3 | College-2 | Dean-2 | Course-2 | 95 |

A relation that does not satisfy the normal forms exhibits many anomalies, mainly the following four:

  • Redundant data: e.g., Student-2 appears twice.
  • Modification anomaly: information is changed in one record, but the same information in another record is left unchanged.
  • Deletion anomaly: deleting one piece of information also loses other information. E.g., deleting Course-1 requires deleting rows 1 and 3, and the information about Student-1 is then lost.
  • Insertion anomaly: e.g., a student who has not yet enrolled in any course cannot be inserted.

Normal Forms

Normal form theory exists to resolve the four anomalies above.
Each higher normal form builds on the lower ones; 1NF is the lowest normal form.

Decomposition

Universal Relation: A relation that captures all the information in schema and is decomposable into smaller relations.

  • Decomposition of a relation is done when a relation in the relational model is not in an appropriate normal form. A good, detailed ER design should end up directly in 3NF or BCNF; the functional dependencies guide us in determining entities and their relationships

  • Ideal decomposition should be lossless and dependency preserving.
    For relation R decomposed into R1 and R2:

  • Example:


    The employee table is decomposed into employee1 and employee2: because employees can share a name, the natural join of the decomposition cannot reconstruct the original employee table.

  • Lossless decomposition (lossless-join decomposition) requirements: for R decomposed into R1 and R2, (1) attributes(R1) ∪ attributes(R2) = attributes(R), and (2) R1 ∩ R2 -> R1 or R1 ∩ R2 -> R2 must be in F+ (the common attributes form a superkey of R1 or R2)

  • Dependency preserving decomposition
    If we decompose a relation R into relations R1 and R2, All dependencies of R either must be a part of R1 or R2 or must be derivable from combination of FD’s of R1 and R2.

    • Example:
    1. R (A, B, C, D) with FD set{A->BC} is decomposed into R1(ABC) and R2(AD) which is dependency preserving because FD A->BC is a part of R1(ABC).
    2. R(A, B, C, D) with functional dependencies A->B and C->D, decomposed into R1(AB) and R2(CD): this is dependency preserving, since A->B can be checked in R1(AB) and C->D in R2(CD), but it is not a lossless join, because R1 ∩ R2 = ∅ violates the second lossless-join condition above.

1. First Normal Form (1NF): Atomicity

The attributes of a table must be atomic (no composite attributes). Note, however, that composite attributes are sometimes useful and are widely used in object-oriented databases.

2. Boyce–Codd Normal Form (BCNF): the left side of every nontrivial FD must be a superkey

BCNF is a slightly stronger version of 3NF.
Every schema in BCNF also satisfies 3NF. BCNF eliminates all redundancy that can be discovered based on functional dependencies.
A database design is in BCNF if each member of the set of relation schemas that constitutes the design is in BCNF.

Definition:
A relational schema R is in BCNF with respect to a set of functional dependencies F if for all functional dependencies in F+ of form A->B where A,B are subsets of R, at least one of the following holds:

  • A->B is a trivial functional dependency (B is a subset of A)
  • A is a super key of schema R

Another equivalent definition (important):
For all nontrivial FD A->B, A must be a super key of schema R.
The negated definition
A relational schema R is not in BCNF w.r.t FDs set F if there exists a FD in F+ of form A->B where A,B are subsets of R s.t.
A->B is a nontrivial FD (B is not a subset of A) AND A is not a super key of schema R.
Example:
inst_dept (ID, name, salary, dept_name, building, budget) is not in BCNF because dept_name->budget is a nontrivial FD and dept_name is not a super key of inst_dept.
The schemas instructor and department are in BCNF, because for all their nontrivial FDs the left side is a superkey (ID and dept_name are superkeys of their respective schemas).
Decomposing relational schema to BCNF
For a relational schema R not in BCNF, we must have at least 1 nontrivial FD A->B s.t. A is not a super key for R.
We decompose R into two schemas:

(A ∪ B) and (R − (B − A))

For inst_dept, A = dept_name, B = {building, budget}, so it can be decomposed into (dept_name, building, budget) AND (ID, name, salary, dept_name)
Problem of BCNF:

  • BCNF is not dependency preserving.

3. Third Normal Form (3NF): 2NF plus the elimination of transitive functional dependencies

BCNF requires A to be a superkey for every nontrivial FD A->B. 3NF relaxes this constraint by allowing A not to be a superkey.
Definition:
A relational schema R is in 3NF with respect to a set of functional dependencies F if for all functional dependencies in F+ of form A->B where A,B are subsets of R, at least one of the following holds:

  • A->B is a trivial functional dependency (B is a subset of A)
  • A is a super key of schema R
  • Each attribute X in B − A is contained in a candidate key of R

| Sno | Sname | Sdept | Mname |
| --- | --- | --- | --- |
| 1 | Student-1 | College-1 | Dean-1 |
| 2 | Student-2 | College-2 | Dean-2 |
| 3 | Student-3 | College-2 | Dean-2 |

There is the following transitive functional dependency:

  • Sno -> Sdept -> Mname

We can decompose as follows:

| Sno | Sname | Sdept |
| --- | --- | --- |
| 1 | Student-1 | College-1 |
| 2 | Student-2 | College-2 |
| 3 | Student-3 | College-2 |

| Sdept | Mname |
| --- | --- |
| College-1 | Dean-1 |
| College-2 | Dean-2 |

Multi-dimensional database

Think spreadsheets and reporting but generalized.

  • A star schema is the simplest form of a dimensional model, in which data is organized into facts and dimensions. A fact is an event that is counted or measured, such as a sale or login. A dimension contains reference information about the fact, such as date, product, or customer. A star schema is diagramed by surrounding each fact with its associated dimensions. The resulting diagram resembles a star.
  • Example:

Levels of abstraction in DBMS

  • Physical Schema is the way the relations are actually stored in SSD/HDD. It also defines indexes, statistics etc. defined on the table. (Indexes are defined using DDL.)
  • Logical (Conceptual) Schema is the DDL for creating the table. (It can sometimes specify the physical schema.)
  • External Schema is the DDL that defines the external user's view of the database. Though some "base" tables are directly visible, most of it is protected through views.

Storage and File Structure

Storage Overview

  • Persistent storage nonvolatile memory
    • Data in a DBMS has to be persistent
    • SSD: flash memory based, faster and more expensive
    • HDD: magnetic storage, slower and cheaper
    • Tapes: disks are the new tapes, still great for archiving
    • Cloud server
  • Volatile storage
    • DRAM

Memory hierarchy

Disk

  • The disk surface is logically divided into tracks, which are subdivided into sectors.

    • A sector is the smallest unit of information that can be read from or written to the disk.
    • The read–write head stores information on a sector magnetically as reversals of the direction of magnetization of the magnetic material.
    • The platters, fixed on a spindle rod, spin at, say, 90 revolutions per second
    • The arm assembly moves in or out to position a head on the desired track
    • The tracks with the same radius on all platter surfaces form an imaginary (hollow) cylinder: #cylinders = #tracks per surface
    • #surfaces = #tracks / #cylinders = 2 × #platters (both sides of each platter are used)
    • Only one head is used at a time
    • Block size is a multiple of sector size (fixed, usually 512 bytes)
    • capacity = heads (#surfaces) × cylinders × sectors per track × 512 (typical sector size in bytes) = #surfaces × #tracks per surface × #sectors per track × bytes per sector; e.g., 8 × 16,384 × 512 × 512 = 2^35 bytes ≈ 34.4 GB
  • Seek time: the time for repositioning the arm to change track.

  • Rotational delay: the time to get to the right sector; on average 0.5 × (60 s / disk rpm), where the 0.5 averages over half a revolution. E.g., at 7200 rpm this is 0.5 × 60/7200 ≈ 4.2 ms.

  • Transfer time: the actual overhead of transferring the data, i.e., moving the head across the block

Seek time plus rotational delay is the major contributor to delay.
Formatting: defining a polar coordinate system on the disk surface.
Wear leveling: storing hot data on less-worn sectors and cold data on more-worn sectors to extend the service life of the device.

  • RAID 0: Data striping
    RAID 0 consists of striping, but no mirroring or parity. Compared to a spanned volume, the capacity of a RAID 0 volume is the same; it is the sum of the capacities of the disks in the set. But because striping distributes the contents of each file among all disks in the set, the failure of any disk causes all files, the entire RAID 0 volume, to be lost. A broken spanned volume at least preserves the files on the unfailing disks. The benefit of RAID 0 is that the throughput of read and write operations to any file is multiplied by the number of disks because, unlike spanned volumes, reads and writes are done concurrently,[11] and the cost is complete vulnerability to drive failures.

  • RAID 1 Mirroring
    RAID 1 consists of data mirroring, without parity or striping. Data is written identically to two drives, thereby producing a “mirrored set” of drives. Thus, any read request can be serviced by any drive in the set. If a request is broadcast to every drive in the set, it can be serviced by the drive that accesses the data first (depending on its seek time and rotational latency), improving performance. Sustained read throughput, if the controller or software is optimized for it, approaches the sum of throughputs of every drive in the set, just as for RAID 0. Actual read throughput of most RAID 1 implementations is slower than the fastest drive. Write throughput is always slower because every drive must be updated, and the slowest drive limits the write performance. The array continues to operate as long as at least one drive is functioning.

  • RAID 5: Striping with Distributed Parity
    Upon failure of a single drive, subsequent reads can be calculated from the distributed parity such that no data is lost. RAID 5 requires at least three disks.

  • RAID 6 two parity disks
    RAID 6 consists of block-level striping with double distributed parity. Double parity provides fault tolerance up to two failed drives. This makes larger RAID groups more practical, especially for high-availability systems, as large-capacity drives take longer to restore. RAID 6 requires a minimum of four disks. As with RAID 5, a single drive failure results in reduced performance of the entire array until the failed drive has been replaced.[11] With a RAID 6 array, using drives from multiple sources and manufacturers, it is possible to mitigate most of the problems associated with RAID 5. The larger the drive capacities and the larger the array size, the more important it becomes to choose RAID 6 instead of RAID 5.

Database Buffer

  • A major goal of the database system is to minimize the number of block transfers between the disk and memory.
  • The subsystem responsible for the allocation of buffer space is called the buffer manager.

File Organization

A fixed length record is one where the length of the fields in each record has been set to be a certain maximum number of characters long. Suppose a field that was going to contain a name was set to be 25 characters long. This means that the field could only ever contain up to 25 characters. If all the fields in the record have a fixed length like this, then the record is said to be a fixed length record. The problem with fixed length records is that each field very rarely contains the maximum number of characters allowed. This means that a lot of space is needlessly set aside and wasted. Also, values sometimes cannot be entered because they are too large to fit inside the allowed space in a field. The advantage of fixed length records is that they make file processing much easier, because the start and end of each record is always a fixed number of characters apart. This makes it much easier to locate both individual records and fields.


A variable length record is one where the length of a field can change to allow data of any size to fit. The advantage of variable length records is that space is not wasted; only the space needed is ever used. The main problem with variable length records is that it is much more difficult to locate the start and end of individual records and fields. This is because they are not separated by a fixed number of characters. To separate variable length records, each field has a special character to mark where it ends, called an end-of-field marker. When records need to be located, the computer must count through the end-of-field markers to locate individual records and fields.




  • Columnar storage
    A column-oriented DBMS (or columnar database management system) is a database management system (DBMS) that stores data tables by column rather than by row. Practical use of a column store versus a row store differs little in the relational DBMS world. Both columnar and row databases can use traditional database query languages like SQL to load data and perform queries. Both row and columnar databases can become the backbone in a system to serve data for common extract, transform, load (ETL) and data visualization tools. However, by storing data in columns rather than rows, the database can more precisely access the data it needs to answer a query rather than scanning and discarding unwanted data in rows. Query performance is increased for certain workloads.

Storage summary

  • Databases must have persistent storage: SSD, HDD Their performance characteristics affect database design
  • Buffer manager tries to keep the optimal set of data blocks (pages) in memory to minimize I/O
  • must be able to “lock” (pin) a page in memory
  • must be able to write / flush page to disk on demand
  • Rows comprise both fixed and variable length fields
  • Slotted page format is flexible to keep records organized and accessible on a page
  • Single column values could be stored in fixed length records, but are usually compressed via encodings
  • Compression is used at both column and block level Heap files are ok but most databases use B+ trees
  • Columnar storage is another tool in the database tool-kit, soon all DBMS vendors will have it

Indexing and Hashing

B+ tree

  • Properties
    • It can be shown that the number of I/O operations needed in the worst case for an insertion is proportional to log_⌈n/2⌉(N), where n is the maximum number of pointers in a node, and N is the number of records in the file being indexed.

    • It contains up to n − 1 search-key values K1, K2, ..., Kn−1, and n pointers P1, P2, ..., Pn. The search-key values within a node are kept in sorted order; thus, if i < j, then Ki < Kj

    • Each non-leaf node in the tree has between ⌈n/2⌉ and n children, where n is fixed for a particular tree.

    • A non-leaf node may hold up to n pointers, and must hold at least ⌈n/2⌉ pointers. The number of pointers in a node is called the fanout of the node.

    • Each leaf can hold up to n − 1 values. We allow leaf nodes to contain as few as ⌈(n − 1)/2⌉ values. With n = 4 in our example B+-tree, each leaf must contain at least 2 values, and at most 3 values.

    • Since n = 4 and 1 < ⌈(n − 1)/2⌉, we must either merge the node with a sibling node, or redistribute the entries between the nodes, to ensure that each node is at least half-full.

  • Insertion
    • Split
    • Coalesce
  • Deletion
    • if the occupancy of a node falls below 2n/3, the system attempts to borrow an entry from one of the sibling nodes
    • Borrow left, pick max one
    • Borrow right, pick min one
    • Merge left, shift left, change parent
    • Merge right, shift right, change parent
  • Non-unique search key: If a relation can have more than one record containing the same search key value (that is, two or more records can have the same values for the indexed attributes), the search key is said to be a non-unique search key.
  • Bulk-loading in B+ tree
    • Insertion of a large number of entries at a time into an index is referred to as bulk loading of the index.
    • sort the file on the search key of the index being constructed
    • There is a significant benefit to sorting the entries before inserting them into the B+tree.
    • nodes will never have to be read from disk during bulk load, if the B+ tree was empty to start with. Each leaf node will thus incur only one I/O operation even though many entries may be inserted into the node.
  • bottom-up B+ tree construction
    • it can be constructed faster by building it bottom-up, from the leaf level, instead of using the usual insert procedure.
    • we break up the sorted entries into blocks, keeping as many entries in a block as can fit in the block; the resulting blocks form the leaf level of the B+-tree. The minimum value in each block, along with the pointer to the block, is used to create entries in the next level of the B+-tree, pointing to the leaf blocks

HASHING

Static Hashing

  • We use the term bucket to denote a unit of storage that can store one or more records.
  • Let K denote the set of all search-key values, and let B denote the set of all bucket addresses. A hash function h is a function from K to B.
  • Hash function should be random and uniform
  • The algorithm is similar to the hash table data structure

Dynamic Hashing

BITMAP

  • A bitmap is simply an array of bits. In its simplest form, a bitmap index on the attribute A of relation r consists of one bitmap for each value that A can take. Each bitmap has as many bits as the number of records in the relation. The ith bit of the bitmap for value vj is set to 1 if the record numbered i has the value vj for attribute A. All other bits of the bitmap are set to 0.

  • To recognize deleted records,we can store an existence bitmap, in which bit i is 0 if record i does not exist and 1 otherwise.
    • If some records have been deleted, however, just computing the complement of a bitmap is not sufficient. Bits corresponding to such records would be 0 in the original bitmap, but would become 1 in the complement, although the records don’t exist.
  • Limitation
    • If there are further conditions, the fraction of records satisfying all the conditions is likely to be quite small.
    • If the fraction is large, scanning the entire relation would remain the cheaper alternative.
  • Counting the number of bits that are 1 in a bitmap can be done quickly by a clever technique. We can maintain an array with 256 entries, where the ith entry stores the number of bits that are 1 in the binary representation of i.
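A minimal sketch of that counting trick: the 256-entry table is built once, and each byte of the bitmap is then looked up in constant time. The class name and the sample bitmap value are hypothetical.

class BitmapCount {
    static final int[] POPCOUNT = new int[256];
    static {
        // entry i = number of 1 bits in i, built from already-computed smaller entries
        for (int i = 1; i < 256; i++) POPCOUNT[i] = (i & 1) + POPCOUNT[i >> 1];
    }

    static int countOnes(byte[] bitmap) {
        int total = 0;
        for (byte b : bitmap) total += POPCOUNT[b & 0xFF]; // mask byte to 0..255
        return total;
    }

    public static void main(String[] args) {
        // bytes 10110000 and 00000011 contain 3 + 2 = 5 one-bits
        System.out.println(countOnes(new byte[]{(byte) 0b10110000, (byte) 0b00000011}));
    }
}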

Storage and Indexing Summary

atomicity problem: a transaction is executed totally or not at all
integrated storage structure: the index file and the record file are stored together
separate file: the opposite

clustered index: same order as the original data
Clustered indexes do not guarantee sequential storage on the disk. Managing exactly where data is placed on the disk is the job of the OS, not the DBMS. But it suggests that items are ordered generally according to the clustering key.

With a non-clustered index there is a second list that has pointers to the physical rows. You can have many non-clustered indexes, although each new index will increase the time it takes to write new records.

It is generally faster to read from a clustered index if you want to get back all the columns. You do not have to go first to the index and then to the table.

Writing to a table with a clustered index can be slower, if there is a need to rearrange the data

primary index: using primary key for indexing

secondary index: otherwise

Indexing is done at block granularity; within a block, an offset is used.

A hash index uses a hash function to generate a hash address into a bucket, with a possible overflow chain for managing collisions.
It is cheaper than a B+ tree if no overflow occurs. Access: O(1 + #overflow buckets)
So the hash index's great strength is fast equality search; it cannot support range queries.
Bitmap indexes suit static, low-cardinality, repetitive data.
B-tree indexes support both range and equality queries.

A B-tree is an m-way search tree (order m, fanout m, m − 1 info fields) with additional constraints: all leaves are at the same height, the root has at least 2 children, and every other node is at least half full (⌈order/2⌉) to keep the height small. If the target position of an insertion is already full, the median is recursively pushed up in inorder order, splitting the predecessor and successor keys into separate nodes, so the half-full invariant is always maintained.
A B+ tree is closer to a multi-level index. Building on the B-tree, the nonleaf nodes form a sparse index that reduces disk page accesses and supports equality search; the leaf level copies the nonleaf keys down in inorder order, holds the record pointers, and is chained into a linked list in key order, forming a dense & clustered index that also supports range search.
Deletion: merge left or merge right to satisfy the half-full constraint; a split, if necessary, can propagate up to the root.
order = #ptr fields = p per node
#key/value fields = p − 1 per node

(p − 1)(key_size + record_ptr_size) + p × block_ptr_size ≤ block size = 512
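As a worked example (assuming hypothetical 8-byte keys, 8-byte record pointers, and 8-byte block pointers): (p − 1)(8 + 8) + 8p ≤ 512 gives 24p − 16 ≤ 512, i.e., 24p ≤ 528, so p ≤ 22 pointers per node.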

static hashing: a linear congruential hash function with a fixed number of hash buckets; overflow chains manage collisions

extendible hashing: a nonlinear congruential hash family such as h_k(v) = h(v) mod 2^k, with a directory of size 2^k storing pointers to the hash buckets.
When a bucket overflows, k is incremented (the directory doubles) and the bucket's entries are remapped.

Operating System - Virtualization

CS537 - Operating System Summary Part 1 Virtualization

Virtualization

Process

What is a process

  • A running program is a process
  • Stream of executing instructions and their “context”

Thread

  • Can have multiple threads within a single process
  • Lightweight process
  • Share an address space

Why do we need processes?

  • Share CPU: Time sharing

OS Scheduler

  • The scheduler saves context when a process is paused
  • Restore context on resumption

Goals for CPU Virtualization

  • Policy goals

    • Virtualize CPU resource using processes
    • Reschedule process for fairness? efficiency?
  • Mechanism goals

    • Efficiency: Time sharing should not add overhead
    • Control: OS should be able to intervene when required

Mechanism

System call

  • User mode and kernel mode

    • User processes run in user mode (restricted mode)
    • OS runs in kernel mode (not restricted)
  • System call

    • Separate user mode from kernel mode for security
    • Use system call to invoke kernel mode functions
  • Procedure for calling read()

    1. Set the system call table index to 6: movl $6, %eax
    2. Call trap with id 64: int $64

Dispatch mechanism

  • Dispatch loop

    while (1) {
        run process A for some time-slice
        stop process A and save its context
        load context of another process B
    }
  • Cooperative Multi-tasking

    • Trust process to relinquish CPU through traps
    • Provide special yield() system call
    • Processes can misbehave
  • Timer-based Multi-tasking

    • Hardware generates timer interrupt (CPU or separate chip)
    • User must not be able to mask timer interrupt

Policy

Vocabulary

  • Workload: set of jobs (arrival time, run_time)
  • Job ~ Current execution of a process
  • Scheduler: Decides which ready job to run
  • Metric: measurement of scheduling quality
  • Turnaround time = completion time - arrival time
  • Response time = first run time - arrival time
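A minimal sketch computing both metrics for a FIFO schedule, using a hypothetical workload of three jobs that all arrive at time 0 (the long first job illustrates the convoy effect described below):

class FifoMetrics {
    public static void main(String[] args) {
        int[] runTime = {100, 10, 10};     // jobs in arrival order, all arrive at t = 0
        int clock = 0;
        double turnaround = 0, response = 0;
        for (int r : runTime) {            // FIFO: each job runs to completion
            response += clock;             // first run time - arrival time
            clock += r;
            turnaround += clock;           // completion time - arrival time
        }
        System.out.println("avg turnaround = " + turnaround / runTime.length); // 110.0
        System.out.println("avg response   = " + response / runTime.length);  // 70.0
    }
}

Running the two short jobs first (SJF order) would drop the average turnaround to (10 + 20 + 120) / 3 = 50.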

FIFO (First In, First Out)

  • Disadvantage: Turnaround time suffers when short jobs must wait for long jobs (Convoy Effect)

SJF (Shortest job first)

  • Disadvantage: Only schedule new job when previous job voluntarily relinquishes CPU

STCF (Shortest Time-to-Completion First)

  • Preemptive: Schedule different job by taking CPU away from running job
  • Always run job that will complete the quickest

Round Robin

  • Goal: reduce response time
  • Trade-off: increase turnaround time

I/O Aware Scheduling

  • Goal: process won’t hold CPU when doing IO

Multilevel Feedback Queue

  • Motivation: Run-time of each job is not known

  • Approach

    • Multiple levels of round-robin
    • Each level has higher priority than lower level
    • Can preempt them
  • Rules

    1. If priority(A) > Priority(B), A runs
    2. If priority(A) == Priority(B), A & B run in RR
    3. Processes start at top priority
    4. If job uses whole slice, demote process (longer time slices at lower priorities)
  • Avoid starvation

    • Problem: Low priority job may never get scheduled
    • Solution: Periodically boost priority of all jobs (or all jobs that haven't been scheduled)

Memory Virtualization

Introduction

Goals

  • Transparency: Process is unaware of sharing
  • Protection: Cannot corrupt OS or other process memory
  • Efficiency: Do not waste memory or slow down processes
  • Sharing: Enable sharing between cooperating processes

Address space

  • Stack: No fragmentation
  • Heap: Consists of allocated and free areas (holes)

Memory Access Example

| Assembly | Access for Instruction | Access for Execution |
| --- | --- | --- |
| 0x10: movl 0x8(%rbp), %edi | Fetch instruction at 0x10 | Load from 0x208 |
| 0x13: addl $0x3, %edi | Fetch instruction at 0x13 | No memory access |
| 0x19: movl %edi, 0x8(%rbp) | Fetch instruction at 0x19 | Store to 0x208 |

Basic Mechanisms

Time Sharing

  • On process switch, save current process’s memory to disk and load another process’s memory from disk.
  • Ridiculously poor performance

Static Relocation

  • Idea

    • OS rewrites each program before loading it as a process in memory
    • Each rewrite for different process uses different addresses and pointers
    • Change jumps, loads of static data
  • Disadvantage

    • Process can destroy OS or other processes
    • No privacy
    • Cannot move address space after it has been placed
    • May not be able to allocate new process

Dynamic Relocation: Introduction

  • Goal: Protect processes from one another

  • Memory Management Unit (MMU)

    • MMU dynamically changes process address at every memory reference
    • Process generates logical or virtual addresses (in their address space)
    • Memory hardware uses physical or real addresses

  • Two operating modes
    • Kernel mode

      • Can manipulate contents of MMU
      • Allows OS to access all of physical memory
    • User mode

      • Perform translation of logical address to physical address

Dynamic Relocation: Base Register

  • Translation on every memory access of user process
  • MMU adds base register to logical address to form physical address
  • Store offset in base register
  • Each process has different value in base register
  • Dynamic relocation by changing value of base register.

  • Quiz

    • What entity should do translation of addresses with base register? Hardware
    • What entity should modify the base register? OS
  • Problem: No protection

Dynamic Relocation: Base + Bounds

  • Idea

    • limit the address space with a bounds register
    • Base register: smallest physical addr (or starting location)
    • Bounds register: size of this process’s virtual address space
    • OS kills process if process loads/stores beyond bounds
  • Implementation

    • MMU compares logical address to bounds register
    • if logical address is greater, then generate error
    • MMU adds base register to logical address to form physical address

  • Context switch

    1. Change to privileged mode
    2. Save base and bounds registers of old process
    3. Load base and bounds registers of new process
    4. Change to user mode and jump to new process
  • Advantages

    • Provides protection (both read and write) across address spaces
    • Supports dynamic relocation
    • Simple, inexpensive implementation: Few registers, little logic in MMU
    • Fast: Add and compare in parallel
  • Disadvantages

    • Each process must be allocated contiguously in physical memory
    • Must allocate memory that may not be used by process
    • No partial sharing: Cannot share limited parts of address space

Segmentation

  • Idea

    • MMU contains Segment Table (per process)
    • Each segment has own base and bounds, protection bits
    • Example: 14 bit logical address, 4 segments;
  • Example

    • Segment Table

      | Segment | Base | Bounds | R | W |
      | --- | --- | --- | --- | --- |
      | 0 | 0x2000 | 0x6ff | 1 | 0 |
      | 1 | 0x0000 | 0x4ff | 1 | 1 |
      | 2 | 0x3000 | 0xfff | 1 | 1 |
      | 3 | 0x0000 | 0x000 | 0 | 0 |
    • Translation

      | Logical address | Segment | Base | Physical address |
      | --- | --- | --- | --- |
      | 0x0240 | 0 | 0x2000 | 0x2240 |
      | 0x1108 | 1 | 0x0000 | 0x1108 |
      | 0x256c | 2 | 0x3000 | 0x356c |
      | 0x3002 | 3 | 0x0000 | Fail |
  • Advantages

    • No extra memory access
    • Enables sparse allocation of address space
    • Stack and heap can grow independently
    • Enables sharing of selected segments
    • Read-only status for code
    • Supports dynamic relocation of each segment
  • Disadvantages

    • Each segment must be allocated contiguously
    • May not have sufficient physical memory for large segments?
    • External Fragmentation

Summary

| Description | Name of approach |
| --- | --- |
| One process uses RAM at a time | Time Sharing |
| Rewrite code and addresses before running | Static Relocation |
| Add per-process starting location to virt addr to obtain phys addr | Base |
| Dynamic approach that verifies address is in valid range | Base + Bounds |
| Several base+bound pairs per process | Segmentation |

Paging

Fragmentation

  • Definition

    • Free memory that can’t be usefully allocated
  • Types of fragmentation

    • External: Visible to allocator (e.g., OS)
    • Internal: Visible to requester

Introduction for Paging

  • Goal

    • Eliminate requirement that address space is contiguous
    • Eliminate external fragmentation
    • Grow segments as needed
  • Idea

    • Divide address spaces and physical memory into fixed-sized pages (usually 4KB)

Translation of Page Addresses

  • Logical address
    • High-order bits of address designate page number
    • Low-order bits of address designate offset within page

  • Address Format

    | Page Size | Low Bits | Virt Addr Bits | High Bits | Virt Pages |
    | --- | --- | --- | --- | --- |
    | 16 bytes | log(16) = 4 | 10 | 10 − 4 = 6 | 2^6 = 64 |
    | 1 KB | log(1K) = 10 | 20 | 20 − 10 = 10 | 2^10 = 1024 |
    | 1 MB | log(1M) = 20 | 32 | 32 − 20 = 12 | 2^12 = 4K |
    | 512 bytes | log(512) = 9 | 16 | 16 − 9 = 7 | 2^7 = 128 |
    | 4 KB | log(4K) = 12 | 32 | 32 − 12 = 20 | 2^20 = 1M |
  • Address Translation

    • Number of bits in virtual address need not equal number of bits in physical address

Pagetables

  • How should OS translate VPN to PPN?

    • Simple solution: Linear page table aka array
  • Example

    • Page table for P1: 3, 1, 7, 10
    • Page table for P2: 0, 4, 2, 6
    • Page table for P3: 8, 5, 9, 11
  • How big is a pagetable

    • Given 32-bit address space, 4KB pages, 4 byte entries
    • 4KB pages => 12 bit for offset
    • 32-bit address space => 20 bits for VPN => 2^20 = 1M entries
    • 1M entries * 4 bytes per entry = 4MB
  • Where are pagetables stored

    • Store each page table in memory
    • Hardware finds page table base with register (e.g., CR3 on x86)
  • What happens on a context-switch?

    • Change contents of page table base register to newly scheduled process
    • Save old page table base register in PCB of descheduled process
  • What other info is in pagetable entries besides translation?

    • valid bit
    • protection bits
    • present bit (needed later)
    • reference bit (needed later)
    • dirty bit (needed later)

Memory Access with Paging

  • Given

    • Current instruction: 0x0010: movl 0x1100, %edi
    • Assume PT is at phys addr 0x5000
    • Assume PTE’s are 4 bytes
    • Assume 4KB pages => 12 bits for offset
    • Page table for current process: 2, 0, 80, 99
  • Fetch instruction at logical addr 0x0010

    • Access page table to get ppn for vpn 0
    • Mem ref 1: 0x5000
    • Learn vpn 0 is at ppn 2
    • Fetch instruction at 0x2010 (Mem ref 2)
  • Exec, load from logical addr 0x1100

    • Access page table to get ppn for vpn 1
    • Mem ref 3: 0x5000
    • Learn vpn 1 is at ppn 0
    • movl from 0x0100 into reg (Mem ref 4)

Advantages of Paging

  • No external fragmentation

    • Any page can be placed in any frame in physical memory
  • Fast to allocate and free

    • Alloc: No searching for suitable free space
    • Free: Doesn’t have to coalesce with adjacent free space
  • Simple to swap-out portions of memory to disk (later lecture)

    • Page size matches disk block size
    • Can run process when some pages are on disk
    • Add “present” bit to PTE

Disadvantages of Paging

  • Internal fragmentation: Page size may not match size needed by process

    • Wasted memory grows with larger pages
    • Tension?
  • Additional memory reference to page table -> Very inefficient

    • Page table must be stored in memory
    • MMU stores only base address of page table
  • Storage for page tables may be substantial

    • Simple page table: Requires PTE for all pages in address space
    • Entry needed even if page not allocated?

Paging Translation Steps

  1. extract VPN (virt page num) from VA (virt addr)
  2. calculate addr of PTE (page table entry)
  3. read PTE from memory
  4. extract PFN (page frame num)
  5. build PA (phys addr)
  6. read contents of PA from memory into register
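A minimal sketch of these steps for a linear page table, reusing the example above (4 KB pages; page table 2, 0, 80, 99). Steps 2-3 collapse into an array index here because the page table is a Java array rather than a table in memory:

class PagingTranslate {
    static final int OFFSET_BITS = 12;                    // 4 KB pages
    static final int OFFSET_MASK = (1 << OFFSET_BITS) - 1;

    static long translate(long va, long[] pageTable) {
        long vpn = va >>> OFFSET_BITS;                    // 1. extract VPN from VA
        long pfn = pageTable[(int) vpn];                  // 2-4. locate and read PTE, extract PFN
        return (pfn << OFFSET_BITS) | (va & OFFSET_MASK); // 5. build PA
    }

    public static void main(String[] args) {
        long[] pt = {2, 0, 80, 99};
        System.out.println(Long.toHexString(translate(0x0010, pt))); // 2010: instruction fetch
        System.out.println(Long.toHexString(translate(0x1100, pt))); // 100: operand load
    }
}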

TLB

Motivative Example: Iterating Array

  • Code

    int sum = 0;
    for (int i = 0; i < N; i++) {
        sum += a[i];
    }
  • Memory Access

    | What virtual addresses? | What physical addresses? |
    | --- | --- |
    | load 0x3000 | load 0x100C, load 0x7000 |
    | load 0x3004 | load 0x100C, load 0x7004 |
    | load 0x3008 | load 0x100C, load 0x7008 |
    | load 0x300C | load 0x100C, load 0x700C |

Introduction

  • Strategy: Cache Page Translations
  • TLB stands for Translation Lookaside Buffer

TLB Organization

  • TLB Entry

    | Tag (virtual page number) | Physical page number (page table entry) |
  • Fully associative

    • Any given translation can be anywhere in the TLB
    • Hardware will search the entire TLB in parallel

Example: Iterating Array with TLB

  • Code

    int sum = 0;
    for (int i = 0; i < 2048; i++) {
        sum += a[i];
    }
  • Page table for current process (starting at 0x0000)

    | VPN | 0 | 1 | 2 | 3 |
    | --- | --- | --- | --- | --- |
    | PPN | 1 | 5 | 4 | - |
  • TLB

    | Valid | VPN | PPN |
    | --- | --- | --- |
    | 1 | 1 | 5 |
    | 1 | 2 | 4 |
  • Memory Access

    | What virtual addresses? | What physical addresses? |
    | --- | --- |
    | load 0x1000 | load 0x0004, load 0x5000 |
    | load 0x1004 | (TLB hit) load 0x5004 |
    | load 0x1008 | (TLB hit) load 0x5008 |
    | load 0x100C | (TLB hit) load 0x500C |
    | load 0x2000 | load 0x0008, load 0x4000 |
    | load 0x2004 | (TLB hit) load 0x4004 |
  • Performance

    • # TLB lookups = number of accesses to a = 2048
    • # TLB misses = 2
    • Miss rate = 2/2048 = 0.1%
    • Hit rate = 1 – miss rate = 99.9%

TLB Replacement Policies

  • Access Patterns

    • Sequential array accesses almost always hit in TLB: Very fast!
    • Highly random, with no repeat accesses: Slow
  • Code Example

    Workload A Workload B
  • Workload Locality

    • Spatial Locality: future access will be to nearby addresses
    • Temporal Locality: future access will be repeats to the same data
  • What TLB characteristics are best for each type?

    • Spatial:

      • Access same page repeatedly; need same vpn → ppn translation
      • Same TLB entry re-used
    • Temporal:

      • Access same address near in future
      • Same TLB entry re-used in near future
      • How near in future? How many TLB entries are there?
  • Replacement policies

    • LRU: evict Least-Recently Used TLB slot when needed
    • Random: Evict a randomly chosen entry
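A minimal sketch of the LRU policy above, using LinkedHashMap's access-order mode; the two-entry capacity and the vpn -> ppn values are hypothetical:

import java.util.*;

class LruTlb extends LinkedHashMap<Long, Long> {
    private final int capacity;
    LruTlb(int capacity) {
        super(16, 0.75f, true);            // accessOrder = true tracks recency of use
        this.capacity = capacity;
    }
    @Override
    protected boolean removeEldestEntry(Map.Entry<Long, Long> eldest) {
        return size() > capacity;          // evict the least-recently-used entry
    }

    public static void main(String[] args) {
        LruTlb tlb = new LruTlb(2);
        tlb.put(1L, 5L);                   // vpn 1 -> ppn 5
        tlb.put(2L, 4L);                   // vpn 2 -> ppn 4
        tlb.get(1L);                       // touch vpn 1
        tlb.put(3L, 7L);                   // evicts vpn 2, the LRU entry
        System.out.println(tlb.keySet()); // [1, 3]
    }
}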

Context Switches

  • What happens if a process uses cached TLB entries from another process?

    1. Flush TLB on each switch

      • Costly
      • lose all recently cached translations
    2. Track which entries are for which process

      • Address Space Identifier
      • Tag each TLB entry with an 8-bit ASID
  • TLB Example with ASID

    • Pagetable

      • P1 (ASID 11): 1, 5, 4, …
      • P2 (ASID 12): 6, 2, 3, …
    • TLB

      | Valid | Virt | Phys | ASID |
      | --- | --- | --- | --- |
      | 0 | 1 | 9 | 11 |
      | 1 | 1 | 5 | 11 |
      | 1 | 1 | 2 | 12 |
      | 1 | 0 | 1 | 11 |
    • Memory access

      | Virtual | Physical |
      | --- | --- |
      | load 0x1444 with ASID 12 | load 0x2444 |
      | load 0x1444 with ASID 11 | load 0x5444 |
  • TLB Performance

    • Context switches are expensive

    • Even with ASID, other processes “pollute” TLB

      • Discard process A’s TLB entries for process B’s entries
    • Architectures can have multiple TLBs

      • 1 TLB for data, 1 TLB for instructions
      • 1 TLB for regular pages, 1 TLB for “super pages”

TLB Misses

  • Who Handles TLB MISS? Hardware or OS?

  • Hardware: CPU must know where pagetables are

    • CR3 register on x86
    • Pagetable structure fixed and agreed upon between HW and OS
    • HW “walks” the pagetable and fills TLB
  • OS: “Software-managed TLB”

    • CPU traps into OS upon TLB miss
    • OS interprets pagetables as it chooses
    • Modifying TLB entries is privileged
    • Need same protection bits in TLB as pagetable - rwx

Summary

  • Pages are great, but accessing page tables for every memory access is slow

  • Cache recent page translations -> TLB

    • Hardware performs TLB lookup on every memory access
  • TLB performance depends strongly on workload

    • Sequential workloads perform well
    • Workloads with temporal locality can perform well
  • In different systems, hardware or OS handles TLB misses

  • TLBs increase cost of context switches

    • Flush TLB on every context switch
    • Add ASID to every TLB entry

Smaller Page Tables

Motivation

  • How big are page tables? (a size calculator is sketched at the end of this Motivation section)

    1. PTE’s are 2 bytes, and 32 possible virtual page numbers

      • 2 bytes * 32 = 64 bytes
    2. PTE’s are 2 bytes, virtual addrs are 24 bits, pages are 16 bytes

      • 16 bytes page => 4 bit offset => 20 bit VPN
      • => 2^20 Pages => 2^20 * 2 = 2MB for page tables
    3. PTE’s are 4 bytes, virtual addrs are 32 bits, and pages are 4 KB

      • 4KB page => 12 bit offset => 20 bit VPN
      • => 2^20 Pages => 2^20 * 4 = 4MB for page tables
    4. PTE’s are 4 bytes, virtual addrs are 64 bits, and pages are 4 KB

      • 4KB page => 12 bit offset => 52 bit VPN
      • => 2^52 Pages => 2^52 * 4 = 18.0143985 PB for page tables
  • Why are Page Tables so Large?

    • Many invalid PT entries

  • Summary
    • Storage for page tables may be substantial

    • Simple page table: Requires PTE for all pages in address space

    • Entry needed even if page not allocated.
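A minimal sketch reproducing the size calculations above (hypothetical helper; assumes the page size is a power of two):

class PageTableSize {
    static long sizeBytes(int vaBits, int pageBytes, int pteBytes) {
        int offsetBits = Integer.numberOfTrailingZeros(pageBytes); // log2(page size)
        long numPages = 1L << (vaBits - offsetBits);               // one PTE per virtual page
        return numPages * pteBytes;
    }

    public static void main(String[] args) {
        System.out.println(sizeBytes(24, 16, 2));   // 2097152 bytes = 2 MB
        System.out.println(sizeBytes(32, 4096, 4)); // 4194304 bytes = 4 MB
    }
}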

Smaller Page Tables

  • Use more complex page tables, instead of just big array

  • Any data structure is possible with software-managed TLB

    • Hardware looks for vpn in TLB on every memory access

    • If TLB does not contain vpn, TLB miss

      • Trap into OS and let OS find vpn->ppn translation
      • OS notifies TLB of vpn->ppn for future accesses
  • Other approaches

    1. Segmented Pagetables

    2. Multi-level Pagetables

      • Page the page tables
      • Page the pagetables of page tables…
    3. Inverted Pagetables

Paging with Segmentation

  • Idea

    • Divide address space into segments (code, heap, stack)

    • Divide each segment into fixed-sized pages

    • Logical address divided into three portions

      seg # (4 bits) page number (8 bits) page offset (12 bits)
  • Implementation

    • Each segment has a page table
    • Each segment track base (physical address) and bounds of the page table
  • Quiz

    • Logical address layout

      seg # (4 bits) page number (8 bits) page offset (12 bits)
    • Segment Table

      | Segment | Base | Bounds | R | W |
      | --- | --- | --- | --- | --- |
      | 0 | 0x002000 | 0xff | 1 | 0 |
      | 1 | 0x000000 | 0x00 | 0 | 0 |
      | 2 | 0x001000 | 0x0f | 1 | 1 |
    • Translation

      | Virtual | Seg | Base | Page | PPN | Physical | Note |
      | --- | --- | --- | --- | --- | --- | --- |
      | 0x002070 R | 0 | 0x002000 | 2 | 0x004 | 0x004070 | |
      | 0x202016 R | 2 | 0x001000 | 2 | 0x003 | 0x003016 | |
      | 0x104c84 R | 1 | - | - | - | - | R = 0 |
      | 0x010424 W | 0 | - | - | - | - | W = 0 |
      | 0x210014 W | 2 | - | - | - | - | bounds |
      | 0x203568 W | 2 | 0x001000 | 3 | 0x02a | 0x02a568 | |
  • Advantages

    • Advantages of Segments

      • Supports sparse address spaces.
      • Decreases size of page tables. If a segment is not used, there is no need for its page table
    • Advantages of Pages

      • No external fragmentation
      • Segments can grow without any reshuffling
      • Can run process when some pages are swapped to disk (next lecture)
    • Advantages of Both

      • Increases flexibility of sharing: Share either single page or entire segment
  • Disadvantages

    • Potentially large page tables (for each segment)
    • Must allocate each page table contiguously
    • More problematic with more address bits

Multilevel Page Tables

  • Goal: Allow page tables to be allocated non-contiguously

  • Idea: Page the page tables

    • Creates multiple levels of page tables; outer level “page directory”
    • Only allocate page tables for pages in use
    • Used in x86 architectures (hardware can walk known structure)

  • Multilevel Pagetable Translation

    • Page directory and page tables

      | Index | 0x0 | 0x1 | 0xE | 0xF |
      | --- | --- | --- | --- | --- |
      | Page directory | 0x3 | - | - | 0x92 |
      | PT @ PPN 0x3 | 0x10 | 0x23 | - | - |
      | PT @ PPN 0x92 | - | - | 0x55 | 0x45 |
    • Address layout

      outer page (4 bits) | inner page (4 bits) | page offset (12 bits)
    1. Translate 0x01ABC (see the sketch at the end of this section)

      • Outer page = 0x0 => Use page table at 0x3
      • Inner page = 0x1 => PPN = 0x23
      • Physical address = 0x23ABC
    2. Translate 0xFEED0

      • Outer page = 0xF => Use page table at 0x92
      • Inner page = 0xE => PPN = 0x55
      • Physical address = 0x55ED0
  • Address Format for Multilevel Paging

    • Given 30-bit address with 4KB page size
    • #bits for page offset = log(4K) = 12
    • 4 bytes per PTE => 1K entries per page => #bits for inner page = log(1K) = 10
    • #bits for outer page = 30 - 10 - 12 = 8
  • Pagetable with 3 levels

    • Problem

      • Page directories (outer level) may not fit in a page
    • Solution

      • Split page directories into pieces
      • Use another page dir to refer to the page dir pieces.
    • Memory Addressability Comparison

      • 1 level: 2^10 × 2^12 = 4MB
      • 2 levels: (2^10)^2 × 2^12 = 4GB
      • 3 levels: (2^10)^3 × 2^12 = 4TB
  • Quiz: Count Memory Access

    • Assumption

      • 3-level page table
      • 256-byte pages
      • 16-bit addresses
      • ASID of current process is 211
    • TLB

      | ASID | VPN | PFN | Valid |
      | --- | --- | --- | --- |
      | 211 | 0xbb | 0x91 | 1 |
      | 211 | 0xff | 0x23 | 1 |
      | 122 | 0x05 | 0x91 | 1 |
      | 211 | 0x05 | 0x12 | 0 |
    1. 0xAA10: movl 0x1111, %edi

      • TLB miss for 0xAA10 => 3 memory accesses for page table + 1 more to get the instruction

      • TLB miss for 0x1111 => 3 memory accesses for page table + 1 more to get the data

      • Total: 8 memory accesses

    2. 0xBB13: addl $0x3, %edi

      • TLB hit for 0xBB13 => 1 access more to get the instruction
    3. 0x0519: movl %edi, 0xFF10

      • TLB miss for 0x0519 => 3 memory accesses for page table + 1 more to get the instruction

      • TLB hit for 0xFF10 => 1 more access to store the data

      • Total: 5 memory accesses
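A minimal sketch of the two-level walk, wired up with the page directory and page tables from the 0x01ABC/0xFEED0 example earlier in this section (-1 marks invalid entries; the container classes are hypothetical):

import java.util.*;

class TwoLevelTranslate {
    public static void main(String[] args) {
        int[] dir = new int[16];                    // page directory: outer index -> PT's PPN
        Arrays.fill(dir, -1);
        dir[0x0] = 0x3; dir[0xF] = 0x92;

        Map<Integer, int[]> pts = new HashMap<>();  // page tables keyed by their PPN
        int[] pt3 = new int[16];  Arrays.fill(pt3, -1);
        pt3[0x0] = 0x10; pt3[0x1] = 0x23;
        int[] pt92 = new int[16]; Arrays.fill(pt92, -1);
        pt92[0xE] = 0x55; pt92[0xF] = 0x45;
        pts.put(0x3, pt3); pts.put(0x92, pt92);

        for (int va : new int[]{0x01ABC, 0xFEED0}) {
            int outer  = (va >> 16) & 0xF;          // top 4 bits index the directory
            int inner  = (va >> 12) & 0xF;          // next 4 bits index the page table
            int offset = va & 0xFFF;                // 12-bit page offset
            int ppn = pts.get(dir[outer])[inner];
            System.out.printf("0x%05X -> 0x%05X%n", va, (ppn << 12) | offset);
        }
    }
}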

Inverted Page Table

  • Only need entries for virtual pages w/ valid physical mappings

  • Naïve approach:

    • Search through data structure <ppn, vpn+asid> to find match
    • Too much time to search entire table
  • Better:

    • Find possible matching entries by hashing vpn+asid
    • Smaller number of entries to search for exact match
  • Managing inverted page table requires software-controlled TLB

Swapping

Motivation

  • Support processes when not enough physical memory
  • Single process with very large address space
  • Multiple processes with combined address spaces

Idea

  • OS keeps unreferenced pages on disk

    • Slower, cheaper backing store than memory
  • Process can run when not all pages are loaded into main memory

  • OS and hardware cooperate to make large disk seem like memory

    • Same behavior as if all of address space in main memory

Locality of Reference

  • Leverage locality of reference within processes

    • Spatial: reference memory addresses near previously referenced addresses
    • Temporal: reference memory addresses that have been referenced in the past
    • Processes spend majority of time in small portion of code
  • Implication:

    • Process only uses small amount of address space at any moment
    • Only small amount of address space must be resident in physical memory
  • Memory Hierarchy

Mechanism

  • Each page in virtual address space maps to one of three locations:

    • Physical main memory: Small, fast, expensive
    • Disk (backing store): Large, slow, cheap
    • Nothing (error): Free
  • Extend page tables with an extra bit: present

    • permissions (r/w), valid, present
    • Page in memory: present bit set in PTE
    • Page on disk: present bit cleared
      • PTE points to block on disk
      • Causes trap into OS when page is referenced
  • Procedure

    Hardware checks TLB
    if TLB hit
        address translation is done
        page is in physical memory
    else // TLB miss
        hardware or OS walks the page tables
        if PTE designates page as present (i.e., present bit is set)
            page is in physical memory
        else // page fault
            trap into OS (not handled by hardware)
            OS selects a victim page in memory to replace
            if victim page is modified
                write victim page out to disk
            OS reads the referenced page from disk into memory
            page table is updated, present bit is set
            process continues execution

Policy: Page selection

  • When should a page on disk be brought into memory?

  • Demand paging: Load page only when page fault occurs

    • Intuition: Wait until page must absolutely be in memory
    • When process starts: No pages are loaded in memory
    • Problems: Pay cost of page fault for every newly accessed page
  • Prepaging (anticipatory, prefetching): Load page before referenced

    • OS predicts future accesses (oracle) and brings pages into memory early
    • Works well for some access patterns (e.g., sequential)
  • Hints: Combine above with user-supplied hints about page references

    • User specifies: may need page in future, don’t need this page anymore, or sequential access pattern, …
    • Example: madvise() in Unix

Policy: Page replacement

  • Which resident page in memory should be thrown out to disk?

  • OPT: Replace page not used for longest time in future

    • Advantages: Guaranteed to minimize number of page faults
    • Disadvantages: Requires that OS predict the future; Not practical, but good for comparison
  • FIFO: Replace page that has been in memory the longest

    • Intuition: First referenced long time ago, done with it now
    • Advantages: Fair: All pages receive equal residency; Easy to implement
    • Disadvantage: Some pages may always be needed
  • LRU: Replace page not used for longest time in past

    • Intuition: Use past to predict the future
    • Advantages: With locality, LRU approximates OPT
    • Disadvantages: Harder to implement and does not handle all workloads well
  • Comparison

    | LRU, OPT | FIFO |
    | --- | --- |
    | Guaranteed to have fewer page faults with larger memory: pages kept in a smaller memory ⊆ pages kept in a larger memory (smaller cache ⊆ bigger cache); usually have fewer page faults | May actually have more page faults with larger memory! |

Implementing LRU

  • Software Perfect LRU

    • OS maintains ordered list of physical pages by reference time
    • When page is referenced: Move page to front of list
    • When need victim: Pick page at back of list
    • Trade-off: Slow on memory reference, fast on replacement
  • Hardware Perfect LRU

    • Associate timestamp register with each page
    • When page is referenced: Store system clock in register
    • When need victim: Scan through registers to find oldest clock
    • Trade-off: Fast on memory reference, slow on replacement (especially as size of memory grows)
  • Approximating LRU: Clock Algorithm

    • Hardware

      • Keep use (or reference) bit for each page frame
      • When page is referenced: set use bit (page was used recently)
    • Operating System

      • Page replacement: Look for page with use bit cleared (has not been referenced for a while)
      1. Keep pointer to last examined page frame
      2. Traverse pages in circular buffer
      3. Clear use bits as search
      4. Stop when find page with already cleared use bit, replace this page
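A minimal sketch of the clock sweep described in the steps above; the Frame class and the fixed frame array are hypothetical:

class ClockReplacer {
    static class Frame { int page; boolean use; Frame(int page) { this.page = page; this.use = true; } }

    Frame[] frames;
    int hand = 0;                                  // pointer to last examined frame

    ClockReplacer(Frame[] frames) { this.frames = frames; }

    // Returns the index of the victim frame, clearing use bits along the way
    int pickVictim() {
        while (true) {
            Frame f = frames[hand];
            int index = hand;
            hand = (hand + 1) % frames.length;     // traverse the circular buffer
            if (f.use) {
                f.use = false;                     // referenced recently: give it a second chance
            } else {
                return index;                      // use bit already clear: evict this frame
            }
        }
    }

    public static void main(String[] args) {
        ClockReplacer c = new ClockReplacer(new Frame[]{new Frame(7), new Frame(3), new Frame(9)});
        c.frames[1].use = false;                   // frame 1 was not referenced recently
        System.out.println(c.pickVictim());        // prints 1
    }
}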

Summary

  • Abstraction: Virtual address space with code, heap, stack

  • Address translation

    • Contiguous memory: base, bounds, segmentation
    • Using fixed sizes pages with page tables
  • Challenges with paging

    • Extra memory references: avoid with TLB
    • Page table size: avoid with multi-level paging, inverted page tables etc.
  • Larger address spaces: Swapping mechanisms, policies (LRU, Clock)

Process Control


  • pid
  • fork
  • wait
  • exec

pid

Every process has a unique non-negative integer pid.

The pid can be used to generate unique file names.

When a process terminates, its pid is recycled and reused.

Most UNIX systems have code to delay reuse, so that a newly created process is not assigned the same pid as a recently terminated process.

pid 1 is usually the init process, invoked by the kernel at the end of the bootstrap procedure. Its program file is /etc/init (older UNIX systems) or /sbin/init (newer versions). This process is responsible for bringing up the UNIX system after the kernel has been bootstrapped, by reading the system-dependent initialization files (the /etc/rc* files or /etc/inittab and the files in /etc/init.d) and taking the system to a certain state, such as multiuser.

The init process is a user process (with superuser privileges) that never dies.

Example: the launchd process in Mac OS X 10.4.

Each UNIX system usually has its own kernel processes; on some systems, pid 2 is the pagedaemon, responsible for supporting the paging of the virtual memory system.

None of the functions below has an error return.

fork

An existing process can create a new process by calling fork.

This function is called once but returns twice.

The return value in the child is 0. Rationale: a child has only one parent, and it can always call getppid to obtain it.

The return value in the parent is the pid of the child. Rationale: a process can have more than one child, and there is no function that allows a process to obtain the pids of its children.

Both the child and the parent continue executing with the instruction that follows the call to fork.

The child is a copy of the parent. For example, the child gets a copy of the parent's data space, heap, and stack, but the parent and the child share the text segment.

Modern implementations don't perform a complete copy of the parent's data, stack, and heap, since a fork is often followed by an exec. Instead, a technique called copy-on-write (COW) is used. These regions are shared by the parent and the child and have their protection changed by the kernel to read-only. If either process tries to modify these regions, the kernel then makes a copy of that piece of memory only, typically a "page" in a virtual memory system.

In general, whether the parent or the child runs first is nondeterministic; it depends on the kernel's scheduling algorithm.

If the actions need to be synchronized, a sleep (as in the example code) can make the parent wait for the child, but there is no guarantee that 2 seconds is enough.

Other synchronization options: race conditions (Section 8.9) and signals (Section 10.16).

The difference between sizeof and strlen:

When we write to standard output, we subtract 1 from the size of buf to avoid writing the terminating null byte. Although strlen will calculate the length of a string not including the terminating null byte, sizeof calculates the size of the buffer, which does include the terminating null byte.

Another difference is that using strlen requires a function call, whereas sizeof calculates the buffer length at compile time (faster), as the buffer is initialized with a known string and its size is fixed.

Why was the output printed twice? write is unbuffered, while the stdio library is buffered: stdout is line buffered if it is connected to a terminal device, and fully buffered otherwise.

So in the interactive run the printf buffer is flushed by the newline, but when we redirect the output to a file, printf prints two lines. In the second case, printf is called once before the fork, but its data is still in the buffer when fork is called; the buffer is copied into the child, so the parent and the child each have this unflushed buffer. The final printf appends its data to the existing buffer, and each copy of the buffer is flushed when its process terminates.

File sharing

When we redirect the parent's standard output, the child's standard output is also redirected.

This is a feature of fork: all file descriptors that are open in the parent are duplicated in the child, and the parent and the child share a file table entry for every open descriptor.

That is, when the parent and the child both write to standard output, they have to share the file offset. If the parent's output is redirected, we need the child's writes to standard output to update the parent's file offset. Then not only can the child write to standard output while the parent is waiting for it, but the parent can also continue writing at the same position after the child is done. Without a shared file offset, this interaction would be much harder to accomplish.

There are two normal ways to handle the descriptors after a fork:

  1. The parent waits for the child to complete: the parent need do nothing; any shared descriptors will have been updated by the child.
  2. Both the parent and the child go their own ways: after the fork, each closes the descriptors it doesn't need, so the open descriptors don't interfere with each other. This scenario is common in network servers.

Other properties inherited by the child:

  • Real user ID, real group ID, effective user ID, and effective group ID
  • Supplementary group IDs
  • Process group ID
  • Session ID
  • Controlling terminal
  • The set-user-ID and set-group-ID flags
  • Current working directory
  • Root directory
  • File mode creation mask
  • Signal mask and dispositions
  • The close-on-exec flag for any open file descriptors
  • Environment
  • Attached shared memory segments
  • Memory mappings
  • Resource limits

Differences between the parent and the child:

  • The return value of fork
  • The pid
  • The ppid
  • The child's tms_utime, tms_stime, tms_cutime, and tms_cstime values are set to 0
  • File locks set by the parent are not inherited by the child
  • Pending alarms are cleared for the child.
  • The set of pending signals for the child is set to the empty set.

The two main reasons fork can fail: there are already too many processes in the system, or the number of processes for this real user ID exceeds the system's limit (CHILD_MAX specifies the maximum number of simultaneous processes per real user ID).

Two uses of fork:

  • When a process wants to duplicate itself so that the parent and the child can each execute a section of the same code at the same time: common in network servers, where the parent waits for a service request from a client. When the request arrives, the parent calls fork and lets the child handle the request, while the parent goes back to waiting for the next service request to arrive.
  • When a process wants to execute a different program: here the child calls exec right after it returns from fork. (Some operating systems combine the two steps into a single operation called a spawn.)

wait

When a process terminates, either normally or abnormally, the kernel notifies the parent by sending the SIGCHLD signal to the parent.

Because the termination of a child is an asynchronous event—it can happen at any time while the parent is running—this signal is the asynchronous notification from the kernel to the parent.

The parent can choose to ignore this signal, or it can provide a function that is called when the signal occurs: a signal handler.

The default action for this signal is to be ignored.

For now, we need to be aware that a process that calls wait or waitpid can:

  • Block, if all of its children are still running
  • Return immediately with the termination status of a child, if a child has terminated and is waiting for its termination status to be fetched
  • Return immediately with an error, if it doesn’t have any child processes

If the process is calling wait because it received the SIGCHLD signal, we expect wait to return immediately. But if we call it at any random point in time, it can block.
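
A minimal sketch of the blocking case, using waitpid and the standard <sys/wait.h> status macros:

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork error");
        exit(1);
    } else if (pid == 0) {
        exit(7);                    /* child terminates normally with status 7 */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid) {   /* blocks until the child exits */
        perror("waitpid error");
        exit(1);
    }
    if (WIFEXITED(status))
        printf("normal termination, exit status = %d\n", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        printf("killed by signal %d\n", WTERMSIG(status));
    return 0;
}
```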

Automata, language, and computational complexity

Course reference: https://courses.engr.illinois.edu/cs373/sp2013/Lectures/

NFA to DFA subset construction

https://www.youtube.com/watch?v=Y92dtMnarAU&list=PLBlnK6fEyqRgp46KUv4ZY69yXmpwKOIev&index=19
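
A compact sketch of the construction in C; the 3-state NFA here (accepting strings that end in "01") is my own example, not from the linked video:

```c
#include <stdio.h>

#define NSTATES 3
#define NSYMS   2

/* nfa[q][c] = bitmask of states reachable from state q on symbol c */
static const int nfa[NSTATES][NSYMS] = {
    /* q0 */ { (1 << 0) | (1 << 1), 1 << 0 },  /* on 0: {q0,q1}; on 1: {q0} */
    /* q1 */ { 0,                   1 << 2 },  /* on 1: {q2} */
    /* q2 */ { 0,                   0      },
};
static const int accepting = 1 << 2;           /* q2 is accepting */

int main(void)
{
    int dfa[1 << NSTATES][NSYMS];              /* DFA states are subsets (bitmasks) */
    int seen[1 << NSTATES] = {0};
    int queue[1 << NSTATES], head = 0, tail = 0;

    int start = 1 << 0;                        /* start subset {q0} */
    queue[tail++] = start;
    seen[start] = 1;
    while (head < tail) {                      /* explore only reachable subsets */
        int S = queue[head++];
        for (int c = 0; c < NSYMS; c++) {
            int T = 0;
            for (int q = 0; q < NSTATES; q++)
                if (S & (1 << q)) T |= nfa[q][c];
            dfa[S][c] = T;
            if (!seen[T]) { seen[T] = 1; queue[tail++] = T; }
            printf("{0x%x} --%d--> {0x%x}%s\n", S, c, T,
                   (T & accepting) ? " (accepting)" : "");
        }
    }
    return 0;
}
```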

ε-NFA to NFA Algorithm

https://www.youtube.com/watch?v=Jz4YQ09nOxA&index=44&list=PLBlnK6fEyqRgp46KUv4ZY69yXmpwKOIev

DFA to regular expression: state elimination

https://www.youtube.com/watch?v=fyJumUElTGY

DFA to regex: dynamic programming / transitive closure method

https://www.classes.cs.uchicago.edu/archive/2015/winter/28000-1/Lec2.pdf
https://www.cs.dartmouth.edu/~ac/Teach/CS39-Winter09/SlidesAndNotes/lec09dfa2regexp.pdf
https://cs.stackexchange.com/questions/2016/how-to-convert-finite-automata-to-regular-expressions

Closure properties of regular languages & regular expressions



Regex to NFA translation

DFA minimization algorithm

Minimization of DFA - Table Filling Method (Myhill-Nerode Theorem)

Myhill-Nerode theorem


Furthermore, there is a minimized DFA having precisely one state for each equivalence class.
Classifying the equivalence classes: how do we tell what the equivalence classes are?
https://courses.cs.washington.edu/courses/cse322/05wi/handouts/MyhillNerode.pdf
Regular: we can use the DFA minimization algorithm above to find the equivalence classes.
Nonregular: by Myhill-Nerode, the indistinguishability relation has infinitely many equivalence classes.

Using the pumping lemma to prove a language nonregular

Testing regularity: every finite language is regular; an infinite language can be tested with the pumping lemma.

Testing whether a language is regular or not

https://www.youtube.com/watch?v=KSczX111n3U

CFG design: regular language to CFG


For each state of the DFA, create a nonterminal variable;
for each transition, add the corresponding rule (one per input symbol);
for the nonterminal of each accepting state, add a rule that generates the empty string.
This CFG generates the same language that the DFA recognizes.
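
A standard instance of this construction (my own example, not from the notes): for the DFA over {0,1} that accepts strings with an even number of 1s, with states E (start, accepting) and O, the resulting CFG is E → 0E | 1O | ε and O → 0O | 1E. The derivation E ⇒ 1O ⇒ 11E ⇒ 11 mirrors the DFA's run on the accepted string 11.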

AMBIGUITY

A string is derived ambiguously if it has two or more distinct leftmost (equivalently, rightmost) derivations, i.e., more than one parse tree.
A Grammar G is ambiguous if it generates some string ambiguously.

Chomsky Normal Form

(CFG simplification: removal of null/ε-productions + elimination of unit productions + introduction of auxiliary variables & rewriting of long right-hand sides)
https://www.youtube.com/watch?v=EF09zxzpVbk&list=PLBlnK6fEyqRgp46KUv4ZY69yXmpwKOIev&t=5s&index=76



For each direct null production A → ε, look for all productions whose right-hand side contains A. For each such production, add a variant for every possible combination of occurrences of A removed, i.e., every possible location of A that can be nullified. Finish the resulting productions without A → ε, then look for the next null production, e.g. B → ε, until no null productions remain.
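
A small worked instance (my own, not from the notes): given S → aAbA and A → c | ε, eliminating A → ε yields S → aAbA | abA | aAb | ab together with A → c, one variant per subset of nullable occurrences of A.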

Or a graph algorithm:
Create a directed graph (possibly cyclic) with one vertex per nonterminal.
Add an edge A → B for each unit production A → B.
Collapse each strongly connected component into a single vertex (the nonterminals in it are equivalent; merge their production rules in the grammar).
Starting from a sink S, copy the sink's production rules into every nonterminal that has an edge into the sink, then remove those edges. Repeat until no edges remain.

CYK algorithm, O(n^3)
(dynamic programming for parsing/recognizing whether a string is in the language of a CFG)
Assume the input grammar is in CNF.


For example, to parse "id ( id , id )", a sequence of 6 tokens, we construct a 6-by-6 grid whose main diagonal is filled with all nonterminals that can derive the i-th symbol, for i = 1 to 6.

Then we fill in each successive diagonal bottom-up:
for each square, consider all possible ordered pairs of nonterminals, one from a square to its left and one from a square below it; for each pair, if some rule derives it (note the grammar is in CNF), put that rule's left-hand-side nonterminal into the square.

If the starting nonterminal appears in the upper-right corner of the grid, we accept the string.
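
A compact C sketch using bitmask sets of nonterminals; the CNF grammar here (S → AB | BC, A → BA | a, B → CC | b, C → AB | a) is a common textbook example, not from the notes:

```c
#include <stdio.h>
#include <string.h>

/* nonterminals encoded as bits */
enum { S = 1, A = 2, B = 4, C = 8 };

/* binary rules: lhs derives (x, y) */
static const struct { int lhs, x, y; } rules[] = {
    {S, A, B}, {S, B, C}, {A, B, A}, {B, C, C}, {C, A, B}
};
enum { NRULES = sizeof(rules) / sizeof(rules[0]) };

static int terminal_mask(char c)       /* terminal productions */
{
    if (c == 'a') return A | C;        /* A -> a, C -> a */
    if (c == 'b') return B;            /* B -> b */
    return 0;
}

static int cyk(const char *w)          /* assumes strlen(w) < 64 */
{
    int n = (int)strlen(w);
    static int dp[64][64];             /* dp[i][l]: nonterminals deriving w[i..i+l-1] */
    memset(dp, 0, sizeof dp);
    for (int i = 0; i < n; i++)
        dp[i][1] = terminal_mask(w[i]);
    for (int l = 2; l <= n; l++)               /* substring length */
        for (int i = 0; i + l <= n; i++)       /* start index */
            for (int k = 1; k < l; k++)        /* split point */
                for (int r = 0; r < NRULES; r++)
                    if ((dp[i][k] & rules[r].x) && (dp[i + k][l - k] & rules[r].y))
                        dp[i][l] |= rules[r].lhs;
    return (dp[0][n] & S) != 0;                /* start symbol in the top cell? */
}

int main(void)
{
    printf("%s\n", cyk("baaba") ? "accepted" : "rejected");  /* prints "accepted" */
    return 0;
}
```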

Nondeterministic pushdown automata
Equivalence of CFGs and nondeterministic PDAs:

A language is generated by a CFG if and only if it is recognized by an NPDA.
https://people.eecs.berkeley.edu/~sseshia/172/lectures/Slides8.pdf




Non-context-free languages:

For any context-free language L, there is a pumping length p ≥ 1 such that any string z ∈ L of length at least p can be decomposed into five parts, z = uvwxy, such that:

  1. |vwx| ≤ p (the middle three parts are not too far apart)
  2. |vx| > 0 (the pumped parts v and x are not both empty)
  3. for any i ≥ 0, u v^i w x^i y stays in L

    Both the CFL pumping lemma and the regular pumping lemma are necessary but not sufficient conditions for context-freeness and regularity, respectively.

Pumping lemma proof idea: for a sufficiently long string, the longest path in its parse tree must, by the pigeonhole principle, contain a repeated nonterminal. Pumping v and x up or down therefore still yields a valid parse tree.
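
A standard application (my own summary, not from the notes): to show L = {a^n b^n c^n : n ≥ 0} is not context-free, take z = a^p b^p c^p. Since |vwx| ≤ p, the window vwx can touch at most two distinct letters, so pumping with i = 2 increases the counts of at most two letters while leaving the third unchanged; the resulting string is not in L, contradicting the lemma.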

Language hierarchy & Decidable and recognizable language

A decision problem can be expressed as a language, which is a set
A language is called a Recursively enumerable set (language) [also recognizable, partially decidable, semidecidable, Turing-acceptable or Turing-recognizable] if it is a recursively enumerable subset in the set of all possible words over the alphabet of the language, i.e., if there exists a Turing machine which will enumerate all valid strings of the language.

Recursive set, decidable set, and computable set all refer to a language for which some Turing machine decides membership, always halting with accept or reject.

A complete set is a set of problems that are "hard" for a class:
NP-hard: every problem in NP reduces to it.
E.g., to prove a problem NP-complete, first show that it is in NP, then prove that it is NP-hard, namely choose a known NP-complete problem and reduce that problem to ours.

https://en.wikipedia.org/wiki/Complete_(complexity)



Proving problems undecidable
Turing machine and halting problem

Refer to the handouts.


A multitape Turing machine is equivalent to a single-tape Turing machine.
A Turing machine accepts an input if there is a sequence of configurations from the starting configuration to an accepting configuration.
A Turing machine's language is decidable if there is a decider for it (a machine that always halts and either accepts or rejects).
Every NDTM has an equivalent DTM.
A_TM, HALT_TM, and PCP are undecidable but Turing-recognizable (recursively enumerable). E_TM and EQ_TM are also undecidable; E_TM is co-Turing-recognizable but not recognizable, and EQ_TM is neither recognizable nor co-recognizable.

Undecidability and Rice’s theorem
Rice’s theorem:


A property is a set of Turing machines; a TM satisfies P means the TM is in the set.
A property P is nontrivial if some TM satisfies it and some TM does not satisfy it.
A property P is trivial if either it is satisfied by all TMs or it is satisfied by no TM.

Any nontrivial property about the r.e. language recognized by a Turing machine is undecidable.

Reduction: https://en.wikipedia.org/wiki/Reduction_(complexity)
A reduces to B means we can use an efficient solution of B as a subroutine to solve A; the hardness of B is therefore at least that of A.
Computational complexity
https://blog.csdn.net/golden1314521/article/details/51470999

NP-completeness theory is designed to be applied only to decision problems.
Solving a decision problem is equivalent to recognizing a corresponding language.
A language here is the set of all strings accepted by a Turing machine.
Class P, defined in terms of the one-tape deterministic Turing machine: all languages deterministically recognizable in polynomial time.
Class NP: the NTM is an augmented model of the one-tape Turing machine with guessing ability.
The NTM defines class NP: all languages recognizable nondeterministically in polynomial time, i.e., there is a polynomial-time NDTM program that accepts the language.
NP is the class of languages that have polynomial-time verifiers for the membership of an arbitrary given instance.

A polynomial-time mapping reduction from A to B means: there is a transformation function f, computable in polynomial time, that packages any instance a of A as an instance b = f(a) of B such that a ∈ A if and only if b ∈ B; membership goes in and out together.

Polynomial time vs. pseudo-polynomial time
Dynamic programming solves subset sum in pseudo-polynomial time (a sketch follows).
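
A minimal sketch; the DP table is indexed by the numeric target, so the O(n * target) running time is polynomial in the value of the target but exponential in its bit-length, which is exactly what "pseudo-polynomial" means:

```c
#include <stdbool.h>
#include <stdio.h>

/* Pseudo-polynomial subset-sum: O(n * target) time and O(target) space. */
static bool subset_sum(const int *a, int n, int target)
{
    bool dp[target + 1];                      /* dp[s]: can some subset sum to s? */
    for (int s = 0; s <= target; s++) dp[s] = false;
    dp[0] = true;                             /* the empty subset sums to 0 */
    for (int i = 0; i < n; i++)
        for (int s = target; s >= a[i]; s--)  /* downward so each item is used once */
            if (dp[s - a[i]]) dp[s] = true;
    return dp[target];
}

int main(void)
{
    int a[] = {3, 34, 4, 12, 5, 2};
    printf("%d\n", subset_sum(a, 6, 9));      /* prints 1, since 4 + 5 = 9 */
    return 0;
}
```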

Textbook Reference

Computers and Intractability: A Guide to the Theory of NP-Completeness, Michael R. Garey and David S. Johnson
Introduction to the Theory of Computation, 3rd edition, Michael Sipser
Introduction to Automata Theory, Languages, and Computation, 3rd edition, John E. Hopcroft, Rajeev Motwani, and Jeffrey D. Ullman
