## Grad School (the good parts)

##### On Abstractions

“Systems designers are abstraction merchants.”1

“Nothing is so difficult it cannot be solved by another level of indirection.”2

##### On Files

The life of an average file is tedious3 and brief4. Sequential access is rarely sequential with multiple threads.

##### On Protocols

Even if a protocol seems great on paper, it may not be used for lots of reasons.5

##### On Queueing

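$P_k$ (probability of $k$ customers in the system), $\bar{N}$ (mean number in system), and $\bar{T}$ (mean time in system) are independent of the service discipline, as long as the discipline is work-conserving and does not use service times to pick the next job. The variance and distribution of $T$ are not. I.e., $\bar{T}_{FCFS} = \bar{T}_{LCFS}$, but $\mathrm{Var}(T_{FCFS}) \neq \mathrm{Var}(T_{LCFS})$, since $\mathrm{Var}(T) = E[T^2] - E[T]^2$ and the second moment $E[T^2]$ depends on the order of service.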
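
A minimal simulation sketch of the claim above, assuming an M/M/1 queue with non-preemptive service. The function name `simulate` and the parameters (`lam`, `mu`, `n_jobs`, `seed`) are illustrative choices, not from any particular text. Service demands are consumed in order of service start, so both disciplines see the same sequence of departures and only the assignment of departures to jobs differs.

```python
import random
from collections import deque
from statistics import mean, pvariance


def simulate(discipline, n_jobs=20000, lam=0.8, mu=1.0, seed=1):
    """Simulate a single-server queue and return each job's time in system.

    discipline is "FCFS" or "LCFS" (non-preemptive). lam and mu are the
    arrival and service rates of the assumed M/M/1 model.
    """
    if discipline not in ("FCFS", "LCFS"):
        raise ValueError("discipline must be 'FCFS' or 'LCFS'")
    rng = random.Random(seed)
    arrivals, t = [], 0.0
    for _ in range(n_jobs):
        t += rng.expovariate(lam)
        arrivals.append(t)
    # Service demands are used in order of service start, so the discipline
    # never looks at them and both runs produce the same departure epochs.
    services = [rng.expovariate(mu) for _ in range(n_jobs)]

    waiting = deque()          # indices of waiting jobs, in arrival order
    times = [0.0] * n_jobs
    clock, nxt, started = 0.0, 0, 0
    while nxt < n_jobs or waiting:
        if not waiting:
            # Server is idle: jump ahead to the next arrival.
            clock = max(clock, arrivals[nxt])
        # Admit every job that has arrived by now.
        while nxt < n_jobs and arrivals[nxt] <= clock:
            waiting.append(nxt)
            nxt += 1
        job = waiting.popleft() if discipline == "FCFS" else waiting.pop()
        clock += services[started]
        started += 1
        times[job] = clock - arrivals[job]
    return times


if __name__ == "__main__":
    for name in ("FCFS", "LCFS"):
        t = simulate(name)
        print(name, "mean:", round(mean(t), 3), "variance:", round(pvariance(t), 3))
```

With the same seed the two means agree up to floating-point rounding, while the LCFS variance comes out larger, because a job that gets buried under later arrivals can wait out most of a busy period.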

Poisson only works if your events are truly independent.6

Nothing cannot be fixed with the words “assume i.i.d.”

##### On Optimization

“Put broadly, the object of study in mathematics is truth; the object of study in computer science is complexity. [I]t’s not enough for a problem to have a solution, if that problem is intractable.”7

Optimization works in cycles:

1. Exploit assumptions and amortization.
2. Divide and conquer until re-computation takes too long.
3. Parallelize until Amdahl’s law.
4. Dynamic programming (parenthesization, memoization) until state space complexity takes you back to #1.

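
Steps 3 and 4 of the cycle can be sketched in a few lines. This is only an illustration: `amdahl_speedup` and `fib` are hypothetical names, and Fibonacci stands in for any recurrence with overlapping subproblems.

```python
from functools import lru_cache


def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: speedup when only part of the work can be parallelized."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


@lru_cache(maxsize=None)
def fib(n):
    """Memoized recurrence: each subproblem is computed once, then cached."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


if __name__ == "__main__":
    # With 90% parallel work, speedup stays below 1 / (1 - 0.9) = 10.
    for workers in (2, 10, 100, 1000):
        print(workers, round(amdahl_speedup(0.9, workers), 2))
    print(fib(30))
```

The cache in `fib` is the state space mentioned in step 4: it trades time for memory, and once the table is too large to hold, the cycle starts over.
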
##### On Distributed Systems

In an asynchronous distributed system, it is impossible to tell whether a remote node is dead or just arbitrarily delayed.
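
A toy sketch of why this is so, assuming a timeout-based heartbeat detector. The function `suspected` and all timestamps are made up for illustration. The observer only sees the time of the last heartbeat, so a crashed node and a slow one look identical until a late message arrives.

```python
def suspected(last_heartbeat, now, timeout):
    """Timeout-based suspicion: True if no heartbeat arrived within `timeout`."""
    return now - last_heartbeat > timeout


# Observer's view at time 10 with a 5-unit timeout (hypothetical values).
now, timeout = 10.0, 5.0
crashed_last_seen = 3.0  # this node crashed right after its heartbeat at t=3
slow_last_seen = 3.0     # this node is alive; its next heartbeat is stuck in transit

crashed_verdict = suspected(crashed_last_seen, now, timeout)
slow_verdict = suspected(slow_last_seen, now, timeout)

# The delayed heartbeat finally lands at t=12; only now can the observer
# tell the two cases apart, and only for the node that turned out to be alive.
later_verdict = suspected(12.0, 13.0, timeout)

if __name__ == "__main__":
    print(crashed_verdict, slow_verdict, later_verdict)
```

A longer timeout lowers the false-suspicion rate but never removes it, since the delay is unbounded by assumption.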

##### On Data

Curse of dimensionality: as the number of attributes/dimensions in a dataset increases, the average distance between points grows and the distances become more alike, making it more difficult to detect outliers8 and leading to overfitted models.9
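
A quick numerical sketch of the effect, assuming points drawn uniformly from the unit hypercube (an illustrative choice; `pairwise_distance_stats` is a hypothetical helper). It reports the mean pairwise Euclidean distance and its relative spread (standard deviation divided by mean).

```python
import math
import random
from statistics import mean, pstdev


def pairwise_distance_stats(dim, n_points=100, seed=0):
    """Return (mean distance, std/mean) over all pairs of random points."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = [
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    ]
    avg = mean(dists)
    return avg, pstdev(dists) / avg


if __name__ == "__main__":
    for dim in (2, 20, 200):
        avg, spread = pairwise_distance_stats(dim)
        print(dim, round(avg, 3), round(spread, 3))
```

The mean distance grows with the dimension while the relative spread shrinks, so every point ends up roughly equally far from every other point and "unusually far" stops being a useful outlier signal.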

The more you look, the more you overfit10.
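
One way to put a number on this is the family-wise error rate under repeated testing. The sketch below assumes independent tests at a common significance level; the function names are illustrative.

```python
def familywise_error_rate(alpha, tests):
    """Chance of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** tests


def bonferroni_threshold(alpha, tests):
    """Per-test significance level that keeps the family-wise rate near alpha."""
    return alpha / tests


if __name__ == "__main__":
    for tests in (1, 5, 20, 100):
        raw = familywise_error_rate(0.05, tests)
        corrected = familywise_error_rate(bonferroni_threshold(0.05, tests), tests)
        print(tests, round(raw, 3), round(corrected, 3))
```

At a 5% level, twenty looks at the data already give better-than-even odds of at least one spurious finding. Dividing the threshold by the number of looks brings the overall rate back under 5%.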

Tricks in Data Analysis:

1. Add more dimensions: Non-linear SVM
2. Take away dimensions: Random forest, eigenvectors
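
The first trick can be shown without an SVM library. In this toy sketch (all data and names are made up), the positive class sits on both sides of the negative class, so no single threshold on $x$ separates them. Adding the feature $x^2$ makes one threshold enough, which is the idea a non-linear kernel exploits implicitly.

```python
points = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0]
labels = [abs(x) > 1.0 for x in points]  # outer points are the positive class


def separable_by_threshold(values, labels):
    """True if one cut along `values` puts each class on its own side."""
    ordered = sorted(zip(values, labels))
    flips = sum(a[1] != b[1] for a, b in zip(ordered, ordered[1:]))
    return flips <= 1


lifted = [x * x for x in points]  # the added dimension

if __name__ == "__main__":
    print(separable_by_threshold(points, labels))  # original 1-D feature
    print(separable_by_threshold(lifted, labels))  # lifted feature x^2
```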

Data-ink ratio = $\frac{\text{data ink}}{\text{total ink}}$

##### On Databases

Everyone is afraid to touch optimizers, because no one knows how they work.

##### On Tools

Typing skill and comprehension are independent. Concurrent tasking does not affect typing much.11

$\begin{CD} \text{CDF (discrete)} @>{smooth}>> \text{CDF} \\ @AA{\sum}A @A{\int_0^x}AA \\ \text{PMF} @<{bin}<< \text{PDF} \end{CD}$ 12
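
The conversions among PMF, PDF, and CDF named above (summing and binning) are short to write out. This is a sketch with hypothetical helper names. `bin_pdf` uses a midpoint rule with an arbitrary number of steps per bin, which is an assumption of this example rather than a prescribed method.

```python
from itertools import accumulate


def pmf_to_cdf(pmf):
    """Sum: running total of probability masses gives the discrete CDF."""
    return list(accumulate(pmf))


def cdf_to_pmf(cdf):
    """Difference: successive CDF values give back the masses."""
    return [c - p for p, c in zip([0.0] + cdf[:-1], cdf)]


def bin_pdf(pdf, edges, steps=1000):
    """Bin: integrate a density over each interval (midpoint rule) to get a PMF."""
    masses = []
    for lo, hi in zip(edges, edges[1:]):
        width = (hi - lo) / steps
        masses.append(sum(pdf(lo + (i + 0.5) * width) for i in range(steps)) * width)
    return masses


if __name__ == "__main__":
    pmf = [0.1, 0.2, 0.3, 0.4]
    cdf = pmf_to_cdf(pmf)
    print(cdf)
    print(cdf_to_pmf(cdf))
    print(bin_pdf(lambda x: 1.0, [0.0, 0.25, 0.5, 1.0]))  # uniform density on [0, 1]
```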

“A long list is no list”.13

“When in doubt, draw it out.”14

##### On Process

After accounting for size, other code complexity metrics become noise.15

In a sufficiently regulated engineering process, the non-work deliverables themselves take on technical rigor with little connection to the product.

“The second is that people tend to inconsistency. The prediction is that methodologies requiring disciplined consistency are fragile in practice… The fourth is that people like to be good citizens, are good at looking around and taking initiative. These combine to form that common success factor, «a few good people stepped in at key moments.»”16

##### On Human Condition

Human reliability follows a log-normal distribution.17

“The only risks in life are the people and things you depend on.”13 18

##### On Psychophysics

“Sound exists in time and over space, vision exists in space and over time.” 19

1. An indirect quote of David Wheeler, later institutionalized as the fundamental theorem of software engineering (FTSE) and in RFC 1925. E.g., indirect block addressing.

2. From “A File Is Not A File”; it is actually many many files.

3. From “Measurements of a Distributed File System”; most are small (~50% <1KB) and short lived (74% open for less than 0.1 second, 50% live ≤1s, ~97% live less than a day).

4. CSMA is more efficient at lower packet sizes, and the 802.11 RTS/CTS threshold setting will revert to plain CSMA below a threshold size. But computing the optimum size is hard, and usually RTS/CTS is disabled.

5. From “Wide Area Traffic: The Failure of Poisson Modeling”; TCP traffic and IP over AAL5 are rarely Poisson.

6. On anomaly detection: The Nimbus 7 satellite had anomaly detection software that was dropping valid measurements of ozone depletion over Antarctica.

7. “87% of the population in the United States had reported characteristics that likely made them unique based only on {5-digit ZIP, gender, date of birth}” (Sweeney, 2000).

8. Bonferroni’s Principle: “Who searches a lot, finds a lot.”

9. Think Stats, chapter 6.

10. Robert (Bob) Bell, personal conversation.

11. M&R, chapter 6.

12. Khaled El Emam, Saida Benlarbi, Nishith Goel, and Shesh N. Rai: “The Confounding Effect of Class Size on the Validity of Object-Oriented Metrics”. IEEE Transactions on Software Engineering, 27(7), July 2001.

13. Willoughby templates, institutionalized as DoD 4245.7-M.

14. Mountford, S.J., & Gaver, W.W. (1990). Talking and listening to computers. In B. Laurel (Ed.), The art of human-computer interface design (p. 322). Reading, Massachusetts: Addison-Wesley.