What is a Ph.D. Dissertation?
[I wrote this in 1993 as a letter to a student concerning a draft of his dissertation. In 2003 I edited it to remove some specific references to the student and present it as a small increment to the information available to my grad students. --spaf]
Let me start by reviewing some things that may seem obvious:
- Your dissertation is part of the requirements for a Ph.D. The research, theory, experimentation, etc. also contribute. One does not attempt to capture everything in one's dissertation.
- The dissertation is a technical work used to document and set forth proof of one's thesis. It is intended for a technical audience, and it must be clear and complete, but not necessarily exhaustively comprehensive. Also note -- experimental data, if used, is not the proof -- it is evidence. The proof is presented as analysis and critical presentation. As a general rule, every statement in your dissertation must be common knowledge, supported by citation to technical literature, or else original results proved by the candidate (you). Each of those statements must directly relate to the proof of the thesis or else they are not needed.
- The dissertation is not the thesis. One's thesis is a claim -- a hypothesis. The dissertation describes, in detail, how one proves the hypothesis (or, rarely, disproves the claim and shows other important results).
Let's revisit the idea of the thesis itself. It is a hypothesis, a conjecture, a theorem. The dissertation is a formal, stylized document used to argue your thesis. The thesis must be significant, original (no one has yet demonstrated it to be true), and it must extend the state of scientific knowledge.
The first thing you need to do is to come up with no more than three sentences that express your thesis. Your committee must agree that your statements form a valid thesis statement. You too must be happy with the statement -- it should be what you will tell anyone who asks you what your thesis is (few people will want to hear an hour-long presentation as a response).
Once you have a statement of thesis, you can begin to develop the dissertation. The abstract, for instance, should be a one-page description of your thesis and how you present the proof of it. The abstract should summarize the results of the thesis and should stress the contributions to science made thereby.
Perhaps the best way to understand how an abstract should look would be to examine the abstracts of several dozen dissertations that have already been accepted. Our university library has a collection of them. This is also a good way to see how an entire dissertation is structured and presented. MIT Press has published the ACM doctoral dissertation award series for over a decade, so you may find some of those to be good examples to read -- they should be in any large technical library.
The dissertation itself should be structured into 4 to 6 chapters. The following is one commonly used structure:
- Introduction. Introduce the basic terminology, give citations to appropriate background work, and briefly discuss related work that has already covered aspects of the problem.
- Abstract model. Discuss an abstract model of what you are trying to prove. This chapter should not discuss any specific implementation (see below).
- Validation of model/proof of theorems. This is a chapter showing a proof of the model. This could be a set of proofs, or a discussion of construction and validation of a model or simulation to be used in gathering supporting data.
- Measurements/data. This would be a presentation of various data collected from real use, from simulations, or from other sources. The presentation would include analysis to show support for the underlying thesis.
- Additional results. In some work there may be secondary confirmation studies, or it might be the case that additional important results are collected along the way to the proof of the central thesis. These would be presented here.
- Conclusions and future work. This is where the results are all tied together and presented. Limitations, restrictions and special cases should be clearly stated here along with the results. Some clear extensions to future work may also be described.
Let's look at these in a little more detail.
Chapter I, Introduction. Here, you should clearly state the thesis and its importance. This is also where you give definitions of terms and other concepts used elsewhere. There is no need to write 80 pages of background on your topic here. Instead, you can cover almost everything by saying: "The terminology used in this work matches the definitions given in [citation, citation] unless noted otherwise." Then, cite some appropriate works that give the definitions you need. The progress of science is that we learn and use the work of others (with appropriate credit). Assume you have a technically literate readership familiar with (or able to find) common references. Do not reference popular literature or WWW sites if you can help it (this is a matter of style more than anything else -- you want to reference articles in refereed conferences and journals, if possible, or in other theses).
Also in the introduction, you want to survey any related work that attempted something similar to your own, or that has a significant supporting role in your research. This should refer only to published references. You cite the work in the references, not the researchers themselves. E.g., "The experiments described in [citation] explored the foo and bar conditions, but did not discuss the further problem of baz, the central point of this work." You should not make references such as this: "Curly, Moe and Larry all believed the same in their research [CML53]" because you do not know what they actually believed or thought -- you only know what the paper states. Every factual statement you make must have a specific citation tied to it in this chapter, or else it must be common knowledge (don't rely on this too much).
Chapter II. Abstract Model. Your results are to be of lasting value. Thus, the model you develop and write about (and indeed, that you defend) should be one that has lasting value. You should therefore discuss a model that is not based on Windows, Linux, Ethernet, PCMCIA, or any other specific technology. It should be generic in nature, and should capture all the details necessary to overlay the model on likely environments. You should discuss the problems, parameters, requirements, necessary and sufficient conditions, and other factors here. Consider that 20 years ago (ca. 1980) the common platform was a VAX computer running VMS or a PDP-11 running Unix version 6, yet well-crafted theses of the time are still valuable today. Will your dissertation be valuable 20 years from now (ca. 2020), or have you referred to technologies that will be of only historical interest?
This model is tough to construct, but is really the heart of the scientific part of your work. This is the lasting part of the contribution, and this is what someone might cite 50 years from now when we are all using MS Linux XXXXP on computers embedded in our wrists with subspace network links!
Chapters III & IV, Proof. There are basically three proof techniques that I have seen used in a computing dissertation, depending on the thesis topic. The first is analytic, where one takes the model or formulae and shows, using formal manipulations, that the model is sound and complete. A second proof method is stochastic, using some form of statistical methods and measurements to show that something is true in the anticipated cases.
The third method is demonstration by construction: you show that your thesis is true by building something according to your model and showing that it behaves as you claim it will. This involves clearly showing how your implementation model matches the conditions of your abstract model, describing all the variables and why you set them as you do, accounting for confounding factors, and showing the results. You must be careful not to expend too much effort describing how standard protocols and hardware work (use citations to the literature instead). You must clearly express the mapping of model to experiment, and the definition of parameters used and measured.
Chapter V. Additional results. This may be folded into Chapter III in some theses, or it may be multiple chapters in a thesis with many parts (as in a theory-based thesis). This may be where you discuss the effects of technology change on your results. This is also a place where you may wish to point out significant results that you obtained while seeking to prove your central thesis, but which are not themselves supportive of the thesis. Often, such additional results are published in a separate paper.
Chapter VI. Conclusions and Future Work. This is where you discuss what you found from your work, incidental ideas and results that were not central to your thesis but are of value nonetheless (if you did not present them in Chapter V), and other results. This chapter should summarize all the important results of the dissertation -- note that this is the only chapter many people will ever read, so it should convey all the important results.
This is also where you should outline some possible future work that can be done in the area. What are some open problems? What are some new problems? What are some significant variations open to future inquiry?
Appendices. Appendices usually are present to hold mundane details that are not published elsewhere, but which are critical to the development of your dissertation. This includes tables of measurement results, configuration details of experimental testbeds, limited source code listings of critical routines or algorithms, etc. It is not appropriate to include lists of readings by topic, lists of commercial systems, or other material that does not directly support the proof of your thesis.
Here are some more general hints to keep in mind as you write/edit:
- Adverbs should generally not be used -- instead, use something precise. For example, do not say that something "happens quickly." How fast is quickly? Is it relative to CPU speeds? Network speeds? Does it depend on connectivity, configuration, programming language, OS release, etc.? What is the standard deviation?
- As per the above, use of the words "fast", "slow", "perfect", "soon", "ideal", "lots of" and related should all be avoided. So should "clearly", "obviously", "simple", "like", "few", "most", "large", and similar words.
- What you are writing is scientific fact. Judgments of aesthetics, ethics, personal preference, and the like should be in the conclusions chapter if they should be anywhere at all. With that in mind, avoid use of words such as "good", "bad", "best", and any similar discussion. Also avoid stating "In fact," "Actually," "In reality," and any similar construct -- everything you are writing must be factual, so there is no need to state such things. If you feel compelled to use one of these constructs, then carefully evaluate what you are saying to be certain you are not injecting relative terms, opinions, value judgments, or other items that are inappropriate for a dissertation.
- Computers and networks do not have knees, so poor performance cannot bring them to something they do not have. They also don't have hands, so "On the one hand..." is not good usage. Programs don't perform conscious thought (nor do their underlying computers), so your system does not "think" that it has seen a particular type of traffic. Generalizing from this, do not anthropomorphize your IT components!
- Avoid mention of time and environment. "Today's computers" are antiques far sooner than you think. Your thesis should still be true many years from now. If a particular time or interval is important, then be explicit about it, as in "Between 1905 and 1920" rather than "Over the last 15 years." (See the difference, given some distance in time?)
- Be sure that something you claim as a proof would be recognized as such by any scientist or mathematician.
- You and your dissertation are supposed to be the ultimate (current) authority on the topic you are covering. Thus, there should be no instance of "to the best of our knowledge" or "as far as we can tell." Either you know for certain, or you don't -- and if you don't know, you shouldn't state it!
- Focus on the results and not the methodology. Methodology should be clearly described, but it should not be the central topic of your discussion in Chapters III & IV.
- Keep concepts and instances separate. An algorithm is not the same as a program that implements it. A protocol is not the same as the realization of it, a reference model is not the same as a working example, and so on.
As a rule of thumb, a CS dissertation should probably be longer than 100 pages, but less than 160. Anything outside of that range should be carefully examined with the above points in mind.
Keep in mind that you -- the Ph.D. candidate -- are expected to become the world's foremost expert on your topic area. That topic area should not be unduly broad, but must be big enough to be meaningful. Your advisor and committee members are not supposed to know more about the topic than you do -- not individually, at least. Your dissertation is supposed to explain your findings and, along with the defense, demonstrate your mastery of the area in which you are now the leading expert. That does not mean writing everything you know -- it means writing enough about the most important points that others can agree with your conclusions.
Last of all, don't fall into the trap that ties up many a candidate, and causes some of them to flame out before completion: your thesis does not need to be revolutionary. It simply needs to be an incremental advancement in the field. Few Ph.D. dissertations have ever had a marked impact on the field. Instead, it is the set of publications and products of the author that may change the field.
If your dissertation is like most, it will only be read by your committee and some other Ph.D. candidates seeking to build on your work. As such, it does not need to be a masterwork of literature, nor does it need to solve a long-standing problem in computing. It merely needs to be correct, to be significant in the judgement of your committee, and to be complete. We will all applaud when you change the world after graduation. And at that point you will find that many well-known scientists in CS have made their careers in areas different from their dissertation topic. The dissertation is proof that you can find and present original results; your career and life after graduation will demonstrate the other concerns you might have about making an impact.
So get to work!
Everything starts here.
You will want to identify a real problem in society that leads you to want to conduct your dissertation research. There are many possible categories of problems, but the simplest way to look at problems is to ask: does the problem cause pain and suffering? Or is the problem related to monetary issues, such as loss or excess expenditures? Many students claim that the problem is that there is not enough research on their topic. This can be claimed about almost any topic. Yes, we could use additional research on some aspect of society.
Who will care about the problem?
When your problem statement is vague or unrealistic, it is very difficult to get your chairperson and committee members interested enough to care about your dissertation research. You will conduct legitimate research as you work to complete your dissertation. Everyone involved wants to feel that what you are doing is meaningful, and it is. Therefore, you want to look at society for difficulties, concerns, select groups of people, and situations that are not going quite right. You have your own interests and causes you believe in. If possible, identify a problem surrounding those things you care about. If you care deeply about your research problem, it will be easier to handle the setbacks and challenges all dissertation students face along the road to completion.
You may want to consider opening this section with these words, “This study addresses the problem of…” You want the Problem Statement to be in your own words without citations. After claiming what the problem is, you are expected to later validate and prove that the problem exists from other studies, literature, and/or data from various sources such as governmental agencies.
Two examples of problem statements from dissertations:
“This study addresses the problem of: Mexican-American students attaining their doctoral degrees in alarmingly low numbers.” Emilio Rendon, Ph.D. (1999). Main factors that influence the attainment of the doctoral degree by Mexican-Americans. Texas A & M University.
“The study addressed the problem of high relapse rates among adult alcoholics.” William V. Plath, Ph.D. (2001). The rational integration approach for reducing adult alcoholics’ alcohol relapse. Walden University.