Economics
Optional Problem Set #2 (Due: November 8, 2018)
This problem set introduces you to hypothesis testing and regression in Stata. For an
introduction to Stata, see Professor Wooldridge’s 35-minute online video tutorial, “How to Use
Stata,” in the “EC 420 Course Level Link” of D2L. For a useful summary of Stata commands, see
Professor Wooldridge’s “Rudiments of Stata,” posted in the <Stata stuff> folder on D2L.
Instructions
• Use Stata and a word processor for this assignment.
• For each question that asks you to use Stata, copy and paste the Stata output into a
word-processing document, then type your answer. Staple all pages together at the upper-left
corner before you turn in your homework.
• Assignments turned in unstapled will be returned with a grade of zero.
• Only stapling is acceptable—paper clips and other methods of binding are not acceptable.
• If we cannot discern the meaning of your work, your response will be scored as incorrect.
Copying Stata output into your document
There are at least three ways to copy your Stata output into your document:
• Copy the text from Stata into your document, then change its font to 10-point Courier (a fixed-width font, so the columns line up)
• On a Mac: Print the Stata output to pdf (“open PDF in Preview”), then use rectangular
selection (in the Tools menu) to copy and paste selected output into your document
• On a Windows machine: Use the “copy as picture” option
General tips
• Read the question and answer what is asked.
• Be careful with language—for example, in #3 below, say, “Black men in this sample earn less
than non-black men on average.”
• Do not infer causality from a correlation coefficient or a list of averages by education level.
The problem set uses the Stata dataset HTV.dta that can be downloaded from the D2L website.
This dataset includes data on 1,230 men aged 26–34 who were interviewed in 1991 for the
National Longitudinal Survey of Youth. You will be using the following variables:
lwage   natural logarithm of the person’s hourly wage rate in 1991
educ    highest grade level completed by 1991
abil    score from an ability test administered in 1979
ne      = 1 if the person lived in one of the northeastern states
nc      = 1 if the person lived in one of the north central states
Part 1: Hypothesis testing and simple regression in Stata (15 points total)
1. (5 points) The variable lwage is the natural log of the hourly wage rate. What is the sample
mean log wage (lwage) for workers in the northeast? What is it for workers who lived outside the
northeast?
Hint: Use the Stata command tab var1, sum(var2). In this case, var1 is ne and var2 is lwage. (Show
your Stata output.)
2. (5 points) Test the hypothesis that the population mean log wage is equal for workers in the
northeast and for workers who lived outside the northeast. Using a 5% (0.05) significance level
or a 95% confidence interval, what do you conclude and why?
Hints: Follow the steps in <hyp testing-mean.pdf> in the “Cookbooks & crib sheets” folder on
D2L. Use the ttest command in Stata: ttest var1, by(var2). In this case, var1 is lwage and var2 is
ne. (Show your Stata output.)
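For intuition only, the Python sketch below works through the arithmetic of a two-sample t statistic under the equal-variance (pooled) assumption, which is our understanding of what ttest with by() reports unless you request otherwise. The numbers are made up and are not from HTV.dta, and the function name is hypothetical. It does not replace the Stata output the question asks for.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group0, group1):
    """Two-sample t statistic under the equal-variance assumption.

    Returns (difference in means, standard error, t statistic, degrees of
    freedom), with the difference taken as mean(group0) - mean(group1).
    """
    n0, n1 = len(group0), len(group1)
    diff = mean(group0) - mean(group1)
    df = n0 + n1 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n0 - 1) * variance(group0) + (n1 - 1) * variance(group1)) / df
    se = sqrt(pooled_var * (1 / n0 + 1 / n1))
    return diff, se, diff / se, df


# Made-up log wages, NOT taken from HTV.dta.
outside_ne = [2.0, 2.2, 2.4, 2.6, 2.8]  # observations with ne == 0
inside_ne = [2.5, 2.7, 2.9, 3.1]        # observations with ne == 1

diff, se, t_stat, df = pooled_t_statistic(outside_ne, inside_ne)
print(f"diff = {diff:.3f}, se = {se:.3f}, t = {t_stat:.3f}, df = {df}")
```

You would then compare the absolute value of the t statistic with the critical value for the chosen significance level, or check whether the confidence interval for the difference contains zero.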
3. (5 points) Estimate a simple linear regression model relating the log of the hourly wage rate
(lwage) to location in the northeast (ne):
lwage = β0 + β1 ne + u
What is the interpretation of the estimated coefficient on ne (β1)?
Hint: Use the reg command in Stata: reg lwage ne. (Show your Stata output.)
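As a check on the mechanics, the Python sketch below fits a simple regression on a 0/1 regressor using made-up numbers (not HTV.dta) and compares the estimates with the two group means. The helper name simple_ols is hypothetical. Putting the result into words for log wages is still up to you.

```python
from statistics import mean


def simple_ols(x, y):
    """OLS intercept and slope for the model y = b0 + b1*x + u."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up data, NOT taken from HTV.dta.
ne = [0, 0, 0, 0, 0, 1, 1, 1, 1]
lwage = [2.0, 2.2, 2.4, 2.6, 2.8, 2.5, 2.7, 2.9, 3.1]

b0, b1 = simple_ols(ne, lwage)
mean_ne0 = mean([w for d, w in zip(ne, lwage) if d == 0])
mean_ne1 = mean([w for d, w in zip(ne, lwage) if d == 1])

print(f"intercept = {b0:.3f}, mean when ne == 0: {mean_ne0:.3f}")
print(f"slope = {b1:.3f}, difference in means: {mean_ne1 - mean_ne0:.3f}")
```

With a single 0/1 regressor, the intercept equals the mean of the ne == 0 group and the slope equals the difference between the two group means.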
Part 2: More simple linear regression in Stata (25 points total)
1. (5 points) Consider a simple linear regression model relating the log of the hourly wage rate
(lwage) to years of education (educ):
lwage = β0 + β1 educ + u
What is the meaning of the error term u? (No Stata output needed.)
2. (5 points) Give an example of a variable (or factor) that might be contained in u. (No Stata
output needed.)
3. (5 points) What is the key condition for β1 to be interpreted as the causal effect of an
additional year of education on lwage? Does this condition hold? (No Stata output needed.)
4. (5 points) Use the data in <HTV.dta> to estimate the simple regression model in question 1
and paste the results into your document.
Hint: Use the reg command in Stata: reg lwage educ. (Show your Stata output.)
5. (5 points) What is the R² of the regression and how do you interpret it? What does it tell you
about the extent to which wages are causally affected by education? (No Stata output needed.)
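If the definition is hazy, the Python sketch below computes R-squared for a simple regression as 1 minus the ratio of the sum of squared residuals to the total sum of squares. The data are made up (not HTV.dta) and the function name is hypothetical. Stata reports this number in its regression output, so you do not need to compute it yourself.

```python
from statistics import mean


def r_squared(x, y):
    """R-squared from a simple OLS regression of y on x: 1 - SSR/SST."""
    x_bar, y_bar = mean(x), mean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    # SSR: variation in y left unexplained by the fitted line.
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    # SST: total variation of y around its sample mean.
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ssr / sst


# Made-up data, NOT taken from HTV.dta.
educ = [10, 12, 12, 14, 16, 16]
lwage = [2.0, 2.1, 2.5, 2.4, 2.6, 3.0]

r2 = r_squared(educ, lwage)
print(f"R-squared = {r2:.3f}")
```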
Part 3: Multiple regression in Stata (30 points total)
1. (5 points) Estimate the multiple linear regression model:
lwage = β0 + β1 educ + β2 ne + u
Hint: Use the reg command in Stata: reg lwage educ ne. (Show your Stata output.)
2. (5 points) What is the OLS estimate of the slope on education and how do you interpret it?
(No need to comment on the intercept or the coefficient on ne.)
3. (5 points) What is the estimated intercept? What is its interpretation? (No additional Stata
output needed.)
4. (5 points) Why is the estimate of β1 in the multiple regression you just estimated smaller than
the estimate from the simple regression of lwage on educ?
Hint: Think about the omitted variable bias (OVB) formula and consider two things: First, are lwage and ne related in this
model (and what is the sign of the estimated slope coefficient on ne)? Second, how are the
variables ne and educ related—that is, do people in the northeast have more or less education
than others in the sample, on average? (No additional Stata output needed.)
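The two parts of the hint correspond to the two terms in the in-sample version of the OVB formula: the simple-regression slope on educ equals the multiple-regression slope on educ plus (the multiple-regression slope on ne) times (the slope from regressing ne on educ). The Python sketch below checks that identity on made-up numbers. The signs and sizes here say nothing about HTV.dta, and the helper names are hypothetical.

```python
from statistics import mean


def cross(a, b):
    """Sum of products of deviations from the sample means."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


def slopes_two_regressors(x1, x2, y):
    """OLS slopes (b1, b2) from regressing y on x1, x2, and a constant."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


# Made-up data, NOT taken from HTV.dta.
educ = [10, 12, 12, 14, 16, 16, 12, 18]
ne = [0, 0, 1, 0, 1, 1, 0, 1]
lwage = [1.9, 2.1, 2.3, 2.4, 2.8, 2.9, 2.2, 3.1]

b1_multi, b2_multi = slopes_two_regressors(educ, ne, lwage)
b1_simple = cross(educ, lwage) / cross(educ, educ)  # lwage on educ only
delta1 = cross(educ, ne) / cross(educ, educ)        # ne on educ

print(f"simple slope on educ:        {b1_simple:.4f}")
print(f"multiple slope + b2 * delta: {b1_multi + b2_multi * delta1:.4f}")
```

The two printed numbers agree, so the gap between the simple and multiple estimates is the product of the coefficient on ne and the slope relating ne to educ.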
5. (5 points) Use the output from your regression to test the hypothesis that education is
unrelated to log wage. State the null and the alternative hypotheses in terms of the notation
used in class, then state the test statistic you use. Using a significance level of 1% (and a 99%
confidence interval), what do you conclude and why? Explain your reasoning. (No additional
Stata output needed.)
Hint: Follow the steps in <hyp testing-reg.pdf> in the “Cookbooks & crib sheets” folder on D2L.
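The arithmetic behind the test can be sketched in Python as follows. The coefficient and standard error are hypothetical placeholders, not results from HTV.dta, and the function name is made up. The critical value uses the standard normal distribution as an approximation to the t distribution, which should be close with a sample this large. Stata's own output uses the t distribution.

```python
from statistics import NormalDist


def two_sided_test(coef, se, alpha):
    """Two-sided test of H0: coefficient = 0 at significance level alpha.

    Uses a standard normal critical value as a large-sample approximation
    to the t critical value.
    """
    t_stat = coef / se
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = crit * se
    conf_int = (coef - half_width, coef + half_width)
    reject = abs(t_stat) > crit
    return t_stat, crit, conf_int, reject


# Hypothetical estimate and standard error, NOT from HTV.dta.
t_stat, crit, conf_int, reject = two_sided_test(coef=0.10, se=0.02, alpha=0.01)
print(f"t = {t_stat:.2f}, critical value = {crit:.3f}")
print(f"99% CI = ({conf_int[0]:.4f}, {conf_int[1]:.4f}), reject H0: {reject}")
```

The two decision rules agree: the null is rejected exactly when the absolute t statistic exceeds the critical value, which is also when the confidence interval excludes zero.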
6. (5 points) What is the predicted log wage of a worker with 12 years of education who lives in the
northeast? What is the predicted log wage of a worker with 16 years of education who does
not live in the northeast?
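A prediction plugs the worker's values of educ and ne into the estimated equation. The Python sketch below shows the calculation with hypothetical coefficients, not the estimates from HTV.dta. Substitute the values from your own regression output.

```python
def predict_lwage(b0, b1, b2, educ, ne):
    """Predicted log wage from the fitted model lwage = b0 + b1*educ + b2*ne."""
    return b0 + b1 * educ + b2 * ne


# Hypothetical coefficients, NOT estimates from HTV.dta.
b0, b1, b2 = 1.0, 0.10, 0.05

pred_a = predict_lwage(b0, b1, b2, educ=12, ne=1)  # 12 years, lives in the northeast
pred_b = predict_lwage(b0, b1, b2, educ=16, ne=0)  # 16 years, lives elsewhere
print(f"predicted lwage: {pred_a:.2f} and {pred_b:.2f}")
```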