1 Basic Description
Objective:
The purpose of this project is to apply the analytical and quantitative skills that you
should have acquired in this course. The final project should be a professional-looking manuscript
with easily interpreted graphics and charts.
Description:
The project requires using statistical techniques learned in this course to analyze data
from the 1997 National Longitudinal Survey of Youth. The specific topic that you examine is up to you.
Formatting:
All aspects of your final paper must be typed. The paper should be double spaced with
font of at least 11 point. There is no minimum number of pages. There is a strict
maximum of 10 pages including title page, tables, graphs, and appendices.
Anything beyond 10 pages will not be graded. The paper should include the following
sections: introduction, data, empirical methodology, results, and conclusions.
Data:
You will use the data set nlsy97 bigf18.dta found on Carmen. The data are a subsample
of the 1997 NLSY, which is funded by the Bureau of Labor Statistics (BLS). The BLS describes the data as follows:
The NLSY97 consists of a nationally representative sample of approximately 9,000
youths who were 12 to 16 years old as of December 31, 1996. Round 1 of the survey
took place in 1997. In that round, both the eligible youth and one of that youth’s parents received hour-long personal interviews. In addition, during the screening process,
an extensive two-part questionnaire was administered that listed and gathered demographic information on members of the youth’s household and on his or her immediate
family members living elsewhere. Youths are interviewed on an annual basis.
To avoid the complexity of working with panel data, the time dimension has been removed. There
are thousands of variables available, but I have narrowed them to the variables
shown in Table 1.
2 Variable definitions
Table 1: Variable Definitions
Variable Name | Description |
grades8 | Grades in 8th grade |
gradeshs | Grades in High School |
health 1997 | Self Reported Health as an adolescent |
icecream 1997 | Favorite Ice Cream Flavor as an adolescent |
hardtimes 1997 | Whether your family experienced “Hard Times” as a child |
census region 1997 | The census region resided in as an adolescent |
income gross yr 1997 | Household income of family when respondent was an adolescent |
msa 1997 | MSA of residence as an adolescent |
urban rural 1997 | Urban or rural residency as an adolescent |
race ethnicity | Race and Ethnicity |
asvab | Armed Services Vocational Aptitude Battery percentile |
marstat 2013 | Marital and cohabitation status |
job | Have a job? |
income | Current (2013) income of the respondent |
weight | Weight (pounds) |
health 2013 | Current (2013) self reported health |
momsedu | Highest level of education achieved by mother |
grade | Highest Grade Completed |
degree | Highest Degree Obtained |
height | Height in Inches |
age | Age in 2013 |
Many of the variables are categorical, and many that seem ordinal are not. Simply type
“tab variable” (replacing variable with the name of the variable) and Stata will display
how many observations belong to each category and what those categories indicate.
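As a rough sketch of that check, the commands below use Stata's bundled nlsw88 example data as a stand-in, so the variable name race is an assumption and not a name from the course file. Substitute the variables from Table 1 when you run this on the project data.

```stata
* Stand-in data: Stata's bundled nlsw88 extract, not the course file.
sysuse nlsw88, clear

* Frequency table with value labels: shows what each category is called.
tabulate race

* Same table, but showing the underlying numeric codes instead of labels.
tabulate race, nolabel

* Storage type, range, and the code-to-label mapping for the variable.
codebook race
```

Comparing the labeled and unlabeled tables is a quick way to see whether the numeric codes follow a meaningful order before treating a variable as ordinal.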
3 Working with a partner
You may work with a partner on the project. When the paper is uploaded on Carmen,
both names must appear on the paper. You must also join a group under the People
tab on Carmen.
4 Tables and Figures
As described in the rubric, there are tables and figures required in the data and results
sections. All tables and figures should follow the same general format:
• All objects should be labeled, numbered, and titled. For example: Table 1 Summary Statistics.
• All objects should provide insight into your paper. If your paper is about gender
wage differences, then the figure that you include should be some sort of graphic
that shows some detailed differences in wages by gender.
• Summary statistics tables should include means and standard deviations.
• Regression results tables should include standard errors and stars for 1%, 5%, and
10% significance levels.
• Similar regressions should all be in the same table.
• Summary statistics tables should only include variables used in your analysis.
• Categorical variables that are non-ordinal must be broken up into dummies in the
summary statistics tables.
• Ordinal categorical variables should have the categories described in the text or table
footnotes. For example, mother’s education could have a mean of 3.8. Explain
what that value means.
• The main idea of a table or figure should be plainly understood without having
to read the text.
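The summary statistics requirements above could be met along the following lines. This is only a sketch: it uses Stata's bundled nlsw88 data as a stand-in, and the variables wage, age, grade, and race are assumptions taken from that file, not from the course data.

```stata
* Stand-in data: Stata's bundled nlsw88 extract, not the course file.
sysuse nlsw88, clear

* race is non-ordinal, so create one 0/1 dummy per category
* (race_1, race_2, race_3) instead of reporting a mean of the codes.
tabulate race, generate(race_)

* Means and standard deviations for the analysis variables only.
summarize wage age grade race_1 race_2 race_3
```

The mean of each dummy is the share of the sample in that category, which is why dummies are interpretable in a summary table while the mean of the raw category codes is not.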
5 Originality
All papers are turned in through Carmen using the Turnitin app. I expect that the
similarity scores will not be 0%, since everyone is using the same data set and the data
descriptions will be similar. A similarity score above 20% is problematic. If the paper
shows up as not searchable, it will not be accepted.
6 Commonly missed Points
Grammar: Do not simply write your paper in another language and use Google Translate
to change it to English. Those papers are nearly impossible to read. Reading the paper
out loud to yourself before you hand it in will help tremendously.
Following Directions: Include everything that is on the attached grading rubric, and
place it in the correct section.
Potential Problems with your regression: All methodologies have some potential
problems. When you address those problems in the methodology section, do not include
problems that you can fix with your current skills and data. Discuss specifically which
coefficients could be affected and the direction of the effect.
Equation: The equation that is written out in the empirical methodology section must
be the same equation used to generate the tables. Write each coefficient as a Greek letter with a true subscript (β₁) rather than B1 or β1.
If the equation has perfect multicollinearity or treats non-ordinal categorical variables
as continuous, there will be large deductions.
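As an illustration only, an estimating equation could be typeset as follows. The choice of regressors is hypothetical: income, grade, and asvab are taken from Table 1, while D2 and D3 stand for dummies for the second and third categories of an assumed three-category non-ordinal variable. The first category is left out as the reference group, which is what avoids perfect multicollinearity. The \text command assumes the amsmath package.

```latex
\[
\ln(\text{income}_i) = \beta_0 + \beta_1\,\text{grade}_i + \beta_2\,\text{asvab}_i
  + \beta_3\,D_{2i} + \beta_4\,D_{3i} + \varepsilon_i
\]
```

Here β₃ and β₄ are read relative to the omitted first category. Including a dummy for every category together with the constant would make the regressors perfectly collinear.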
Tables and graphs checklist: Missing any part of the checklist will result in significant
deductions.
• Tables and graphs must be understandable without reading the text. This means
there should be a clear title that explains the contents of the table, such as “Summary Statistics”.
• Do not use variable names in tables and figures. For example, rather than momsedu,
my table should say “Mother’s Education”.
• There should also be a footnote which adds some detail to the table and explains what
is shown. For example “Each entry is an OLS coefficient with standard errors in
parentheses. Stars represent p-values as …”
• If plotting a distribution in a figure, use percent as the y-axis.
• All figures and tables should be referenced in the text.
• Figures must show more than just means.
• Cutting and pasting from the Stata output window will get 0 points. Use the estout
program to create the tables or type them in a word processing program.
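One possible way to build a regression table with the estout package is sketched below. It uses Stata's bundled nlsw88 data as a stand-in, so the variables (wage, grade, age, race), the labels, the table title, and the output file name results.rtf are all assumptions chosen for illustration.

```stata
* Stand-in data: Stata's bundled nlsw88 extract, not the course file.
* Run from a do-file; the /// line continuations do not work interactively.
ssc install estout, replace        // one-time install of the user-written package

sysuse nlsw88, clear
generate lnwage = ln(wage)

* Readable labels so the table never shows raw variable names.
label variable lnwage "Log hourly wage"
label variable grade  "Years of schooling"
label variable age    "Age"

* Store two similar regressions so they appear side by side in one table.
eststo clear
eststo: regress lnwage grade age
eststo: regress lnwage grade age i.race

* Standard errors in parentheses, stars at 10%, 5%, and 1%, labels as row names.
esttab using results.rtf, replace se label nobaselevels r2 ///
    star(* 0.10 ** 0.05 *** 0.01) ///
    title("Table 2: OLS Estimates of Log Hourly Wage")
```

The resulting .rtf file opens in a word processor, where the title and footnote can still be edited to match the checklist.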
A condensed version of this form will be used to grade papers.
1. Formatting 5pts: The paper conforms to the formatting standards
outlined in the assignment: double spaced, font, margins, no more than 10 pages, and
labeled sections. The file is a .pdf. (Negative scores are possible for this section.)
2. Organization 5pts:
• Ideas are presented clearly, free of spelling and grammatical errors. Any
references are cited. (-1 for each instance; up to 20 points can be deducted.)
• The paper is not cut and pasted from homeworks and transitions well between sections. (5) (Not a list of tasks. If your paper is just a list of tasks and does not resemble a research paper, 20 points will be deducted.)
3. Introduction 5pts: The question to be analyzed is described. The
importance of the question is discussed. The data source is mentioned. The
empirical methodology (OLS) is mentioned. A preview of results is given.
4. Data Section 15pts:
• The source of the data is mentioned. There is a brief discussion of means,
standard deviations, etc. of the wage variable and whatever other variable(s)
on which you choose to focus. (5)
• A summary table with means and standard deviations which is referenced in
the text. All variables included in your regression must be in the summary
statistics and categories such as race must be broken into dummies. (5)
• At least one graphic (histogram (don’t use density), scatter plot, etc.) which
describes data and is referenced in the text. The graphic must be relevant
to your topic. (5)
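A histogram that follows these rules might be produced as in the sketch below. It again relies on Stata's bundled nlsw88 data as a stand-in; the variable wage, the titles, and the file name figure1.png are assumptions for illustration.

```stata
* Stand-in data: Stata's bundled nlsw88 extract, not the course file.
* Run from a do-file; the /// line continuations do not work interactively.
sysuse nlsw88, clear

* The percent option puts percent, not density, on the y-axis.
histogram wage, percent ///
    title("Figure 1: Distribution of Hourly Wages") ///
    xtitle("Hourly wage (dollars)") ytitle("Percent of respondents")

* Save the figure so it can be inserted into the paper.
graph export figure1.png, replace
```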
5. Empirical Methodology 15pts
• The question being analyzed is clearly described and the estimated equation
is written out. (5)
• The inclusion of each variable is supported by theoretical reasoning and predicted signs of each variable are discussed. If your equation has perfect multicollinearity or uses non-ordinal categorical variables as continuous variables
you will lose all 5 of these points. (5)
• Potential concerns with sample selection, omitted variables, etc. are discussed along with the consequences of those problems. Discuss specifically
which coefficients are affected and how. (5)
6. Results 30pts
• Results table. Must be clean and understandable without reading the text. (Cutting
and pasting from Stata output is a zero.) (7)
• Interpretation of coefficients (8) (You must interpret at least 4 coefficients
including 1 dummy variable and 1 coefficient with a logged dependent variable.)
• Hypothesis testing: discuss which coefficients are statistically significant at
a 5% significance level; discuss the R² of the regression; conduct an F-test
on some aspect of the regression results. (5)
• Use your estimates to predict the outcomes of two hypothetical people. (Unless you understand how to calculate predicted values with a logged dependent variable, use a level-level for this part.) (3)
• Discuss the consistency of your results with your predictions. (2)
• Estimate an alternative specification, present (in a table) and discuss. (5)
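The F-test and the two predictions could be carried out roughly as sketched here, using a level-level model. The data are Stata's bundled nlsw88 extract as a stand-in, and the variables and the characteristics of the two hypothetical people are arbitrary assumptions.

```stata
* Stand-in data: Stata's bundled nlsw88 extract, not the course file.
sysuse nlsw88, clear

* Level-level model; i.race enters race as dummies with one omitted category.
regress wage grade age i.race

* F-test of the joint null hypothesis that all race coefficients are zero.
testparm i.race

* Predicted wage for two hypothetical people:
* person 1 has grade 12, age 40, race code 1; person 2 has grade 16, age 40, race code 2.
margins, at(grade=12 age=40 race=1) at(grade=16 age=40 race=2)
```

The margins output reports one predicted value per at() specification, along with a standard error, which can be quoted directly in the results discussion.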
7. Conclusion 5pts: The conclusion summarizes the results and the significance of those results.
8. Overall Quality 10 pts: This is the only subjective portion of your
grade. The average score will be 6/10.
9. Overall Difficulty 10 pts: Difficulty points are granted for the relative econometric rigor of your paper. This may include working without a partner, creating more complex graphics, using interaction terms, developing a unique
question, running additional regressions, citing and describing relevant economic literature (must be available on EconLit), etc.