Econometrics
Data Project
Submit your project as a Word document in Blackboard. If you use Pages on a Mac, please export your Pages document to PDF. Only Word or PDF files will be accepted.
This project MUST be done individually and independently. Identical or substantially similar work will result in an F for all authors involved.
This project is designed to give you a flavor of how quantitative research is conducted in the real world, and how some of the econometric techniques we discussed in class are applied. Several statistical packages are available, but you are required to conduct this project using R/RStudio.
This project will involve the following tasks:
Where to Get R and RStudio?
Download R at http://www.r-project.org/, and install R on your computer. After installing R, download RStudio at http://www.rstudio.com/, and install RStudio on your computer.
Where to Get Data?
Go to the CDC (Centers for Disease Control and Prevention) homepage at https://www.cdc.gov/. Scroll down, and click on “CDC Organization”. On the CDC organization page, click on “National Center for Health Statistics” (NCHS). Next, under “Population Surveys”, click on “National Health and Nutrition Examination Survey” (NHANES). On the NHANES page, click on “Questionnaires, Datasets, and Related Documentation”, then select the “NHANES 2017-2018” data. There, you will see 5 categories of data that you have access to: Demographics Data, Dietary Data, Examination Data, Laboratory Data, and Questionnaire Data. Each category has multiple datasets (except for Demographics Data). You are required to use at least TWO datasets. I suggest that you include Demographics Data, where you can find most of the common demographic variables such as age, education, income, etc.
How to Download the Datasets in SAS Transport (XPT) Files?
Right-click on the XPT links to the right of the datasets that you choose, then select “Save link as”, and save the files on your flash drive or C: drive. You can rename them as you like.
How to Load the SAS XPT Files into R?
Hint: You will need the package “Hmisc”. I have demonstrated in class quite a few times how to install packages. After installing and calling the package, you will need the command “sasxport.get” to open the XPT files.
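As a rough illustration of the hint above, the sketch below loads one XPT file with “sasxport.get”. The file name "DEMO_J.XPT" and its location in the working directory are assumptions made for this example; use whatever name and folder you chose when saving your own files.

```r
# Minimal sketch: read one SAS transport (XPT) file into a data frame.
# ASSUMPTION: "DEMO_J.XPT" is a hypothetical file name in the current
# working directory; replace it with the path to your own download.
demo_path <- "DEMO_J.XPT"

if (!requireNamespace("Hmisc", quietly = TRUE)) {
  # The package must be installed once before it can be loaded.
  message("Hmisc is not installed; run install.packages(\"Hmisc\") first.")
} else if (!file.exists(demo_path)) {
  message("File not found: ", demo_path, " (check your working directory).")
} else {
  library(Hmisc)                    # load ("call") the package
  demo <- sasxport.get(demo_path)   # read the XPT file
  print(dim(demo))                  # number of rows and columns
  print(head(names(demo)))          # first few variable names
}
```

By default “sasxport.get” converts variable names to lower case, which is why the ID variable appears as “seqn” after loading.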
How to Merge the Datasets?
Each individual in the datasets has an ID number called “seqn” (you can see it in the Variable Lists), and you will merge the datasets by this common variable. Hint: use the “merge” command. Find out by yourself how this command works.
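To show the general idea of merging by a common ID without using the actual NHANES files, here is a small sketch on made-up data. The data frames “demo_toy” and “exam_toy” and all of their values are invented for illustration only; your own datasets will have different variables.

```r
# Toy example: two small data frames that share the ID variable "seqn".
# All values below are made up for illustration.
demo_toy <- data.frame(seqn = c(1, 2, 3, 4),
                       age  = c(34, 51, 27, 62))
exam_toy <- data.frame(seqn = c(2, 3, 4, 5),
                       bmi  = c(24.1, 30.5, 27.8, 22.0))

# Default merge: keeps only the individuals found in BOTH data frames.
merged_toy <- merge(demo_toy, exam_toy, by = "seqn")
print(merged_toy)

# all = TRUE keeps every individual; variables that are not available
# for an individual are filled with NA.
merged_all <- merge(demo_toy, exam_toy, by = "seqn", all = TRUE)
print(merged_all)
```

Which version is appropriate depends on whether you want to keep individuals who appear in only one of your datasets; check the number of rows before and after merging.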
Other.
Some variables may have missing/empty values (shown as NA) or values that do not make sense (for instance, 99) for some observations/individuals. When you compute the summary statistics, you will need to deal with them appropriately (for example, you should change those strange values to NA, and tell R to ignore the NA values when calculating).
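The recoding described above could look roughly like the sketch below, again on made-up data. The variable name “educ” and the use of 99 as the strange value are assumptions for this example; which codes are invalid differs from variable to variable, so check the documentation of each dataset before recoding.

```r
# Toy example: one variable with a strange code (99) and a missing value.
# The variable name and all values are made up for illustration.
toy <- data.frame(seqn = 1:5,
                  educ = c(3, 99, 4, NA, 5))

# Recode the strange value to NA. Using %in% (rather than ==) avoids
# problems when the variable already contains NA values.
toy$educ[toy$educ %in% 99] <- NA

# Without na.rm = TRUE the mean of a variable containing NA is NA.
print(mean(toy$educ))

# na.rm = TRUE tells R to ignore the NA values when calculating.
print(mean(toy$educ, na.rm = TRUE))

# summary() reports the number of NA values separately.
print(summary(toy$educ))
```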