About CPS Data#
The Python scripts used to create cps.csv.gz can be found in taxdata/cps.
To create this file yourself, import and run the cps.create function as
demonstrated below:
from taxdata import cps

raw_cps = cps.create(
    datapath=DATA_PATH,
    exportcsv=False,
    exportpkl=True,
    exportraw=False,
    validate=False,
    benefits=True,
    verbose=True,
)
where DATA_PATH is a path to the directory where you store the original CPS
files. If you are only interested in using the default settings, you can just
run createcps.py.
By default, the CPS file is composed of the 2013, 2014, and 2015 March CPS
Supplemental files. taxdata also supports using the 2016, 2017, and 2018 files.
Support for additional files will be added as they become available.
To use a non-default set of files, add the cps_files parameter to your function
call:
raw_cps = cps.create(
    datapath=DATA_PATH,
    exportcsv=False,
    exportpkl=True,
    exportraw=False,
    validate=False,
    benefits=True,
    verbose=True,
    cps_files=[2016, 2017, 2018],
)
Once the raw file has been created, you will need to run it through the
finalprep function before it is ready to be used by Tax-Calculator.
final_cps = cps.finalprep(raw_cps)
final_cps.to_csv(final_output_path, index=False)
Input files:#
With the exception of the CPS March Supplements, all input files can be found
in the cps/data directory.
CPS March Supplements#
asec2013_pubuse.dat
asec2014_pubuse_tax_fix_5x8_2017.dat
asec2015_pubuse.dat
asec2016_pubuse_v3.dat
asec2017_pubuse.dat
asec2018_pubuse.dat
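Because the March Supplements are not shipped with the repository, it can help to confirm they are present in DATA_PATH before calling cps.create. The following is a minimal sketch, not part of taxdata: the helper name missing_asec_files and the year-to-file mapping are assumptions inferred from the file names listed above.

```python
from pathlib import Path

# File names copied from the list above. The year-to-file mapping is inferred
# from the names and is an assumption, not something taxdata exposes.
ASEC_FILES = {
    2013: "asec2013_pubuse.dat",
    2014: "asec2014_pubuse_tax_fix_5x8_2017.dat",
    2015: "asec2015_pubuse.dat",
    2016: "asec2016_pubuse_v3.dat",
    2017: "asec2017_pubuse.dat",
    2018: "asec2018_pubuse.dat",
}


def missing_asec_files(datapath, years=(2013, 2014, 2015)):
    """Return the expected file names for `years` that are absent from `datapath`."""
    folder = Path(datapath)
    return [
        ASEC_FILES[year]
        for year in years
        if not (folder / ASEC_FILES[year]).is_file()
    ]
```

Calling missing_asec_files(DATA_PATH) with no years argument checks the three default files; an empty list means every expected file was found.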
C-TAM Benefit Imputations#
Note that we only have C-TAM imputations for the 2013, 2014, and 2015 files. For other years, we just use the benefit program information in the CPS.
Housing_Imputation_logreg_2013.csv
Housing_Imputation_logreg_2014.csv
Housing_Imputation_logreg_2015.csv
medicaid2013.csv
medicaid2014.csv
medicaid2015.csv
medicare2013.csv
medicare2014.csv
medicare2015.csv
otherbenefitprograms.csv
SNAP_Imputation_2013.csv
SNAP_Imputation_2014.csv
SNAP_Imputation_2015.csv
SS_augmentation_2013.csv
SS_augmentation_2014.csv
SS_augmentation_2015.csv
SSI_Imputation2013.csv
SSI_Imputation2014.csv
SSI_Imputation2015.csv
TANF_Imputation_2013.csv
TANF_Imputation_2014.csv
TANF_Imputation_2015.csv
UI_imputation_logreg_2013.csv
UI_imputation_logreg_2014.csv
UI_imputation_logreg_2015.csv
VB_Imputation2013.csv
VB_Imputation2014.csv
VB_Imputation2015.csv
WIC_imputation_children_logreg_2013.csv
WIC_imputation_children_logreg_2014.csv
WIC_imputation_children_logreg_2015.csv
WIC_imputation_infants_logreg_2013.csv
WIC_imputation_infants_logreg_2014.csv
WIC_imputation_infants_logreg_2015.csv
WIC_imputation_women_logreg_2013.csv
WIC_imputation_women_logreg_2014.csv
WIC_imputation_women_logreg_2015.csv
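When choosing a non-default set of years, it may be useful to see which of them have C-TAM imputations and which will rely on the benefit information in the CPS itself. The sketch below is purely illustrative: split_by_benefit_source is a hypothetical helper, and it only encodes the year coverage stated in the note above, not how taxdata handles this internally.

```python
# Years for which C-TAM imputation files exist, per the note above.
CTAM_YEARS = {2013, 2014, 2015}


def split_by_benefit_source(cps_files):
    """Split CPS years into (years with C-TAM imputations, years without)."""
    with_ctam = sorted(year for year in cps_files if year in CTAM_YEARS)
    cps_only = sorted(year for year in cps_files if year not in CTAM_YEARS)
    return with_ctam, cps_only
```

For example, passing [2014, 2016, 2018] separates 2014 (covered by C-TAM) from 2016 and 2018 (CPS benefit information only).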
Imputation Parameters#
These parameters are used in the imputations found in taxdata/cps/data/impute.py:
logit_beta.csv
ols_betas.csv
Output Files#
Only cps.csv.gz is included in the repository due to the size of cps_raw.csv.gz.
cps.csv.gz
cps_raw.csv.gz
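Both output files are gzip-compressed CSVs, so they can be inspected without decompressing them to disk first. Here is a minimal sketch using only the Python standard library; peek_gz_csv is a hypothetical helper name, and the path you pass is whatever location you saved the file to.

```python
import csv
import gzip
from itertools import islice


def peek_gz_csv(path, n_rows=5):
    """Return (column_names, first n_rows data rows) from a gzip-compressed CSV."""
    # "rt" opens the compressed file in text mode; newline="" lets the csv
    # module handle line endings itself.
    with gzip.open(path, mode="rt", newline="") as f:
        reader = csv.reader(f)
        header = next(reader, [])  # empty list if the file has no rows
        rows = list(islice(reader, n_rows))
    return header, rows
```

All values come back as strings. For analysis, loading the file with a data-frame library is likely more convenient; this helper is only meant for a quick look at the column names and a few records.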
Documentation#
More information about the data in the cps_raw.csv.gz file is
available in this document.
All of the benefit costs listed in benefitprograms.csv can be found
in tables 3.2 and 11.3 of the archived Historical Tables
of the Office of Management and Budget. All costs are in millions of
dollars.