Migration metadata: 2016-2022
These files will be of most interest to researchers, IGs, and oversight officials.
When data was migrated from Census to GSA, it was modified to meet the validation requirements imposed by GSA. These files record the changes made during that process.
The process
The process of migrating data was developed in conjunction with OMB and Federal partners. Decisions regarding the process are captured through the FAC’s Decision Record process:
- https://github.com/GSA-TTS/FAC/blob/main/docs/architecture/decisions/0039-data-migration-iterative-approach.md
- https://github.com/GSA-TTS/FAC/blob/main/docs/architecture/decisions/0040-data-migration-implementing-logging-and-transformation-tracking.md
Census to GSA submission crosswalk
The data collected at Census and GSA is identified in different ways.
Data collected at Census had a compound identifier. It used the AUDITYEAR
of the submission along with a DBKEY
, which was a manually-assigned numerical value. The combination of the AUDITYEAR
and DBKEY
was meant to be a unique identifier for each SF-SAC and audit report submission.
Data collected at GSA uses an identifier that combines:
- the year and month of the fiscal year end date (e.g.
2023-09
), - a six-character string indicating who collected the audit (either
CENSUS
orGSAFAC
), and - A number. If the audit was collected by...
CENSUS
: the number is theDBKEY
GSAFAC
: the number is an auto-incrementing value with no meaning
We provide a CSV containing a crosswalk of data migrated from Census to GSA. This CSV has 8 columns:
- A row id
audit_year
of the submissiondbkey
of the submission- GSA-assigned
report_id
for the submission auditee_uei
, the SAM.gov-provided unique identifier (if it exists; only present from 2022 onward;GSA_MIGRATION
in many cases)auditee_ein
, the auditee's tax identification number (always present)run_date
, which is when the migration was runmigration_status
(alwaysSUCCESS
in this crosswalk)
There were two audits that were unable to be migrated. These are, for all intents and purposes, lost.
- AUDITYEAR: 2019, DBKEY: 243753
- AUDITYEAR: 2022, DBKEY: 247680
By audit year
For each audit year, there are two migration metadata tables:
- invalidauditrecord: records where audit data was incorrect or otherwise irrecoverably invalid in the migration process
- inspectionrecord: records values that were incorrect or incompatible with modern data validations, and the transformations or changes that were made to migrate the data to GSA
All of the code to carry and tests developed to support this data migration are retained as part of the FAC.
The data
- 2016: invalidauditrecord, inspectionrecord
- 2017: invalidauditrecord, inspectionrecord
- 2018: invalidauditrecord, inspectionrecord
- 2019: invalidauditrecord, inspectionrecord
- 2020: invalidauditrecord, inspectionrecord
- 2021: invalidauditrecord, inspectionrecord
- 2022: invalidauditrecord, inspectionrecord
How to work with this data
The invalidauditrecord
is most useful in terms of identifying the handful of audits that contained irreconcilable migration errors. For example, a record may have been missing one or more sections of the SF-SAC. These tables record the audits that were, quite literally, incomplete, and brought into the new system despite this.
The inspectionrecords
are most useful in understanding what changes were enacted in order to modernize historical data. The table contains 15 columns:
- id
- audit_year
- dbkey
- report_id
- run_datetime
- … and one column per audit table (general, federal_awards, findings, etc.)
The first five columns are linking information; they allow for connecting back to the historic data (via audit_year
and dbkey
), or to the migrated records (via report_id
). Within the per-table cells are JSON documents of the following form:
{
"census_data": [
{
"column": "sample_col1",
"value": "sample_data1"
},
{
"column": "sample_col2",
"value": "sample_data2"
}
],
"gsa_fac_data": {
"field": "sample_gsa_fac_field",
"value": "sample_value"
},
"transformation_functions": "function_name"
}
As an example:
[
[
{
"census_data": [
{
"value": "027",
"column": "CFDA_EXT"
}
],
"gsa_fac_data": {
"field": "federal_award_extension",
"value": "027"
},
"transformation_functions": [
"xform_replace_invalid_extension"
]
}
]
]
There is one migration record in this list. It records that the Census data in the CFDA_EXT
column was the value 027
, which is not a valid federal award extension. We then recorded the function that was used to correct this issue: xform_replace_invalid_extension. If you search the FAC codebase for these functions, it is possible to see exactly what transformation was enacted in order to produce data that would pass modern data validations. In this case, we would replace invalid data with GSA_MIGRATION
.