
SF-SAC intake validations

Intake validations run as an audit is submitted piece by piece. To validate SF-SAC form sections, we do three things:

  1. Convert the spreadsheet into JSON
  2. Validate the data in this intermediate JSON form
  3. Convert the data to a meaningful semantic representation

Each of these steps helps ensure a better submission to the FAC.
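
The three steps above can be sketched as a small pipeline. This is an illustrative stand-in, not the FAC's actual intakelib API; the function names and the shape of the intermediate form are assumptions.

```python
def spreadsheet_to_json(workbook: dict) -> dict:
    # Step 1 stand-in: in practice this parses an .xlsx upload into the
    # intermediate JSON representation built from named ranges.
    return {"ranges": workbook}

def run_validations(ir: dict) -> list[str]:
    # Step 2 stand-in: return human-friendly error strings; empty means clean.
    errors = []
    if "auditee_uei" not in ir["ranges"]:
        errors.append("Workbook is missing a UEI.")
    return errors

def to_semantic_form(ir: dict) -> dict:
    # Step 3 stand-in: reshape the intermediate form into a meaningful record.
    return {"general": {"uei": ir["ranges"].get("auditee_uei")}}

def process_section(workbook: dict) -> dict:
    ir = spreadsheet_to_json(workbook)
    errors = run_validations(ir)
    return {"errors": errors} if errors else {"record": to_semantic_form(ir)}
```

A submission only reaches step 3 if step 2 produced no errors, which is what lets later stages assume well-formed data.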

Reporting errors

Intake validations are critical for several reasons.

  1. They are user-centered.
    When a validation finds a mistake in the data, we try to produce error messages that are human-friendly. These messages help auditors and grantees fix errors they may have made. They specify errors down to the specific cell, and are refined for usability based on user feedback.
  2. They are self-contained.
    Each validation does only one thing, which keeps the validations easy for the GSA team to maintain.
  3. They are comprehensive.
    The intake validations are intended to cover all possible error conditions we might encounter. In the event that an auditor or grantee finds a new, surprising way to enter data into our forms... we add a new intake validation.

Ultimately, our goal is for the intake validations to provide a good user experience when part of the SF-SAC is completed incorrectly, while simultaneously ensuring that we collect quality data via the form.
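
As a small illustration of a cell-specific, human-friendly message, a helper like the following could format one. The function name, signature, and wording are hypothetical, not the FAC's actual message text.

```python
def cell_error(range_name: str, column: str, row: int, problem: str) -> str:
    # Point the user at the exact cell and named range, then state the
    # problem in plain language. Wording here is illustrative only.
    return f"In cell {column}{row} ({range_name}): {problem}"
```

For example, `cell_error("amount_expended", "G", 7, "expected a number")` yields a message that names both the cell and the field the user sees.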

Step 1: Convert the spreadsheet into JSON

Spreadsheets are first converted into JSON, or JavaScript Object Notation. Internally, this is referred to as the intermediate representation. This intermediate representation is a small language that uses named ranges to define regions of data. We describe this representation more fully as part of our intakelib documentation.
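
A fragment of what such a named-range representation might look like is sketched below. The range names, cell coordinates, and overall shape are assumptions for illustration; the actual schema is in the intakelib documentation.

```python
import json

# Hypothetical fragment of the intermediate representation: each named
# range maps a region of the spreadsheet to a list of values.
ir_fragment = json.loads("""
{
  "name": "FederalAwards",
  "ranges": [
    {"name": "auditee_uei",
     "start_cell": {"column": "B", "row": "4"},
     "values": ["ABC123DEF456"]},
    {"name": "amount_expended",
     "start_cell": {"column": "G", "row": "2"},
     "values": [100000, 250000]}
  ]
}
""")

# Named ranges let validations look data up by meaning, not by coordinate.
by_name = {r["name"]: r["values"] for r in ir_fragment["ranges"]}
```

Because every region carries a name, a validation can ask for `amount_expended` without knowing which column the template put it in.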

This conversion is a validation of its own kind; if we cannot convert the input to our intermediate representation, it is obviously not an SF-SAC upload! The SF-SAC forms must be saved in the Excel 2007 XML-based format (.xlsx). We chose this format because many current spreadsheet tools (e.g. LibreOffice) are capable of saving data in it. This means that it is possible to prepare an SF-SAC submission using free and open source software.
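
Because an .xlsx file is a ZIP archive of XML parts, one way a system might pre-screen an upload is to check that the container opens and holds the required manifest. This sketch is an assumption about how such a check could work, not the FAC's actual code.

```python
import io
import zipfile

def looks_like_xlsx(data: bytes) -> bool:
    # An .xlsx file is a ZIP archive of XML parts; a cheap sanity check is
    # that the archive opens and contains the [Content_Types].xml manifest
    # required by the Office Open XML packaging format.
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return "[Content_Types].xml" in zf.namelist()
    except zipfile.BadZipFile:
        return False
```

A check like this rejects renamed .xls or .csv files before any parsing is attempted.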

Step 2: Validate the data

The intermediate form makes the data validation easier. Every section of the SF-SAC has a common set of validations applied. For example, we make sure every form section has a UEI defined. Then, we apply section-specific validations.
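
As an example of a common, self-contained check, a `uei_exists` validation might look like the sketch below. The IR shape and message text are assumptions for illustration, not the FAC's implementation.

```python
def uei_exists(ir: dict) -> list[dict]:
    # A single-purpose check in the style described above: confirm the
    # workbook's UEI named range is present and non-empty.
    ranges = {r["name"]: r for r in ir.get("ranges", [])}
    values = ranges.get("auditee_uei", {}).get("values", [])
    if not values or not values[0]:
        return [{"message": "The UEI is missing from this workbook."}]
    return []
```

Keeping each check this small is what makes the suite easy to maintain: a failing check names exactly one problem.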

Validations applied to the SF-SAC at time of intake.
Form section Validation Description
ALL
  uei_exists Checks that the UEI is present
  look_for_empty_rows Alerts to empty rows in the middle of a form
  start_and_end_rows_of_all_columns_are_same Makes sure all columns are the same length
  is_right_workbook Checks that the workbook matches the section being submitted to
  has_all_the_named_ranges Asserts all input ranges for a section are present
  has_all_required_fields Checks that all required fields are present in a given workbook
  check_for_gsa_migration_keyword Makes sure GSA_MIGRATION only appears in the context of the data migration
Federal Awards
  y_or_n__fields Checks all Y/N fields are correct
  federal_award_amount_passed_through_optional Checks there is a subrecipient amount present if "Y" selected for "Passed Through"
  check_cluster_names Checks the correctness of cluster names
  state_cluster_names If STATE_CLUSTER is indicated, asserts they provided a state cluster name
  other_cluster_names If OTHER_CLUSTER is indicated, asserts they provided some other cluster name
  passthrough_name_when_no_direct Asserts correct behavior of passthrough name/ID when funding is direct/indirect
  loan_balance_present If a loan is guaranteed, checks that a balance is provided
  loan_balance_entry_is_valid Checks that a loan balance is a number, empty string, or N/A
  no_major_program_no_type Checks presence of report type depending on the major/non-major determination
  all_unique_award_numbers Checks that no award numbers are duplicated in a submission
  sequential_award_numbers Asserts AWARD-#### numbers are present and sequential
  aln_agency_prefix Checks for valid ALN prefixes
  aln_three_digit_extension Checks for valid ALN extensions
  additional_award_identification If RD or U is indicated in the ALN, enforces additional award information
  federal_program_total_is_correct Checks sums of federal programs against values submitted
  cluster_total_is_correct Checks sums of cluster totals
  total_amount_expended_is_correct Asserts that the total amount expended on the coversheet matches the sum of the values reported
  cardinality_of_passthrough_names_and_ids Checks that passthrough names and IDs are of the same cardinality
Notes to SEFA
  y_or_n__fields Checks all Y/N fields are correct
Audit Findings
  award_references_pattern Asserts award reference numbers are correctly formatted
  finding_prior_references_pattern Asserts prior finding reference numbers are correctly formatted
  finding_reference_pattern Asserts the finding reference numbers are correctly formatted
  no_repeat_findings Checks for prior references when repeat findings are indicated
  findings_grid_validation Enforces correct possible values for the findings Y/N grid
Findings Text
  y_or_n__fields Checks all Y/N fields are correct
Corrective Action Plan Text
  y_or_n__fields Checks all Y/N fields are correct
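
To make the table concrete, here is how one listed check, sequential_award_numbers, might be implemented. The AWARD-#### pattern comes from the table above; the exact digit width, starting index, and message wording are assumptions.

```python
import re

def sequential_award_numbers(award_refs: list[str]) -> list[str]:
    # Sketch of the sequential_award_numbers check: references should look
    # like AWARD-#### and count up from 1 with no gaps or repeats.
    errors = []
    for i, ref in enumerate(award_refs, start=1):
        expected = f"AWARD-{i:04d}"
        if not re.fullmatch(r"AWARD-\d{4,}", ref or ""):
            errors.append(f"Award reference {ref!r} is not in AWARD-#### form.")
        elif ref != expected:
            errors.append(f"Expected {expected}, found {ref}.")
    return errors
```

Note how the check reports every out-of-sequence reference rather than stopping at the first, so the user can fix the whole column in one pass.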

Step 3: Convert the data to a meaningful form

After intake validations are applied, we then transform the data into its storage form. This is another JSON representation, but it is more meaningful. Instead of representing the spreadsheet, it represents the meaningful connections between portions of the data. This form is then further validated by our schema validations.
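
The difference between the spreadsheet-shaped form and the storage form can be illustrated with a small pivot. The field names and shapes here are hypothetical; the point is that column-oriented named ranges become row-oriented records whose related values travel together.

```python
# Hypothetical intermediate form: one named range per column.
ir = {"ranges": [
    {"name": "award_reference", "values": ["AWARD-0001", "AWARD-0002"]},
    {"name": "amount_expended", "values": [100000, 250000]},
]}

def to_semantic(ir: dict) -> dict:
    # Pivot column-oriented named ranges into one record per row, so the
    # connections within a row are represented directly.
    cols = {r["name"]: r["values"] for r in ir["ranges"]}
    rows = zip(*cols.values())
    return {"federal_awards": [dict(zip(cols.keys(), row)) for row in rows]}
```

The resulting records are what the schema validations then inspect, since constraints like "this award's amount" are naturally expressed against a record, not a spreadsheet column.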