
SF-SAC intake validations

Intake validations run as an audit is submitted piece by piece. To validate SF-SAC form sections, we do three things:

  1. Convert the spreadsheet into JSON
  2. Validate the data in this intermediate JSON form
  3. Convert the data to a meaningful semantic representation

Each of these steps helps ensure a better submission to the FAC.
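
The three steps above can be sketched as a small pipeline. This is an illustrative stand-in, not the FAC's actual intakelib API; the function names and the shape of the intermediate form are assumptions.

```python
def spreadsheet_to_json(workbook: dict) -> dict:
    # Step 1 stand-in: in practice this parses an .xlsx upload into the
    # intermediate JSON representation built from named ranges.
    return {"ranges": workbook}

def run_validations(ir: dict) -> list[str]:
    # Step 2 stand-in: return human-friendly error strings; empty means clean.
    errors = []
    if "auditee_uei" not in ir["ranges"]:
        errors.append("Workbook is missing a UEI.")
    return errors

def to_semantic_form(ir: dict) -> dict:
    # Step 3 stand-in: reshape the intermediate form into a meaningful record.
    return {"general": {"uei": ir["ranges"].get("auditee_uei")}}

def process_section(workbook: dict) -> dict:
    ir = spreadsheet_to_json(workbook)
    errors = run_validations(ir)
    return {"errors": errors} if errors else {"record": to_semantic_form(ir)}
```

A submission only reaches step 3 if step 2 produced no errors, which is what lets later stages assume well-formed data.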

Reporting errors

Intake validations are critical for several reasons.

  1. They are user-centered.
    When a validation finds a mistake in the data, we try to produce error messages that are human-friendly. These messages help auditors and grantees fix errors they may have made. They specify errors down to the specific cell, and are refined for usability based on user feedback.
  2. They are self-contained.
    Each validation does only one thing, which keeps the validations easy for the GSA team to maintain.
  3. They are comprehensive.
    The intake validations are intended to cover all possible error conditions we might encounter. In the event that an auditor or grantee finds a new, surprising way to enter data into our forms... we add a new intake validation.

Ultimately, our goal is for the intake validations to provide a good user experience when part of the SF-SAC is completed incorrectly, while simultaneously ensuring that we collect quality data via the form.
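
As a small illustration of a cell-specific, human-friendly message, a helper like the following could format one. The function name, signature, and wording are hypothetical, not the FAC's actual message text.

```python
def cell_error(range_name: str, column: str, row: int, problem: str) -> str:
    # Point the user at the exact cell and named range, then state the
    # problem in plain language. Wording here is illustrative only.
    return f"In cell {column}{row} ({range_name}): {problem}"
```

For example, `cell_error("amount_expended", "G", 7, "expected a number")` yields a message that names both the cell and the field the user sees.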

Step 1: Convert the spreadsheet into JSON

Spreadsheets are first converted into JSON, or JavaScript Object Notation. Internally, this is referred to as the intermediate representation. This intermediate representation is a small language that uses named ranges to define regions of data. We describe this representation more fully as part of our intakelib documentation.
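
A fragment of what such a named-range representation might look like is sketched below. The range names, cell coordinates, and overall shape are assumptions for illustration; the actual schema is in the intakelib documentation.

```python
import json

# Hypothetical fragment of the intermediate representation: each named
# range maps a region of the spreadsheet to a list of values.
ir_fragment = json.loads("""
{
  "name": "FederalAwards",
  "ranges": [
    {"name": "auditee_uei",
     "start_cell": {"column": "B", "row": "4"},
     "values": ["ABC123DEF456"]},
    {"name": "amount_expended",
     "start_cell": {"column": "G", "row": "2"},
     "values": [100000, 250000]}
  ]
}
""")

# Named ranges let validations look data up by meaning, not by coordinate.
by_name = {r["name"]: r["values"] for r in ir_fragment["ranges"]}
```

Because every region carries a name, a validation can ask for `amount_expended` without knowing which column the template put it in.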

This conversion is a validation of its own kind; if we cannot convert the input to our intermediate representation, it is obviously not an SF-SAC upload! The SF-SAC forms must be saved in the Excel 2007 XML-based format (.xlsx). We chose this format because many current spreadsheet tools (e.g. LibreOffice) are capable of saving data in it. This means that it is possible to prepare an SF-SAC submission using free and open source software.
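
Because an .xlsx file is a ZIP archive of XML parts, one way a system might pre-screen an upload is to check that the container opens and holds the required manifest. This sketch is an assumption about how such a check could work, not the FAC's actual code.

```python
import io
import zipfile

def looks_like_xlsx(data: bytes) -> bool:
    # An .xlsx file is a ZIP archive of XML parts; a cheap sanity check is
    # that the archive opens and contains the [Content_Types].xml manifest
    # required by the Office Open XML packaging format.
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return "[Content_Types].xml" in zf.namelist()
    except zipfile.BadZipFile:
        return False
```

A check like this rejects renamed .xls or .csv files before any parsing is attempted.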

Step 2: Validate the data

The intermediate form makes the data validation easier. Every section of the SF-SAC has a common set of validations applied. For example, we make sure every form section has a UEI defined. Then, we apply section-specific validations.
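
As an example of a common, self-contained check, a `uei_exists` validation might look like the sketch below. The IR shape and message text are assumptions for illustration, not the FAC's implementation.

```python
def uei_exists(ir: dict) -> list[dict]:
    # A single-purpose check in the style described above: confirm the
    # workbook's UEI named range is present and non-empty.
    ranges = {r["name"]: r for r in ir.get("ranges", [])}
    values = ranges.get("auditee_uei", {}).get("values", [])
    if not values or not values[0]:
        return [{"message": "The UEI is missing from this workbook."}]
    return []
```

Keeping each check this small is what makes the suite easy to maintain: a failing check names exactly one problem.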

Validations applied to the SF-SAC at time of intake.
Form section Validation Description
ALL
  uei_exists Checks that the UEI is present
  look_for_empty_rows Alerts to empty rows in the middle of a form
  start_and_end_rows_of_all_columns_are_same Makes sure all columns are the same length
  is_right_workbook Checks that the workbook matches the section being submitted to
  has_all_the_named_ranges Asserts all input ranges for a section are present
  has_all_required_fields Checks that all required fields are present in a given workbook
  check_for_gsa_migration_keyword Makes sure GSA_MIGRATION only appears in the context of the data migration
Federal Awards
  y_or_n__fields Checks all Y/N fields are correct
  federal_award_amount_passed_through_optional Checks there is a subrecipient amount present if "Y" selected for "Passed Through"
  check_cluster_names Checks the correctness of cluster names
  state_cluster_names If STATE_CLUSTER is indicated, asserts they provided a state cluster name
  other_cluster_names If OTHER_CLUSTER is indicated, asserts they provided some other cluster name
  passthrough_name_when_no_direct Asserts correct behavior of passthrough name/ID when funding is direct/indirect
  loan_balance_present If a loan is guaranteed, checks that a balance is provided
  loan_balance_entry_is_valid Checks that a loan balance is a number, empty string, or N/A
  no_major_program_no_type Checks presence of report type depending on the major/non-major determination
  all_unique_award_numbers Checks that no award numbers are duplicated in a submission
  sequential_award_numbers Asserts AWARD-#### numbers are present and sequential
  aln_agency_prefix Checks for valid ALN prefixes
  aln_three_digit_extension Checks for valid ALN extensions
  additional_award_identification If RD or U is indicated in the ALN, enforces additional award information
  federal_program_total_is_correct Checks sums of federal programs against values submitted
  cluster_total_is_correct Checks sums of cluster totals
  total_amount_expended_is_correct Asserts that the total amount expended on the coversheet matches the sum of the values reported
  cardinality_of_passthrough_names_and_ids Checks that passthrough names and IDs are of the same cardinality
Notes to SEFA
  y_or_n__fields Checks all Y/N fields are correct
Audit Findings
  award_references_pattern Asserts award reference numbers are correctly formatted
  finding_prior_references_pattern Asserts prior finding reference numbers are correctly formatted
  finding_reference_pattern Asserts the finding reference numbers are correctly formatted
  no_repeat_findings Checks for prior references when repeat findings are indicated
  findings_grid_validation Enforces correct possible values for the findings Y/N grid
Findings Text
  y_or_n__fields Checks all Y/N fields are correct
Corrective Action Plan Text
  y_or_n__fields Checks all Y/N fields are correct
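
To make the table concrete, here is how one listed check, sequential_award_numbers, might be implemented. The AWARD-#### pattern comes from the table above; the exact digit width, starting index, and message wording are assumptions.

```python
import re

def sequential_award_numbers(award_refs: list[str]) -> list[str]:
    # Sketch of the sequential_award_numbers check: references should look
    # like AWARD-#### and count up from 1 with no gaps or repeats.
    errors = []
    for i, ref in enumerate(award_refs, start=1):
        expected = f"AWARD-{i:04d}"
        if not re.fullmatch(r"AWARD-\d{4,}", ref or ""):
            errors.append(f"Award reference {ref!r} is not in AWARD-#### form.")
        elif ref != expected:
            errors.append(f"Expected {expected}, found {ref}.")
    return errors
```

Note how the check reports every out-of-sequence reference rather than stopping at the first, so the user can fix the whole column in one pass.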

Step 3: Convert the data to a meaningful form

After intake validations are applied, we then transform the data into its storage form. This is another JSON representation, but it is more meaningful. Instead of representing the spreadsheet, it represents the meaningful connections between portions of the data. This form is then further validated by our schema validations.
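
The difference between the spreadsheet-shaped form and the storage form can be illustrated with a small pivot. The field names and shapes here are hypothetical; the point is that column-oriented named ranges become row-oriented records whose related values travel together.

```python
# Hypothetical intermediate form: one named range per column.
ir = {"ranges": [
    {"name": "award_reference", "values": ["AWARD-0001", "AWARD-0002"]},
    {"name": "amount_expended", "values": [100000, 250000]},
]}

def to_semantic(ir: dict) -> dict:
    # Pivot column-oriented named ranges into one record per row, so the
    # connections within a row are represented directly.
    cols = {r["name"]: r["values"] for r in ir["ranges"]}
    rows = zip(*cols.values())
    return {"federal_awards": [dict(zip(cols.keys(), row)) for row in rows]}
```

The resulting records are what the schema validations then inspect, since constraints like "this award's amount" are naturally expressed against a record, not a spreadsheet column.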