Background & Overview
HDF Schema
The full generalized HDF schema is as follows:
{
platform: { //required field
name //required field
release //required field
target_id
}
version //required field
statistics: { //required field
duration
}
profiles: [ //required field
0: {
name //required field
version
sha256 //required field
title
maintainer
summary
license
copyright
copyright_email
supports //required field
attributes //required field
groups //required field
controls: [ //required field
0: {
id //required field
title
desc
descriptions
impact //required field
refs //required field
tags //required field
code
source_location //required field
results: [ //required field
0: {
status
code_desc //required field
message
run_time
start_time //required field
}
]
}
]
status
}
]
passthrough: {
auxiliary_data: [
0: {
name
data
}
]
raw
}
}
(Note: The documented schema is subject to change, and not all required fields need to be populated. For the full schema and more information on these fields, refer to saf.mitre.org/#/normalize.)
HDF Schema Breakdown
The HDF schema can be grouped into three structures, each a subset of the previous one: profiles, controls, and results.
The profiles structure contains metadata on the scan target of the original security service export and on the run performed by the security tool. This provides a high-level overview of the scan run and its target that is both digestible and easily accessible to the user. A generalized format is as follows:
profiles: [
0: {
name //Name of profile, usually the original security service tool; should be unique
version //Version of security service tool
sha256 //Hash of HDF file; NOTE: AUTOMATICALLY GENERATED BY HDF CONVERTERS, DO NOT POPULATE
title //Title of security service scan; should be human readable
maintainer //Maintainer
summary //Summary of security service export
license //Copyright license
copyright //Copyright holder
copyright_email //Copyright holder's email
supports //Supported platform targets
attributes //Inputs/attributes used in scan
groups //Set of descriptions for the control groups
controls //Controls substructure (see below)
status //Status of profile (typically 'loaded')
}
... //More items may exist if the security service produces multiple scan targets per export
]
Controls are security parameters used to prevent unauthorized access to sensitive information or infrastructure. In the case of HDF Converters, the controls structure is a collection of such controls that an external security service retroactively tests against the target to ensure that it complies with vulnerability and weakness prevention standards. The controls structure is a subset of the profiles structure. A generalized format is as follows:
controls: [
0: {
id //ID of control; used for sorting, should be unique for each unique control
title //Title of control
desc //Description of the control
descriptions //Additional descriptions; usually 'check' and 'fix' text for control
impact //Security severity of control
refs //References to external control documentation
tags //Control tags; typically correlate to existing vulnerability/weakness database (e.g., NIST, CVE, CWE)
code //Control source code for code preservation
source_location //Location of control within source code
results //Results substructure (see below)
}
... //More items may exist if there are multiple controls reported per profile
]
The results structure contains information on the results of specific tests run by the security service on the scan target against a set of security controls. Each result always correlates to a certain control and will report either 'passed' or 'failed' to indicate the test status (other statuses exist but are rare); cumulatively, these results determine the scan target's level of compliance with the indicated control set. The results structure is a subset of the controls structure. A generalized structure is as follows:
results: [
0: {
status //Pass/fail status of test (other statuses exist but are rare)
code_desc //Test expectations as defined by control
message //Demonstration of expected and actual result of test to justify test status
run_time //Overall runtime of test
start_time //Starting time of test
}
... //More items may exist if there are multiple results reported per control
]
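Taken together, the three listings above can be sketched as TypeScript interfaces. This is only an illustrative sketch based on the listings above; the field types are assumptions, and the authoritative type definitions ship with HDF Converters itself.

```typescript
// Illustrative sketch only: field names come from the schema listings above,
// but the types are assumptions, not the library's authoritative definitions.
interface HDFResult {
  status?: string; // e.g. 'passed' or 'failed'
  code_desc: string; // test expectations as defined by the control
  message?: string;
  run_time?: number;
  start_time: string;
}

interface HDFControl {
  id: string; // unique per control; used for sorting
  title?: string;
  desc?: string;
  descriptions?: {label: string; data: string}[];
  impact: number; // security severity of the control
  refs: object[];
  tags: Record<string, unknown>;
  code?: string; // preserved control source code
  source_location: object;
  results: HDFResult[];
}

interface HDFProfile {
  name: string; // usually the original security service tool
  version?: string;
  sha256: string; // auto-generated by HDF Converters; do not populate
  title?: string;
  maintainer?: string;
  summary?: string;
  license?: string;
  copyright?: string;
  copyright_email?: string;
  supports: object[];
  attributes: object[];
  groups: object[];
  controls: HDFControl[];
  status?: string; // typically 'loaded'
}

// Minimal instance demonstrating the profiles -> controls -> results nesting
const exampleProfile: HDFProfile = {
  name: 'example-profile',
  sha256: '',
  supports: [],
  attributes: [],
  groups: [],
  controls: [
    {
      id: 'example-control',
      impact: 0.9,
      refs: [],
      tags: {},
      source_location: {},
      results: [
        {
          status: 'failed',
          code_desc: 'example test expectation',
          start_time: '2022-05-18T12:24:22Z'
        }
      ]
    }
  ]
};
```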
Together, the aforementioned structures nest to produce the following generalized structure, which primarily defines the HDF:
//Data fields have been removed for the sake of demonstration
profiles: [
0: {
controls: [
0: {
results: [
0: {
},
...
]
},
...
]
},
...
]
There are additional structures in the HDF schema that are used to store metadata and extraneous information. These exist alongside the profiles structure at the top level of the HDF schema. The general structure for the top level of the HDF schema is as follows:
{
platform: { //Information on the platform handling the HDF file; usually 'Heimdall Tools'
name //Platform name
release //Platform version
target_id //Platform target ID
}
version //Platform version
statistics: { //Statistics relating to target scan run
duration //Duration of run
}
profiles //Profiles structure
passthrough: { //Extraneous information storage
auxiliary_data: [ //Storage for unused data from the sample file
0: {
name //Name of auxiliary data source
data //Auxiliary data
}
... //More items may exist if there are multiple auxiliary data sources available
]
raw //Raw data dump of input security service export
}
}
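The top level of the schema can likewise be written as a TypeScript interface. The types below are assumptions inferred from the listing above, not the library's authoritative definitions, and the version strings in the example instance are placeholders.

```typescript
// Illustrative sketch of the top level of the HDF schema; types are assumed.
interface HDFExecution {
  platform: {
    name: string; // usually 'Heimdall Tools'
    release: string; // platform version
    target_id?: string;
  };
  version: string;
  statistics: {
    duration?: number; // duration of the run
  };
  profiles: object[]; // the profiles structure described earlier
  passthrough?: {
    auxiliary_data?: {name: string; data: unknown}[];
    raw?: unknown; // raw dump of the input security service export
  };
}

// Example instance with extraneous data stored under passthrough
// (version strings are placeholders)
const execution: HDFExecution = {
  platform: {name: 'Heimdall Tools', release: '1.0.0'},
  version: '1.0.0',
  statistics: {},
  profiles: [],
  passthrough: {
    auxiliary_data: [{name: 'Twistlock', data: {scanID: 'asdfghjkl'}}],
    raw: '...'
  }
};
```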
HDF Schema Mapping Example Walkthrough
The following is an example of a high-level mapping from the Twistlock file format to the HDF. The purpose of this demonstration is to give an easy, non-technical approach to generating a prototype *-to-HDF mapping that can then serve as a guideline for developing the actual technical mapper for HDF Converters. This process is generally recommended as the first step in the development of any mapper for HDF Converters.
(NOTE: The format used by your export may not match the one being used in this demonstration. The mappings used in this example are for demonstration purposes and should not be taken as a definitive resource; creative interpretation is necessary for the most accurate mapping according to the specifics of your security service export.)
Given a sample Twistlock scan export (as seen below), our goal is to roughly identify and group data fields according to our 3 primary structures in HDF (profiles, controls, and results) and the non-applicable structure (passthrough). For profiles, we want to find metadata; for controls, we want to find general security control information; for results, we want to find specific security control testing information; and we can place everything else into passthrough.
//Sample Twistlock scan export
{
"results": [
{
"id": "sha256:111",
"name": "registry.io/test",
"distro": "Red Hat Enterprise Linux release 8.6 (Ootpa)",
"distroRelease": "RHEL8",
"digest": "sha256:222",
"collections": [
"All",
"TEST-COLLECTION"
],
"packages": [
{
"type": "os",
"name": "nss-util",
"version": "3.67.0-7.el8_5",
"licenses": [
"MPLv2.0"
]
}
],
"vulnerabilities": [
{
"id": "CVE-2021-43529",
"status": "affected",
"cvss": 9.8,
"description": "DOCUMENTATION: A remote code execution flaw was found in the way NSS verifies certificates. This flaw allows an attacker posing as an SSL/TLS server to trigger this issue in a client application compiled with NSS when it tries to initiate an SSL/TLS connection. Similarly, a server application compiled with NSS, which processes client certificates, can receive a malicious certificate via a client, triggering the flaw. The highest threat to this vulnerability is confidentiality, integrity, as well as system availability. STATEMENT: The issue is not limited to TLS. Any applications that use NSS certificate verification are vulnerable; S/MIME is impacted as well. Similarly, a server application compiled with NSS, which processes client certificates, can receive a malicious certificate via a client. Firefox is not vulnerable to this flaw as it uses the mozilla::pkix for certificate verification. Thunderbird is affected when parsing email with the S/MIME signature. Thunderbird on Red Hat Enterprise Linux 8.4 and later does not need to be updated since it uses the system NSS library, but earlier Red Hat Enterprise Linux 8 extended life streams will need to update Thunderbird as well as NSS. MITIGATION: Red Hat has investigated whether a possible mitigation exists for this issue, and has not been able to identify a practical example. Please update the affec",
"severity": "critical",
"packageName": "nss-util",
"packageVersion": "3.67.0-7.el8_5",
"link": "https://access.redhat.com/security/cve/CVE-2021-43529",
"riskFactors": [
"Remote execution",
"Attack complexity: low",
"Attack vector: network",
"Critical severity",
"Recent vulnerability"
],
"impactedVersions": [
"*"
],
"publishedDate": "2021-12-01T00:00:00Z",
"discoveredDate": "2022-05-18T12:24:22Z",
"layerTime": "2022-05-16T23:12:25Z"
}
],
"vulnerabilityDistribution": {
"critical": 1,
"high": 0,
"medium": 0,
"low": 0,
"total": 1
},
"vulnerabilityScanPassed": true,
"history": [
{
"created": "2022-05-03T08:38:31Z"
},
{
"created": "2022-05-03T08:39:27Z"
}
],
"scanTime": "2022-05-18T12:24:32.855444532Z",
"scanID": "asdfghjkl"
}
],
"consoleURL": "https://twistlock.test.net/#!/monitor/vulnerabilities/images/ci?search=sha256%333"
}
Thus, upon successive passes, we can roughly outline what we expect each data field in the Twistlock scan export to correlate to in the HDF. We first want to identify metadata, which will most likely belong in the profiles structure. Such data fields primarily relate either to the security scan itself or to the target system being scanned, as seen below:
//Data values are removed for visual clarity
{
"results": [
{
"id", //Scan target metadata -> profiles
"name", //
"distro", //
"distroRelease", //
"digest", //
"collections", //
"packages": [], //
"vulnerabilities": [
{
"id",
"status",
"cvss",
"description",
"severity",
"packageName",
"packageVersion",
"link",
"riskFactors": [],
"impactedVersions": [],
"publishedDate",
"discoveredDate",
"layerTime"
}
],
"vulnerabilityDistribution": {}, //Twistlock scan metadata -> profiles
"vulnerabilityScanPassed", //
"history": [], //Scan target package install history -> profiles
"scanTime", //Twistlock scan metadata -> profiles
"scanID" //
}
],
"consoleURL" //Twistlock scan metadata -> profiles
}
Next, we want to roughly outline general security control information that correlates to our controls structure. For this, we want to look for information that provides a background for the tests performed by the security service. Usually, this strongly correlates to information that gives us a why, what, and how for the tests that are performed, as seen with the fields that are highlighted below:
//Data values are removed for visual clarity
{
"results": [
{
"id", //Scan target metadata -> profiles
"name", //
"distro", //
"distroRelease", //
"digest", //
"collections", //
"packages": [], //
"vulnerabilities": [
{
"id", //ID of control tested against -> controls
"status",
"cvss", //CVSS severity score of control -> controls
"description", //Description of control -> controls
"severity", //Severity of control failure -> controls
"packageName",
"packageVersion",
"link", //Link to control documentation -> controls
"riskFactors": [],
"impactedVersions": [],
"publishedDate", //Control discovery date -> controls
"discoveredDate",
"layerTime"
}
],
"vulnerabilityDistribution": {}, //Twistlock scan metadata -> profiles
"vulnerabilityScanPassed", //
"history": [], //Scan target package install history -> profiles
"scanTime", //Twistlock scan metadata -> profiles
"scanID" //
}
],
"consoleURL" //Twistlock scan metadata -> profiles
}
After that, we want to outline items that relate to specific instances of control tests run against the scan target as part of the results structure. Usually, this strongly correlates to information that gives us a who, what, and when for the specific tests that are performed, as seen with the fields that are highlighted below:
//Data values are removed for visual clarity
{
"results": [
{
"id", //Scan target metadata -> profiles
"name", //
"distro", //
"distroRelease", //
"digest", //
"collections", //
"packages": [], //
"vulnerabilities": [
{
"id", //ID of control tested against -> controls
"status", //Pass/fail result of the control test -> results
"cvss", //CVSS severity score of control -> controls
"description", //Description of control -> controls
"severity", //Severity of control failure -> controls
"packageName", //Package ran against control test -> results
"packageVersion", //Version of package ran against control test -> results
"link", //Link to control documentation -> controls
"riskFactors": [], //Risk factors associated with failing this specific control test -> results
"impactedVersions": [], //Vulnerable versions of package ran against control test -> results
"publishedDate", //Control discovery date -> controls
"discoveredDate", //Date this control result was discovered -> results
"layerTime"
}
],
"vulnerabilityDistribution": {}, //Twistlock scan metadata -> profiles
"vulnerabilityScanPassed", //
"history": [], //Scan target package install history -> profiles
"scanTime", //Twistlock scan metadata -> profiles
"scanID" //
}
],
"consoleURL" //Twistlock scan metadata -> profiles
}
For fields that we cannot reasonably categorize or have no information about, we can instead just place them into the passthrough structure, as seen below:
//Data values are removed for visual clarity
{
"results": [
{
"id", //Scan target metadata -> profiles
"name", //
"distro", //
"distroRelease", //
"digest", //
"collections", //
"packages": [], //
"vulnerabilities": [
{
"id", //ID of control tested against -> controls
"status", //Pass/fail result of the control test -> results
"cvss", //CVSS severity score of control -> controls
"description", //Description of control -> controls
"severity", //Severity of control failure -> controls
"packageName", //Package ran against control test -> results
"packageVersion", //Version of package ran against control test -> results
"link", //Link to control documentation -> controls
"riskFactors": [], //Risk factors associated with failing this specific control test -> results
"impactedVersions": [], //Vulnerable versions of package ran against control test -> results
"publishedDate", //Control discovery date -> controls
"discoveredDate", //Date this control result was discovered -> results
"layerTime" //Information on package install time; extraneous -> passthrough
}
],
"vulnerabilityDistribution": {}, //Twistlock scan metadata -> profiles
"vulnerabilityScanPassed", //
"history": [], //Scan target package install history -> profiles
"scanTime", //Twistlock scan metadata -> profiles
"scanID" //
}
],
"consoleURL" //Twistlock scan metadata -> profiles
}
With this, we now have a general outline which roughly connects each data field in the Twistlock sample export to one of our structures in the HDF. In order to improve the accuracy of this mapping, we can now begin connecting specific fields in the HDF schema with the data fields in the sample export using our rough draft as a guide.
If we cannot find a field in the HDF schema that fits a certain field in the sample export under our original groupings, we can look to the other structures for applicable fields or, as a last resort, place the field into the passthrough structure.
//Data values are removed for visual clarity
{
"results": [
{
"id", //profiles -> passthrough.auxiliary_data.data
"name", //profiles -> profiles.name
"distro", //profiles -> passthrough.auxiliary_data.data
"distroRelease", //profiles -> passthrough.auxiliary_data.data
"digest", //profiles -> passthrough.auxiliary_data.data
"collections", //profiles -> profiles.title
"packages": [], //profiles -> passthrough.auxiliary_data.data
"vulnerabilities": [
{
"id", //controls -> profiles.controls.id
"status", //results -> profiles.controls.results.status
"cvss", //controls -> profiles.controls.code
"description", //controls -> profiles.controls.desc
"severity", //controls -> profiles.controls.impact
"packageName", //results -> profiles.controls.results.code_desc
"packageVersion", //results -> profiles.controls.results.code_desc
"link", //controls -> profiles.controls.code
"riskFactors": [], //results -> profiles.controls.code
"impactedVersions": [], //results -> profiles.controls.results.code_desc
"publishedDate", //controls -> profiles.controls.code
"discoveredDate", //results -> profiles.controls.results.start_time
"layerTime" //passthrough -> profiles.controls.code
}
],
"vulnerabilityDistribution": {}, //profiles -> profiles.summary
"vulnerabilityScanPassed", //profiles -> passthrough.auxiliary_data.data
"history": [], //profiles -> passthrough.auxiliary_data.data
"scanTime", //profiles -> passthrough.auxiliary_data.data
"scanID" //profiles -> passthrough.auxiliary_data.data
}
],
"consoleURL" //profiles -> passthrough.auxiliary_data.data
}
With this, we now have a detailed high-level mapping for the conversion from an external file format to the HDF, which we can use for the technical implementation of a *-to-HDF mapper.
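To make the jump from this outline to code more concrete, part of the final mapping above can be expressed as a small transformation function. The sketch below is hypothetical: it is not the actual Twistlock mapper in HDF Converters, the severity-to-impact scale and the 'affected'-to-'failed' translation are assumptions, and only a handful of the mapped fields are shown.

```typescript
// Hypothetical sketch: converts a few of the Twistlock fields identified above
// into the HDF's nested profiles/controls/results shape. Not the actual
// hdf-converters implementation; the impact scale and status translation
// are assumptions made for demonstration.
type TwistlockVuln = {
  id: string;
  status: string;
  description: string;
  severity: string;
  packageName: string;
  packageVersion: string;
  discoveredDate: string;
};

type TwistlockExport = {
  results: {
    name: string;
    collections: string[];
    vulnerabilities: TwistlockVuln[];
  }[];
};

// Assumed severity-to-impact scale
const severityToImpact: Record<string, number> = {
  critical: 0.9,
  high: 0.7,
  medium: 0.5,
  low: 0.3
};

function toHdfProfiles(input: TwistlockExport) {
  return input.results.map((result) => ({
    name: result.name, // profiles.name
    title: result.collections.join(', '), // profiles.title
    controls: result.vulnerabilities.map((vuln) => ({
      id: vuln.id, // controls.id
      desc: vuln.description, // controls.desc
      impact: severityToImpact[vuln.severity] ?? 0.5, // controls.impact
      results: [
        {
          // Assumed translation of Twistlock's status into pass/fail
          status: vuln.status === 'affected' ? 'failed' : 'passed',
          code_desc: `${vuln.packageName}@${vuln.packageVersion}`, // results.code_desc
          start_time: vuln.discoveredDate // results.start_time
        }
      ]
    }))
  }));
}
```

Running this over the sample export would yield one profile containing one control (CVE-2021-43529) with one failed result, mirroring the outline above.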
HDF Converters Structure
The following is a simplified depiction of the directory tree for HDF Converters. Only noteworthy and potentially useful files and directories are included. It is not imperative to memorize the structure, but familiarizing yourself with it will help you understand what exists where within HDF Converters for future reference.
hdf-converters
+-- data
| +-- converters
| | +-- csv2json.ts
| | +-- xml2json.ts
+-- sample_jsons //Sample exports for mapper testing are located here
+-- src //*-to-HDF mappers are located here
| +-- converters-from-hdf //HDF-to-* mappers are located here
| | +-- reverse-any-base-converter.ts
| | +-- reverse-base-converter.ts
| +-- mappings //Non-HDF mappers are located here (e.g., CVE, CCI, NIST)
| +-- utils
| | +-- fingerprinting.ts
| | +-- global.ts
| +-- base-converter.ts
+-- test //Mapper tests are located here
| +-- mappers
| | +-- forward //*-to-HDF tests
| | +-- reverse //HDF-to-* tests
| | +-- utils.ts
+-- types //Explicit data typing for known export schemas
+-- index.ts
+-- package.json
Base Converter Tools
[//]: # (WIP)
The base-converter class is the underlying foundation that enables *-to-HDF mapping in HDF Converters. It defines *-to-HDF mappers and provides critical tools for constructing them. All *-to-HDF mappers inherit from this class and therefore have access to the tools it provides; it is thus important to utilize these tools to their fullest potential to simplify mapper development. The provided tools are as follows:
path: Denote the JSON object path to go to
- Use:
- Example:
transformer: Execute a given code sequence; operates similarly to an anonymous function
- Use:
- Example:
arrayTransformer: Execute a given code sequence on a given array; primarily used when an attribute is an array of objects
- Use:
- Example:
pathTransform:
- Use:
- Example:
key: The attribute by which base-converter sorts an array of objects
- Use:
- Example:
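Since the reference material above is still being written, the sketch below illustrates only the general shape these tools take in practice: mapping objects whose leaf entries are `{path: ...}` and/or `{transformer: ...}` specifications. The `resolvePath` and `applyMapping` helpers are hypothetical stand-ins included so the example runs on its own; in real mappers, this machinery is provided by the base-converter class and should not be reimplemented.

```typescript
// Hypothetical stand-in for base-converter's path lookup, included only so
// this sketch is self-contained. The real class provides this machinery.
function resolvePath(data: any, path: string): any {
  return path.split('.').reduce((obj, segment) => obj?.[segment], data);
}

// A mapping in the general style these tools support: 'path' points at a
// location in the input JSON, and 'transformer' post-processes the value
// found there. The impact scale used here is an assumption.
const controlMapping = {
  id: {path: 'id'},
  impact: {
    path: 'severity',
    transformer: (severity: string) => (severity === 'critical' ? 0.9 : 0.5)
  }
};

// Hypothetical stand-in that walks a mapping object and builds the output:
// resolve each 'path', then apply the 'transformer' if one is given.
function applyMapping(mapping: any, data: any): any {
  const output: any = {};
  for (const [field, spec] of Object.entries<any>(mapping)) {
    const raw = spec.path ? resolvePath(data, spec.path) : data;
    output[field] = spec.transformer ? spec.transformer(raw) : raw;
  }
  return output;
}

// Example: map a single Twistlock-style vulnerability into a control stub
const control = applyMapping(controlMapping, {
  id: 'CVE-2021-43529',
  severity: 'critical'
});
```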