Background & Overview

HDF Schema 

The full generalized HDF schema is as follows:

{
  platform: {                      //required field
    name                          //required field
    release                       //required field
    target_id
   }
  version                          //required field
  statistics: {                    //required field
    duration
   }
  profiles: [                      //required field
    0: {
      name                       //required field
      version
      sha256                     //required field
      title
      maintainer
      summary
      license
      copyright
      copyright_email
      supports                   //required field
      attributes                 //required field
      groups                     //required field
      controls: [                //required field
        0: {
          id                   //required field
          title
          desc
          descriptions
          impact               //required field
          refs                 //required field
          tags                 //required field
          code
          source_location      //required field
          results: [           //required field
            0: {
              status
              code_desc      //required field
              message
              run_time
              start_time     //required field
            }
          ]
        }
      ]
      status
    }
  ]
  passthrough: {
    auxiliary_data: [
      0: {
        name
        data
      }
    ]
    raw
  }
}

(Note: The documented schema is subject to change and not all required fields need to be populated; for the full schema and more information on the fields, refer to saf.mitre.org/#/normalize)
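Since hdf-converters is a TypeScript project, the generalized schema above can also be sketched as a set of TypeScript interfaces. The following is an illustrative reduction, not the authoritative type definitions shipped with the library: optionality (`?`) mirrors the required-field annotations above, and `Record<string, unknown>` stands in for substructures whose exact shapes are not detailed here.

```typescript
// Illustrative TypeScript sketch of the generalized HDF schema above.
// Optional (?) fields mirror the required-field annotations; loose types
// (Record<string, unknown>) stand in for substructures not detailed here.
interface HdfResult {
  status?: string;
  code_desc: string;
  message?: string;
  run_time?: number;
  start_time: string;
}

interface HdfControl {
  id: string;
  title?: string;
  desc?: string;
  descriptions?: Record<string, unknown>;
  impact: number;
  refs: Record<string, unknown>[];
  tags: Record<string, unknown>;
  code?: string;
  source_location: Record<string, unknown>;
  results: HdfResult[];
}

interface HdfProfile {
  name: string;
  version?: string;
  sha256: string;
  title?: string;
  maintainer?: string;
  summary?: string;
  license?: string;
  copyright?: string;
  copyright_email?: string;
  supports: Record<string, unknown>[];
  attributes: Record<string, unknown>[];
  groups: Record<string, unknown>[];
  controls: HdfControl[];
  status?: string;
}

interface Hdf {
  platform: {name: string; release: string; target_id?: string};
  version: string;
  statistics: {duration?: number};
  profiles: HdfProfile[];
  passthrough?: {
    auxiliary_data?: {name: string; data: unknown}[];
    raw?: unknown;
  };
}
```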

 

HDF Schema Breakdown 

The HDF schema can be grouped into three nested structures, each of which is a subset of the previous one. These groupings are: profiles, controls, and results.

The profiles structure contains metadata on the scan target of the original security service export and on the run performed by the security tool. This provides a high-level overview of the scan run and its target that is both digestible and easily accessible to the user. A generalized format is as follows:

profiles: [
  0: {
    name                 //Name of profile, usually the original security service tool; should be unique
    version              //Version of security service tool
    sha256               //Hash of HDF file; NOTE: AUTOMATICALLY GENERATED BY HDF CONVERTERS, DO NOT POPULATE
    title                //Title of security service scan; should be human readable
    maintainer           //Maintainer of the profile
    summary              //Summary of security service export
    license              //Copyright license
    copyright            //Copyright holder
    copyright_email      //Copyright holder's email
    supports             //Supported platform targets
    attributes           //Inputs/attributes used in scan
    groups               //Set of descriptions for the control groups
    controls             //Controls substructure (see below)
    status               //Status of profile (typically 'loaded')
  }
  ... //More items may exist if the security service produces multiple scan targets per export
]

Controls are security parameters used to prevent unauthorized access to sensitive information or infrastructure. In the case of HDF Converters, the controls structure is a collection of such controls, which an external security service retroactively tests for in order to ensure that the target complies with vulnerability and weakness prevention standards. The controls structure is a subset of the profiles structure. A generalized format is as follows:

controls: [
  0: {
    id                //ID of control; used for sorting, should be unique for each unique control
    title             //Title of control
    desc              //Description of the control
    descriptions      //Additional descriptions; usually 'check' and 'fix' text for control
    impact            //Security severity of control
    refs              //References to external control documentation
    tags              //Control tags; typically correlate to existing vulnerability/weakness database (e.g., NIST, CVE, CWE)
    code              //Control source code for code preservation
    source_location   //Location of control within source code
    results           //Results substructure (see below)
  }
  ... //More items may exist if there are multiple controls reported per profile
]

The results structure contains information on the results of specific tests run by the security service on the scan target against a set of security controls. These results always correlate to a certain control and will typically report 'passed' or 'failed' to indicate the test status (other statuses exist but are rare), which cumulatively affects the compliance level of the scan target against the indicated control set. The results structure is a subset of the controls structure. A generalized structure is as follows:

results: [
  0: {
    status         //Pass/fail status of test (other statuses exist but are rare)
    code_desc      //Test expectations as defined by control
    message        //Demonstration of expected and actual result of test to justify test status
    run_time       //Overall runtime of test
    start_time     //Starting time of test
  }
  ... //More items may exist if there are multiple results reported per control
]
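Because these per-test statuses cumulatively determine the target's compliance, a common first consumer of the results structure is a simple status tally. Below is a minimal sketch; the `tallyStatuses` helper is hypothetical (not part of hdf-converters), and the typed fields are a reduction of the results structure above.

```typescript
// Hypothetical helper: count statuses across a control's results.
// Field names follow the results structure above; only relevant
// fields are typed here.
interface ControlResult {
  status?: string;     // usually 'passed' or 'failed'
  code_desc: string;
  start_time: string;
}

function tallyStatuses(results: ControlResult[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const result of results) {
    const status = result.status ?? 'no_status';
    counts[status] = (counts[status] ?? 0) + 1;
  }
  return counts;
}
```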

Cumulatively, these structures form the following generalized nesting, which primarily defines the HDF:

//Data fields have been removed for the sake of demonstration
profiles: [
  0: {
    controls: [
      0: {
        results: [
          0: {
          },
          ...
        ]
      },
      ...
    ]
  },
  ...
]
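This nesting means that consumers of an HDF file typically walk profiles, then controls, then results. The sketch below illustrates that traversal; the types are deliberately minimal reductions of the structures above, and the `failingControlIds` helper is illustrative rather than part of hdf-converters.

```typescript
// Walk profiles -> controls -> results to collect the IDs of controls
// with at least one failing result. Types are minimal reductions of
// the HDF structures shown above.
interface Result { status?: string }
interface Control { id: string; results: Result[] }
interface Profile { name: string; controls: Control[] }

function failingControlIds(profiles: Profile[]): string[] {
  const ids: string[] = [];
  for (const profile of profiles) {
    for (const control of profile.controls) {
      if (control.results.some((result) => result.status === 'failed')) {
        ids.push(control.id);
      }
    }
  }
  return ids;
}
```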

There are additional structures in the HDF schema which are used for metadata/extraneous information storage. These exist alongside the profiles structure on the top level of the HDF schema. The general structure for the top level of the HDF schema is as follows:

{
  platform: {                //Information on the platform handling the HDF file; usually 'Heimdall Tools'
    name                    //Platform name
    release                 //Platform version
    target_id               //Platform target ID
  }
  version                    //Platform version
  statistics: {              //Statistics relating to target scan run
    duration                //Duration of run
  }
  profiles                   //Profiles structure
  passthrough: {             //Extraneous information storage
    auxiliary_data: [       //Storage for unused data from the sample file
      0: {
        name                //Name of auxiliary data source
        data                //Auxiliary data
      }
      ... //More items may exist if there are multiple auxiliary data sources available
    ]
    raw                     //Raw data dump of input security service export
  }
}

 

HDF Schema Mapping Example Walkthrough 

The following is an example of a high-level mapping from the Twistlock file format to the HDF. The purpose of this demonstration is to present an easy, non-technical approach to generating a prototype for *-to-HDF mappers, which can then serve as a guideline for developing the actual technical mapper for the HDF Converter. This process is generally recommended as the first step in the development of any mapper for the HDF Converter.

(NOTE: The format used by your export may not match the one being used in this demonstration. The mappings used in this example are for demonstration purposes and should not be taken as a definitive resource; creative interpretation is necessary for the most accurate mapping according to the specifics of your security service export.)

Given a sample Twistlock scan export (as seen below), our goal is to roughly identify and group data fields according to our 3 primary structures in HDF (profiles, controls, and results) and the non-applicable structure (passthrough). For profiles, we want to find metadata; for controls, we want to find general security control information; for results, we want to find specific security control testing information; and we can place everything else into passthrough.

//Sample Twistlock scan export
{
  "results": [
    {
      "id": "sha256:111",
      "name": "registry.io/test",
      "distro": "Red Hat Enterprise Linux release 8.6 (Ootpa)",
      "distroRelease": "RHEL8",
      "digest": "sha256:222",
      "collections": [
        "All",
        "TEST-COLLECTION"
      ],
      "packages": [
        {
          "type": "os",
          "name": "nss-util",
          "version": "3.67.0-7.el8_5",
          "licenses": [
            "MPLv2.0"
          ]
        }
      ],
      "vulnerabilities": [
        {
          "id": "CVE-2021-43529",
          "status": "affected",
          "cvss": 9.8,
          "description": "DOCUMENTATION: A remote code execution flaw was found in the way NSS verifies certificates. This flaw allows an attacker posing as an SSL/TLS server to trigger this issue in a client application compiled with NSS when it tries to initiate an SSL/TLS connection.  Similarly, a server application compiled with NSS, which processes client certificates, can receive a malicious certificate via a client, triggering the flaw. The highest threat to this vulnerability is confidentiality, integrity, as well as system availability.              STATEMENT: The issue is not limited to TLS. Any applications that use NSS certificate verification are vulnerable; S/MIME is impacted as well.  Similarly, a server application compiled with NSS, which processes client certificates, can receive a malicious certificate via a client.  Firefox is not vulnerable to this flaw as it uses the mozilla::pkix for certificate verification. Thunderbird is affected when parsing email with the S/MIME signature.  Thunderbird on Red Hat Enterprise Linux 8.4 and later does not need to be updated since it uses the system NSS library, but earlier Red Hat Enterprise Linux 8 extended life streams will need to update Thunderbird as well as NSS.             MITIGATION: Red Hat has investigated whether a possible mitigation exists for this issue, and has not been able to identify a practical example. Please update the affec",
          "severity": "critical",
          "packageName": "nss-util",
          "packageVersion": "3.67.0-7.el8_5",
          "link": "https://access.redhat.com/security/cve/CVE-2021-43529",
          "riskFactors": [
            "Remote execution",
            "Attack complexity: low",
            "Attack vector: network",
            "Critical severity",
            "Recent vulnerability"
          ],
          "impactedVersions": [
            "*"
          ],
          "publishedDate": "2021-12-01T00:00:00Z",
          "discoveredDate": "2022-05-18T12:24:22Z",
          "layerTime": "2022-05-16T23:12:25Z"
        }
      ],
      "vulnerabilityDistribution": {
        "critical": 1,
        "high": 0,
        "medium": 0,
        "low": 0,
        "total": 1
      },
      "vulnerabilityScanPassed": true,
      "history": [
        {
          "created": "2022-05-03T08:38:31Z"
        },
        {
          "created": "2022-05-03T08:39:27Z"
        }
      ],
      "scanTime": "2022-05-18T12:24:32.855444532Z",
      "scanID": "asdfghjkl"
    }
  ],
  "consoleURL": "https://twistlock.test.net/#!/monitor/vulnerabilities/images/ci?search=sha256%333"
}

Thus, upon successive passes, we can roughly outline what we expect each data field in the Twistlock scan export to correlate to in the HDF. We first want to identify metadata, which will most likely belong in the profiles structure. Such data fields primarily relate to the general security scan itself or to the target system being scanned, as seen below:

//Data values are removed for visual clarity
{
  "results": [
    {
      "id",                               //Scan target metadata -> profiles
      "name",                             //
      "distro",                           //
      "distroRelease",                    //
      "digest",                           //
      "collections",                      //
      "packages": [],                     //
      "vulnerabilities": [
        {
          "id",
          "status",
          "cvss",
          "description",
          "severity",
          "packageName",
          "packageVersion",
          "link",
          "riskFactors": [],
          "impactedVersions": [],
          "publishedDate",
          "discoveredDate",
          "layerTime"
        }
      ],
      "vulnerabilityDistribution": {},      //Twistlock scan metadata -> profiles
      "vulnerabilityScanPassed",            //
      "history": [],                        //Scan target package install history -> profiles
      "scanTime",                           //Twistlock scan metadata -> profiles
      "scanID"                              //
    }
  ],
  "consoleURL"         //Twistlock scan metadata -> profiles
}

Next, we want to roughly outline the general security control information that correlates to our controls structure. For this, we look for information that provides a background for the tests performed by the security service. Usually, this strongly correlates to information that gives us a why, what, and how for the tests performed, as seen with the fields annotated below:

//Data values are removed for visual clarity
{
  "results": [
    {
      "id",                               //Scan target metadata -> profiles
      "name",                             //
      "distro",                           //
      "distroRelease",                    //
      "digest",                           //
      "collections",                      //
      "packages": [],                     //
      "vulnerabilities": [
        {
          "id",                      //ID of control tested against -> controls
          "status",
          "cvss",                    //CVSS severity score of control -> controls
          "description",             //Description of control -> controls
          "severity",                //Severity of control failure -> controls
          "packageName",
          "packageVersion",
          "link",                    //Link to control documentation -> controls
          "riskFactors": [],
          "impactedVersions": [],
          "publishedDate",           //Control discovery date -> controls
          "discoveredDate",
          "layerTime"
        }
      ],
      "vulnerabilityDistribution": {},      //Twistlock scan metadata -> profiles
      "vulnerabilityScanPassed",            //
      "history": [],                        //Scan target package install history -> profiles
      "scanTime",                           //Twistlock scan metadata -> profiles
      "scanID"                              //
    }
  ],
  "consoleURL"         //Twistlock scan metadata -> profiles
}

After that, we want to outline items that relate to specific instances of control tests run against the scan target as part of the results structure. Usually, this strongly correlates to information that gives us a who, what, and when for the specific tests performed, as seen with the fields annotated below:

//Data values are removed for visual clarity
{
  "results": [
    {
      "id",                               //Scan target metadata -> profiles
      "name",                             //
      "distro",                           //
      "distroRelease",                    //
      "digest",                           //
      "collections",                      //
      "packages": [],                     //
      "vulnerabilities": [
        {
          "id",                      //ID of control tested against -> controls
          "status",                  //Pass/fail result of the control test -> results
          "cvss",                    //CVSS severity score of control -> controls
          "description",             //Description of control -> controls
          "severity",                //Severity of control failure -> controls
          "packageName",             //Package ran against control test -> results
          "packageVersion",          //Version of package ran against control test -> results
          "link",                    //Link to control documentation -> controls
          "riskFactors": [],         //Risk factors associated with failing this specific control test -> results
          "impactedVersions": [],    //Vulnerable versions of package ran against control test -> results
          "publishedDate",           //Control discovery date -> controls
          "discoveredDate",          //Date this control result was discovered -> results
          "layerTime"
        }
      ],
      "vulnerabilityDistribution": {},      //Twistlock scan metadata -> profiles
      "vulnerabilityScanPassed",            //
      "history": [],                        //Scan target package install history -> profiles
      "scanTime",                           //Twistlock scan metadata -> profiles
      "scanID"                              //
    }
  ],
  "consoleURL"         //Twistlock scan metadata -> profiles
}

For fields that we cannot reasonably categorize or have no information about, we can instead place them in the passthrough structure, as seen below:

//Data values are removed for visual clarity
{
  "results": [
    {
      "id",                               //Scan target metadata -> profiles
      "name",                             //
      "distro",                           //
      "distroRelease",                    //
      "digest",                           //
      "collections",                      //
      "packages": [],                     //
      "vulnerabilities": [
        {
          "id",                      //ID of control tested against -> controls
          "status",                  //Pass/fail result of the control test -> results
          "cvss",                    //CVSS severity score of control -> controls
          "description",             //Description of control -> controls
          "severity",                //Severity of control failure -> controls
          "packageName",             //Package ran against control test -> results
          "packageVersion",          //Version of package ran against control test -> results
          "link",                    //Link to control documentation -> controls
          "riskFactors": [],         //Risk factors associated with failing this specific control test -> results
          "impactedVersions": [],    //Vulnerable versions of package ran against control test -> results
          "publishedDate",           //Control discovery date -> controls
          "discoveredDate",          //Date this control result was discovered -> results
          "layerTime"                //Information on package install time; extraneous -> passthrough
        }
      ],
      "vulnerabilityDistribution": {},      //Twistlock scan metadata -> profiles
      "vulnerabilityScanPassed",            //
      "history": [],                        //Scan target package install history -> profiles
      "scanTime",                           //Twistlock scan metadata -> profiles
      "scanID"                              //
    }
  ],
  "consoleURL"         //Twistlock scan metadata -> profiles
}

With this, we now have a general outline which roughly connects each data field in the Twistlock sample export to one of our structures in the HDF. In order to improve the accuracy of this mapping, we can now begin connecting specific fields in the HDF schema with the data fields in the sample export using our rough draft as a guide.

If we cannot find a field in the HDF schema that fits with a certain field in the sample export per our original groupings, we can instead look to the other structures to see if they have applicable fields or place the field into the passthrough structure as a last resort.

//Data values are removed for visual clarity
{
  "results": [
    {
      "id",                               //profiles -> passthrough.auxiliary_data.data
      "name",                             //profiles -> profiles.name
      "distro",                           //profiles -> passthrough.auxiliary_data.data
      "distroRelease",                    //profiles -> passthrough.auxiliary_data.data
      "digest",                           //profiles -> passthrough.auxiliary_data.data
      "collections",                      //profiles -> profiles.title
      "packages": [],                     //profiles -> passthrough.auxiliary_data.data
      "vulnerabilities": [
        {
          "id",                      //controls -> profiles.controls.id
          "status",                  //results -> profiles.controls.results.status
          "cvss",                    //controls -> profiles.controls.code
          "description",             //controls -> profiles.controls.desc
          "severity",                //controls -> profiles.controls.impact
          "packageName",             //results -> profiles.controls.results.code_desc
          "packageVersion",          //results -> profiles.controls.results.code_desc
          "link",                    //controls -> profiles.controls.code
          "riskFactors": [],         //results -> profiles.controls.code
          "impactedVersions": [],    //results -> profiles.controls.results.code_desc
          "publishedDate",           //controls -> profiles.controls.code
          "discoveredDate",          //results -> profiles.controls.results.start_time
          "layerTime"                //passthrough -> profiles.controls.code
        }
      ],
      "vulnerabilityDistribution": {},      //profiles -> profiles.summary
      "vulnerabilityScanPassed",            //profiles -> passthrough.auxiliary_data.data
      "history": [],                        //profiles -> passthrough.auxiliary_data.data
      "scanTime",                           //profiles -> passthrough.auxiliary_data.data
      "scanID"                              //profiles -> passthrough.auxiliary_data.data
    }
  ],
  "consoleURL"         //profiles -> passthrough.auxiliary_data.data
}

With this, we now have a detailed high-level mapping for the conversion from an external file format to the HDF, which we can use for the technical implementation of a *-to-HDF mapper.
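As a bridge toward that technical implementation, the vulnerability-level portion of the mapping above can be sketched as a plain TypeScript function. This is a simplified illustration, not the actual hdf-converters Twistlock mapper: the severity-to-impact scale and the `vulnerabilityToControl` helper are assumptions made for demonstration.

```typescript
// Illustrative (non-authoritative) sketch of the vulnerability -> control
// portion of the mapping above. Twistlock field names come from the sample
// export; HDF field names follow the schema discussed earlier.
interface TwistlockVulnerability {
  id: string;
  status: string;
  description: string;
  severity: string;
  packageName: string;
  packageVersion: string;
  discoveredDate: string;
}

interface ControlSketch {
  id: string;
  desc: string;
  impact: number;
  results: {status: string; code_desc: string; start_time: string}[];
}

// Assumed severity-to-impact scale; a real mapper may use different values.
const IMPACT_BY_SEVERITY: Record<string, number> = {
  critical: 0.9,
  high: 0.7,
  medium: 0.5,
  low: 0.3,
};

function vulnerabilityToControl(vuln: TwistlockVulnerability): ControlSketch {
  return {
    id: vuln.id,                                      // -> controls.id
    desc: vuln.description,                           // -> controls.desc
    impact: IMPACT_BY_SEVERITY[vuln.severity] ?? 0.5, // -> controls.impact
    results: [{
      status: vuln.status === 'affected' ? 'failed' : 'passed', // -> results.status
      code_desc: `${vuln.packageName}@${vuln.packageVersion}`,  // -> results.code_desc
      start_time: vuln.discoveredDate,                // -> results.start_time
    }],
  };
}
```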

 

HDF Converters Structure 

The following is a simplified depiction of the directory tree for the HDF Converter. Only noteworthy and potentially useful files and directories are included. It is not imperative to memorize the structure, but it is useful to familiarize yourself with it to better understand what exists where within the HDF Converter for future reference.

hdf-converters
+-- data
|   +-- converters
|   |   +-- csv2json.ts
|   |   +-- xml2json.ts
+-- sample_jsons                              //Sample exports for mapper testing are located here
+-- src                                       //*-to-HDF mappers are located here
|   +-- converters-from-hdf                   //HDF-to-* mappers are located here
|   |   +-- reverse-any-base-converter.ts
|   |   +-- reverse-base-converter.ts
|   +-- mappings                              //Non-HDF mappers are located here (e.g., CVE, CCI, NIST)
|   +-- utils
|   |   +-- fingerprinting.ts
|   |   +-- global.ts
|   +-- base-converter.ts
+-- test                                      //Mapper tests are located here
|   +-- mappers
|   |   +-- forward                           //*-to-HDF tests
|   |   +-- reverse                           //HDF-to-* tests
|   |   +-- utils.ts
+-- types                                     //Explicit data typing for known export schemas
+-- index.ts
+-- package.json

 

Base Converter Tools 


The base-converter class is the underlying foundation that enables *-to-HDF mapping in HDF Converters. It defines *-to-HDF mappers and provides critical tools for constructing them. All *-to-HDF mappers inherit from this class and therefore have access to the tools it provides; use these tools to their fullest potential to ease and simplify mapper development. The provided tools are as follows:

path: Denotes the path to a value within the source JSON object

  • Use:
  • Example:

transformer: Executes a given code sequence; operates similarly to an anonymous function

  • Use:
  • Example:

arrayTransformer: Executes a given code sequence on a given array; primarily used within an attribute that is an array of objects

  • Use:
  • Example:

pathTransform:

  • Use:
  • Example:

key: Field used by base-converter to sort an array of objects

  • Use:
  • Example:
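To make the path/transformer semantics concrete while the examples above are filled in, here is a toy resolver. It is NOT the base-converter implementation, and the `MappingEntry` shape and `resolveEntry` helper are simplifications invented for illustration: `path` names where a value lives in the source JSON, and `transformer` optionally reshapes the value found there.

```typescript
// Toy illustration of path/transformer semantics (not the real
// base-converter): `path` is a dot-separated location in the source
// JSON, and `transformer` optionally reshapes the resolved value.
type MappingEntry = {
  path: string;
  transformer?: (value: unknown) => unknown;
};

function resolveEntry(data: unknown, entry: MappingEntry): unknown {
  let value: unknown = data;
  for (const segment of entry.path.split('.')) {
    value = (value as Record<string, unknown> | undefined)?.[segment];
  }
  return entry.transformer ? entry.transformer(value) : value;
}
```

For example, against the sample Twistlock export, `resolveEntry(twistlockExport, {path: 'results.0.distro'})` would walk to the first result's `distro` field, while adding a `transformer` would let a mapper turn a raw severity string into a numeric impact.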
