artifact.py
Some guidelines to follow while writing a parser for Veritas:
An artifact's module (such as prefetch.py) must return the following lists:
- A list containing formatted hex data, formattedhexdata. (Returned by the readFile() function)
- A list containing formatted ASCII data, formattedasciidata. (Returned by the readFile() function)
- A list containing the template data, templatedata.
- A list containing the marker data, artifactmarkers.
The first two lists, formattedhexdata and formattedasciidata, are populated using the readFile() function. It performs some under-the-hood formatting so that the data appears as it would in a typical hex view.
formattedhexdata
artifact.py should return it unchanged, exactly as returned from readFile().
formattedasciidata
artifact.py should return it unchanged, exactly as returned from readFile().
templatedata
This list is generated using the toAbsolute() function from the offsetter.py module.
It consumes two lists, which are to be built by the corresponding module author:
artifacttemplate:
For example, suppose an artifact has only three sections to template and they are contiguous in the file structure. artifacttemplate would then be built as follows:
artifacttemplate = []
sectionone = [[1, 4], [5, 3], [8, 2]] # 3 sub-sections within section one of sizes 4, 3 and 2 bytes respectively.
sectiontwo = [[1, 8], [9, 2]] # 2 sub-sections within section two of sizes 8 and 2 bytes respectively.
sectionthree = [[1, 16], [17, 8], [25, 8]] # 3 sub-sections within section three of sizes 16, 8 and 8 bytes respectively.
# Up to however many sections there are in the file format specification.
artifacttemplate.append(sectionone) # [[[1, 4], [5, 3], [8, 2]]]
artifacttemplate.append(sectiontwo) # [[[1, 4], [5, 3], [8, 2]], [[1, 8], [9, 2]]]
artifacttemplate.append(sectionthree) # [[[1, 4], [5, 3], [8, 2]], [[1, 8], [9, 2]], [[1, 16], [17, 8], [25, 8]]]
artifactsizes:
artifactsizes = []
sectiononesize = 9 # 4 + 3 + 2
sectiontwosize = 10 # 8 + 2
sectionthreesize = 32 # 16 + 8 + 8
# Up to however many sections there are in the file format specification.
artifactsizes.append(sectiononesize) # [9]
artifactsizes.append(sectiontwosize) # [9, 10]
artifactsizes.append(sectionthreesize) # [9, 10, 32]
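The hand-written sizes above can also be derived from the template itself, which keeps the two lists from drifting apart. The following is a minimal sketch: `section_size` is a hypothetical helper, not part of Veritas, and it assumes each sub-section is a `[start, length]` pair with 1-based, contiguous offsets, as in the example.

```python
def section_size(section):
    """Return the total byte length of a section given as [start, length] pairs."""
    expected_start = 1  # assumption: offsets are 1-based within each section
    total = 0
    for start, length in section:
        # Each sub-section should begin right where the previous one ended.
        if start != expected_start:
            raise ValueError(
                f"sub-section starts at {start}, expected {expected_start}"
            )
        expected_start += length
        total += length
    return total


artifacttemplate = [
    [[1, 4], [5, 3], [8, 2]],
    [[1, 8], [9, 2]],
    [[1, 16], [17, 8], [25, 8]],
]
artifactsizes = [section_size(section) for section in artifacttemplate]
print(artifactsizes)  # [9, 10, 32]
```

The contiguity check is optional, but it catches an off-by-one in a sub-section offset before the lists are handed to toAbsolute().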
Note
Apply conditional logic wherever a section may be present or absent, so that the template adapts to the file being parsed.
Refer to prefetch.py for the conditional handling of the hashstring section.
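As an illustration of such conditional logic, the sketch below appends a middle section only when a flag says it exists. It does not reproduce the logic in prefetch.py: `build_lists` and `has_optional_section` are hypothetical names, and how the flag is computed (for example from a version field) is left to the module author.

```python
def build_lists(has_optional_section):
    """Build artifacttemplate and artifactsizes, skipping an optional section."""
    artifacttemplate = []
    artifactsizes = []

    def add_section(subsections):
        # Append the template and its size together so they stay in step.
        artifacttemplate.append(subsections)
        artifactsizes.append(sum(length for _, length in subsections))

    add_section([[1, 4], [5, 3], [8, 2]])
    if has_optional_section:  # hypothetical condition
        add_section([[1, 8], [9, 2]])
    add_section([[1, 16], [17, 8], [25, 8]])
    return artifacttemplate, artifactsizes


print(build_lists(True)[1])   # [9, 10, 32]
print(build_lists(False)[1])  # [9, 32]
```

Appending the template and size in one helper means an absent section is skipped in both lists at once.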
Before the lists are returned, both of the above lists are passed to the toAbsolute() function in offsetter.py, which returns a list containing the absolute template data.
templatedata = toAbsolute(artifacttemplate, artifactsizes)
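The internals of toAbsolute() are not shown here. One plausible reading is that each section's relative offsets are shifted by the combined size of the sections before it. The sketch below illustrates that idea only: `to_absolute_sketch` is a hypothetical stand-in, and the real function in offsetter.py may differ in its output shape and offset base.

```python
def to_absolute_sketch(artifacttemplate, artifactsizes):
    """Shift each section's [start, length] pairs by the bytes preceding it."""
    absolute = []
    base = 0  # bytes consumed by all earlier sections
    for section, size in zip(artifacttemplate, artifactsizes):
        absolute.append([[base + start, length] for start, length in section])
        base += size
    return absolute


artifacttemplate = [
    [[1, 4], [5, 3], [8, 2]],
    [[1, 8], [9, 2]],
    [[1, 16], [17, 8], [25, 8]],
]
artifactsizes = [9, 10, 32]
for section in to_absolute_sketch(artifacttemplate, artifactsizes):
    print(section)
# [[1, 4], [5, 3], [8, 2]]
# [[10, 8], [18, 2]]
# [[20, 16], [36, 8], [44, 8]]
```

Under this assumption, a wrong entry in artifactsizes shifts every later section, which is why the sizes must match the template exactly.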
Tip
Please handle all known versions/formats of a particular artifact to avoid bugs and errors. Test and validate your parser on a varied dataset of the artifact before creating a PR. This saves time and resources by preventing unnecessary debugging.
artifactmarkers
This list contains the text that describes the sections, and the bytes within them, of an artifact's file structure.
For the example above, three strings must be appended, in the same order as the template and size data.
artifactmarkers = []
artifactmarkers.append("\n+9 This is the first section!\n")
artifactmarkers.append("\n+10 This is the second section!\n")
artifactmarkers.append("\n+32 This is the third section!\n")
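To keep the `+n` prefix in each marker tied to the corresponding size, the strings can be generated from the sizes instead of typed by hand. This is a small sketch, assuming the sizes are already known; the `descriptions` list is a hypothetical name introduced for illustration.

```python
artifactsizes = [9, 10, 32]
descriptions = [
    "This is the first section!",
    "This is the second section!",
    "This is the third section!",
]

# One description per section, in the same order as the sizes.
if len(artifactsizes) != len(descriptions):
    raise ValueError("each section needs exactly one description")

artifactmarkers = [
    f"\n+{size} {text}\n" for size, text in zip(artifactsizes, descriptions)
]
print(repr(artifactmarkers[0]))  # '\n+9 This is the first section!\n'
```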
Note
Since the sizes were static in this example, the +n byte values are hardcoded in the strings. For a variable-length section, artifact.py must first calculate the length correctly and then use a Python f-string to generate the marker, as shown below.
artifactmarkers.append(f"\n+{variablesizecalculatedbyseeklogic} This is a section of variable size!\n")
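How the variable length is found depends on the file format. As one illustration, the sketch below measures a NUL-terminated field and builds its marker. The NUL-terminated layout, the helper name `variable_section_marker`, and the use of a raw `bytes` object are all assumptions made for the example. They do not describe the data readFile() returns.

```python
def variable_section_marker(data, start, description):
    """Measure a NUL-terminated field beginning at `start` and build its marker."""
    end = data.find(b"\x00", start)
    if end == -1:
        raise ValueError("no terminator found; the field may be truncated")
    size = end - start + 1  # count the terminator byte as part of the section
    return size, f"\n+{size} {description}\n"


# Synthetic bytes: a 4-byte prefix, then "HELLO" followed by a NUL terminator.
sample = b"\x01\x02\x03\x04HELLO\x00\xff\xff"
size, marker = variable_section_marker(
    sample, 4, "This is a section of variable size!"
)
print(size)          # 6
print(repr(marker))  # '\n+6 This is a section of variable size!\n'
```

The same computed size would also be appended to artifactsizes, so the marker and the template stay consistent.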
As for the inputs, an artifact.py module should take only one parameter, file_path, to read the file. This must be done using the readFile() function in primer.py. There is also a readPartialFile() function, which loader.py uses to read only the bytes needed to check the magic number and determine the type of artifact loaded. This is a performance optimization.
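The idea of reading only a few leading bytes to identify a file can be sketched as follows. `read_leading_bytes` and `detect_artifact` are hypothetical helpers and do not reproduce readPartialFile() or loader.py. The signature check assumes the uncompressed Windows Prefetch layout, in which the bytes `SCCA` follow a 4-byte version field.

```python
import os
import tempfile


def read_leading_bytes(file_path, count=8):
    """Read at most `count` bytes from the start of the file."""
    with open(file_path, "rb") as handle:
        return handle.read(count)


def detect_artifact(file_path):
    """Return an artifact name based on the file's leading bytes, or None."""
    header = read_leading_bytes(file_path)
    if header[4:8] == b"SCCA":  # assumed uncompressed Prefetch signature
        return "prefetch"
    return None


# Demonstration on a synthetic header, not a real Prefetch file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.pf")
    with open(path, "wb") as handle:
        handle.write(b"\x1e\x00\x00\x00SCCA" + b"\x00" * 16)
    print(detect_artifact(path))  # prefetch
```

Only eight bytes are read regardless of the file size, which is the point of the optimization described above.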
All in all, the following code should appear at the very beginning of an artifact.py module:
artifacttemplate = []
artifactsizes = []
artifactmarkers = []
formattedhexdata, formattedasciidata, hexdata = readFile(file_path)
And at the end when returning data, the following code:
templatedata = toAbsolute(artifacttemplate, artifactsizes)
return formattedhexdata, formattedasciidata, templatedata, artifactmarkers
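Because the template, size and marker lists are parallel, a quick check just before the return can catch a section that was appended to one list but not the others. The sketch below is an optional suggestion; `check_parallel_lists` is a hypothetical helper, and it assumes every marker follows the `\n+<size> ...` convention used in the examples above.

```python
def check_parallel_lists(artifacttemplate, artifactsizes, artifactmarkers):
    """Raise ValueError if the three per-section lists are out of step."""
    counts = (len(artifacttemplate), len(artifactsizes), len(artifactmarkers))
    if len(set(counts)) != 1:
        raise ValueError(f"list lengths differ: {counts}")
    for index, (size, marker) in enumerate(zip(artifactsizes, artifactmarkers)):
        # Each marker is expected to announce its section's size as "+n".
        if not marker.startswith(f"\n+{size} "):
            raise ValueError(f"marker {index} does not announce +{size} bytes")


check_parallel_lists(
    [[[1, 4], [5, 3], [8, 2]], [[1, 8], [9, 2]]],
    [9, 10],
    ["\n+9 This is the first section!\n", "\n+10 This is the second section!\n"],
)
print("lists are consistent")
```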