Welcome to Tiresias’s documentation!

Tiresias provides a PyTorch implementation of Tiresias: Predicting Security Events Through Deep Learning. This code was implemented as part of the IEEE S&P 2022 paper DeepCASE: Semi-Supervised Contextual Analysis of Security Events. We ask people to cite both works when using the software for academic research papers; see Citing for more information.

Installation

The most straightforward way of installing Tiresias is via pip:

pip install tiresias

From source

If you wish to stay up to date with the latest development version, you can instead download the source code. In this case, make sure that you have all the required dependencies installed.

Once the dependencies have been installed, run:

pip install -e <path/to/directory/containing/tiresias/setup.py>

Dependencies

Tiresias requires the following python packages to be installed:
  • array-lstm

  • numpy

  • scikit-learn

  • torch

All dependencies should be automatically downloaded if you install Tiresias via pip. However, should you want to install these libraries manually, you can install the dependencies using the requirements.txt file

pip install -r requirements.txt

Or you can install these libraries yourself

pip install -U array-lstm numpy scikit-learn torch

Usage

This section gives a high-level overview of the modules implemented by Tiresias. Furthermore it provides insights into the use of the command line tool. We also include several working examples to guide users through the code. For detailed documentation of individual methods, we refer to the Reference guide.

Overview

This section explains the design of Tiresias on a high level. Tiresias is a network that is implemented as a torch-train Module, which is an extension of torch.nn.Module including automatic methods to fit() and predict() data. This means it can be trained and used as any neural network module in the pytorch library.

In addition, we provide automatic methods to train and predict events given previous event sequences using the torch-train library. This follows a scikit-learn approach with fit(), predict() and fit_predict() methods. We refer to its documentation for a detailed description.

Command line tool

When Tiresias is installed, it can be used from the command line. The __main__.py file in the tiresias module implements this command line tool. The command line tool provides a quick and easy interface to predict sequences from .csv files. The full command line usage is given in its help page:

usage: tiresias.py [-h] [--csv CSV] [--txt TXT] [--length LENGTH] [--timeout TIMEOUT] [--hidden HIDDEN] [-i INPUT] [-k K] [-o] [-t TOP] [--save SAVE] [--load LOAD] [-b BATCH_SIZE] [-d DEVICE] [-e EPOCHS]
                 {train,predict}

Tiresias: Predicting Security Events Through Deep Learning

positional arguments:
{train,predict}              mode in which to run Tiresias

optional arguments:
-h, --help                   show this help message and exit

Input parameters:
--csv       CSV              CSV events file to process
--txt       TXT              TXT events file to process
--length    LENGTH           sequence LENGTH                          (default =   20)
--timeout   TIMEOUT          sequence TIMEOUT (seconds)               (default =  inf)

Tiresias parameters:
--hidden    HIDDEN           hidden dimension                         (default =  128)
-i, --input INPUT            input  dimension                         (default =  300)
-k, --k     K                number of concurrent memory cells        (default =    4)
-o, --online                 use online training while predicting
-t, --top   TOP              accept any of the TOP predictions        (default =    1)
--save      SAVE             save Tiresias to   specified file
--load      LOAD             load Tiresias from specified file

Training parameters:
-b, --batch-size BATCH_SIZE  batch size                               (default =  128)
-d, --device DEVICE          train using given device (cpu|cuda|auto) (default = auto)
-e, --epochs EPOCHS          number of epochs to train with           (default =   10)

Examples

Train Tiresias on the first half of the data (<data_train.csv>), then use the second half (<data_test.csv>) to predict and test the predictions.

python3 -m tiresias train   --csv <data_train.csv> --save tiresias.save # Training
python3 -m tiresias predict --csv <data_test.csv>  --load tiresias.save # Predicting
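If all events are stored in a single file, the two files used in the commands above could be produced by splitting that file in half. The sketch below does this with Python's standard csv module. The helper name split_csv is hypothetical and not part of Tiresias, and we assume that the rows are already in chronological order.

```python
import csv

def split_csv(path, path_train, path_test):
    """Write the first half of the data rows to path_train, the rest to path_test."""
    with open(path, newline="") as infile:
        reader = csv.reader(infile)
        header = next(reader)   # First line holds the column names
        rows   = list(reader)   # Remaining lines are data

    half = len(rows) // 2
    for target, part in ((path_train, rows[:half]), (path_test, rows[half:])):
        with open(target, "w", newline="") as outfile:
            writer = csv.writer(outfile)
            writer.writerow(header)  # Repeat the header in both files
            writer.writerows(part)

    # Number of data rows written to each file
    return half, len(rows) - half
```

With an odd number of rows, the extra row ends up in the test file.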

Code

To use Tiresias in your own project, you can import it as a standalone module. Here we show some simple examples of how to use the Tiresias package in your own Python code. For complete documentation, we refer to the Reference guide.

Import

To import components from Tiresias simply use the following format

from tiresias          import <Object>
from tiresias.<module> import <Object>

For example, the following code imports the Tiresias neural network as found in the Reference.

# Imports
from tiresias import Tiresias

Working example

In this example, we load data from either a .csv or .txt file and use that data to train and predict with Tiresias.

# import Tiresias and Preprocessor
from tiresias              import Tiresias
from tiresias.preprocessor import Preprocessor

##############################################################################
#                                 Load data                                  #
##############################################################################

# Create preprocessor for loading data
preprocessor = Preprocessor(
    length  = 20,           # Extract sequences of 20 items
    timeout = float('inf'), # Do not include a maximum allowed time between events
)

# Load data from csv file
X, y, label, mapping = preprocessor.csv("<path/to/file.csv>")
# Alternatively, load data from txt file (use one of the two calls, not both)
X, y, label, mapping = preprocessor.text("<path/to/file.txt>")

##############################################################################
#                                  Tiresias                                  #
##############################################################################

# Create Tiresias object
tiresias = Tiresias(
    input_size  = 300, # Number of different events to expect
    hidden_size = 128, # Hidden dimension, we suggest 128
    output_size = 300, # Number of different events to expect
    k           = 4,   # Number of parallel LSTMs for ArrayLSTM
)

# Optionally cast data and Tiresias to cuda (skip these lines if no GPU is available)
tiresias = tiresias.to("cuda")
X        = X       .to("cuda")
y        = y       .to("cuda")

# Train tiresias
tiresias.fit(
    X          = X,
    y          = y,
    epochs     = 10,
    batch_size = 128,
)

# Predict using tiresias
y_pred, confidence = tiresias.predict_online(
    X = X,
    y = y,
    k = 3,
)
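To judge the predictions, one option is to count a sample as correct when the true event appears among its k predicted events. The sketch below assumes that y_pred has shape (n_samples, k), as described in the Reference. The helper name top_k_accuracy is hypothetical and not part of Tiresias. It works on plain nested lists, so tensors would first need converting, e.g. with .tolist().

```python
def top_k_accuracy(y_true, y_pred):
    """Fraction of samples whose true event is among the k predicted events."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must contain the same number of samples")
    if not y_true:
        return 0.0
    hits = sum(1 for actual, candidates in zip(y_true, y_pred) if actual in candidates)
    return hits / len(y_true)

# Example with 4 samples and k = 3 candidates per sample
y_true = [3, 7, 1, 4]
y_pred = [[3, 5, 9], [2, 7, 8], [4, 6, 0], [4, 1, 2]]
print(top_k_accuracy(y_true, y_pred))  # 0.75
```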

Modifying Tiresias

Tiresias itself uses the ArrayLSTM from the array-lstm package as its LSTM. Suppose that we want to use a regular LSTM instead. We can create a new class that extends Tiresias and override the __init__ method to replace the ArrayLSTM with a regular LSTM.

# Imports
import torch.nn as nn
from tiresias import Tiresias

# Create a new class of Tiresias to overwrite the original
class TiresiasLSTM(Tiresias):

    # We overwrite the __init__ method
    def __init__(self, input_size, hidden_size, output_size, k):
        # Initialise super
        super().__init__(input_size, hidden_size, output_size, k)

        # Replace the lstm layer with a regular LSTM
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)

Reference

This is the reference documentation for the classes and methods provided by the Tiresias module.

Preprocessor

The Preprocessor class provides methods to automatically extract event sequences from various common data formats. To start sequencing, first create the Preprocessor object.

class preprocessor.Preprocessor(length, timeout, NO_EVENT=-1337)

Preprocessor for loading data from standard data formats.

Preprocessor.__init__(length, timeout, NO_EVENT=-1337)

Preprocessor for loading data from standard data formats.

Parameters
  • length (int) – Number of events in context.

  • timeout (float) – Maximum time between context event and the actual event in seconds.

  • NO_EVENT (int, default=-1337) – ID of NO_EVENT event, i.e., event returned for context when no event was present. This happens in case of timeout or if an event simply does not have enough preceding context events.
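The interplay of length, timeout and NO_EVENT can be illustrated with a small sketch for the events of a single machine. This is not the Preprocessor implementation. The function name build_contexts is hypothetical, and we assume that positions without a valid preceding event are filled with NO_EVENT.

```python
NO_EVENT = -1337  # Same default ID as documented above

def build_contexts(timestamps, events, length, timeout):
    """For each event, collect the `length` preceding events as its context.

    A context position becomes NO_EVENT when there is no preceding event at
    that position, or when that event lies more than `timeout` seconds
    before the current event.
    """
    contexts = []
    for i, now in enumerate(timestamps):
        context = []
        for j in range(i - length, i):
            if j < 0 or now - timestamps[j] > timeout:
                context.append(NO_EVENT)
            else:
                context.append(events[j])
        contexts.append(context)
    return contexts

# Four events of one machine; the last occurs long after the others
print(build_contexts([0, 10, 20, 100], [1, 2, 3, 4], length=2, timeout=50))
# [[-1337, -1337], [-1337, 1], [1, 2], [-1337, -1337]]
```

The first two events lack enough preceding events, and the last event occurs more than 50 seconds after its predecessors, so its context is filled entirely with NO_EVENT.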

Formats

We currently support the following formats:
  • .csv files containing a header row that specifies the columns ‘timestamp’, ‘event’ and ‘machine’.

  • .txt files containing a line for each machine and a sequence of events (integers) separated by spaces.
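As an illustration of the .csv layout, the sketch below reads a small in-memory file with Python's standard csv module and groups the events per machine. The event names, machine names and the helper events_per_machine are made up for this example. Interpreting timestamps as floating-point seconds is also an assumption.

```python
import csv
import io
from collections import defaultdict

# Hypothetical example data in the documented .csv layout
CSV_DATA = """timestamp,event,machine
1.0,login_failed,host-a
2.5,port_scan,host-b
3.0,login_failed,host-a
4.2,login_ok,host-a
"""

def events_per_machine(handle):
    """Group (timestamp, event) pairs by machine, ordered by timestamp."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle):  # Header row supplies the column names
        grouped[row["machine"]].append((float(row["timestamp"]), row["event"]))
    for sequence in grouped.values():
        sequence.sort(key=lambda pair: pair[0])
    return dict(grouped)

sequences = events_per_machine(io.StringIO(CSV_DATA))
print(sequences["host-a"])
# [(1.0, 'login_failed'), (3.0, 'login_failed'), (4.2, 'login_ok')]
```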

Transforming .csv files into sequences is the quickest method and is done by the following method call:

Preprocessor.csv(path, nrows=None, labels=None, verbose=False)

Preprocess data from csv file.

Note

Format: The assumed format of a .csv file is that the first line of the file contains the headers, which should include timestamp, machine, event (and optionally label). The remaining lines of the .csv file will be interpreted as data.

Parameters
  • path (string) – Path to input file from which to read data.

  • nrows (int, default=None) – If given, limit the number of rows to read to nrows.

  • labels (int or array-like of shape=(n_samples,), optional) – If an int is given, label all sequences with given int. If an array-like is given, use the given labels for the data in file. Note: will overwrite any ‘label’ data in input file.

  • verbose (boolean, default=False) – If True, prints progress in transforming input to sequences.

Returns

  • events (torch.Tensor of shape=(n_samples,)) – Events in data.

  • context (torch.Tensor of shape=(n_samples, context_length)) – Context events for each event in events.

  • labels (torch.Tensor of shape=(n_samples,)) – Labels will be None if no labels parameter is given, and if data does not contain any ‘labels’ column.

Transforming .txt files into sequences is slower, but still possible using the following method call:

Preprocessor.text(path, nrows=None, labels=None, verbose=False)

Preprocess data from text file.

Note

Format: The assumed format of a text file is that each line in the text file contains a space-separated sequence of event IDs for a machine. I.e. for n machines, there will be n lines in the file.

Parameters
  • path (string) – Path to input file from which to read data.

  • nrows (int, default=None) – If given, limit the number of rows to read to nrows.

  • labels (int or array-like of shape=(n_samples,), optional) – If an int is given, label all sequences with given int. If an array-like is given, use the given labels for the data in file. Note: will overwrite any ‘label’ data in input file.

  • verbose (boolean, default=False) – If True, prints progress in transforming input to sequences.

Returns

  • events (torch.Tensor of shape=(n_samples,)) – Events in data.

  • context (torch.Tensor of shape=(n_samples, context_length)) – Context events for each event in events.

  • labels (torch.Tensor of shape=(n_samples,)) – Labels will be None if no labels parameter is given, and if data does not contain any ‘labels’ column.
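The text format described in the note above is simple enough to parse by hand. The following sketch shows only that layout: one list of integer event IDs per line, i.e. per machine. The helper name parse_text_events is hypothetical, and the sketch does not build contexts or tensors as Preprocessor.text does.

```python
def parse_text_events(lines):
    """Turn each non-empty line into a list of integer event IDs (one list per machine)."""
    sequences = []
    for line in lines:
        fields = line.split()  # Event IDs are separated by whitespace
        if fields:             # Skip blank lines
            sequences.append([int(field) for field in fields])
    return sequences

# Two machines, followed by a trailing blank line
example = ["1 2 3 2 1", "4 4 5", ""]
print(parse_text_events(example))  # [[1, 2, 3, 2, 1], [4, 4, 5]]
```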

Future supported formats

Note

These formats already have an API entrance, but are currently NOT supported.

  • .json files containing values for ‘timestamp’, ‘event’ and ‘machine’.

  • .ndjson files where each line contains a JSON object with keys ‘timestamp’, ‘event’ and ‘machine’.

Preprocessor.json(path, labels=None, verbose=False)

Preprocess data from json file.

Note

json preprocessing will become available in a future version.

Parameters
  • path (string) – Path to input file from which to read data.

  • labels (int or array-like of shape=(n_samples,), optional) – If an int is given, label all sequences with given int. If an array-like is given, use the given labels for the data in file. Note: will overwrite any ‘label’ data in input file.

  • verbose (boolean, default=False) – If True, prints progress in transforming input to sequences.

Returns

  • events (torch.Tensor of shape=(n_samples,)) – Events in data.

  • context (torch.Tensor of shape=(n_samples, context_length)) – Context events for each event in events.

  • labels (torch.Tensor of shape=(n_samples,)) – Labels will be None if no labels parameter is given, and if data does not contain any ‘labels’ column.

Preprocessor.ndjson(path, labels=None, verbose=False)

Preprocess data from ndjson file.

Note

ndjson preprocessing will become available in a future version.

Parameters
  • path (string) – Path to input file from which to read data.

  • labels (int or array-like of shape=(n_samples,), optional) – If an int is given, label all sequences with given int. If an array-like is given, use the given labels for the data in file. Note: will overwrite any ‘label’ data in input file.

  • verbose (boolean, default=False) – If True, prints progress in transforming input to sequences.

Returns

  • events (torch.Tensor of shape=(n_samples,)) – Events in data.

  • context (torch.Tensor of shape=(n_samples, context_length)) – Context events for each event in events.

  • labels (torch.Tensor of shape=(n_samples,)) – Labels will be None if no labels parameter is given, and if data does not contain any ‘labels’ column.

Tiresias

The Tiresias class uses the torch-train library for training and prediction. This class implements the neural network as described in the paper Tiresias: Predicting Security Events Through Deep Learning.

class tiresias.Tiresias(*args: Any, **kwargs: Any)

Implementation of Tiresias

From Tiresias: Predicting security events through deep learning by Shen et al.

Note

This is a batch_first=True implementation, hence the forward() method expects inputs of shape=(batch, seq_len, input_size).

Attributes
  • input_size (int) – Size of input dimension

  • hidden_size (int) – Size of hidden dimension

  • output_size (int) – Size of output dimension

  • k (int) – Number of parallel memory structures, i.e. cell states to use

Initialization

Tiresias.__init__(input_size, hidden_size, output_size, k)

Implementation of Tiresias

Parameters
  • input_size (int) – Size of input dimension

  • hidden_size (int) – Size of hidden dimension

  • output_size (int) – Size of output dimension

  • k (int) – Number of parallel memory structures, i.e. cell states to use

Forward

As Tiresias is a neural network, it implements the forward() method, which passes input through the entire network.

Tiresias.forward(X)

Forward data through the network

Parameters
  • X (torch.Tensor of shape=(n_samples, seq_len)) – Input sequences; these will be one-hot encoded to an array of shape=(n_samples, seq_len, input_size)

Returns

  • result (torch.Tensor of shape=(n_samples, output_size)) – Probability distribution over the possible outputs
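The one-hot encoding mentioned for X can be pictured with plain Python lists. This is a sketch of the encoding itself, not of the Tiresias code. The helper name one_hot is our own, and we assume that event IDs lie in the range 0 to input_size - 1. How Tiresias treats IDs outside that range, such as NO_EVENT, is not covered here. For tensors, torch.nn.functional.one_hot offers a comparable operation.

```python
def one_hot(sequences, input_size):
    """Encode nested lists of shape (n_samples, seq_len) as (n_samples, seq_len, input_size)."""
    encoded = []
    for sequence in sequences:
        rows = []
        for event in sequence:
            if not 0 <= event < input_size:
                raise ValueError(f"event ID {event} outside range [0, {input_size})")
            row = [0] * input_size
            row[event] = 1  # Single 1 at the index of the event ID
            rows.append(row)
        encoded.append(rows)
    return encoded

# One sample containing a sequence of two events, with three possible events
print(one_hot([[0, 2]], input_size=3))  # [[[1, 0, 0], [0, 0, 1]]]
```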

Fit

Tiresias inherits its fit method from the torch-train module. See the documentation for a complete reference.

Tiresias.fit(X, y, epochs=10, batch_size=32, learning_rate=0.01, criterion=torch.nn.NLLLoss, optimizer=torch.optim.SGD, variable=False, verbose=True, **kwargs)

Train the module with given parameters

Parameters
  • X (torch.Tensor) – Tensor to train with

  • y (torch.Tensor) – Target tensor

  • epochs (int, default=10) – Number of epochs to train with

  • batch_size (int, default=32) – Default batch size to use for training

  • learning_rate (float, default=0.01) – Learning rate to use for optimizer

  • criterion (nn.Loss, default=nn.NLLLoss) – Loss function to use

  • optimizer (optim.Optimizer, default=optim.SGD) – Optimizer to use for training

  • variable (boolean, default=False) – If True, accept inputs of variable length

  • verbose (boolean, default=True) – If True, prints training progress

Returns

result – Returns self

Return type

self

Predict

The network itself gives a probability distribution over all possible output values. Because Tiresias reports only the k most likely outputs, it overrides the predict() method of the Module class from torch-train.

Tiresias.predict(X, k=1, variable=False, verbose=True)

Predict the k most likely output values

Parameters
  • X (torch.Tensor of shape=(n_samples, seq_len)) – Input of sequences, these will be one-hot encoded to an array of shape=(n_samples, seq_len, input_size)

  • k (int, default=1) – Number of output items to generate

  • variable (boolean, default=False) – If True, predict inputs of different sequence lengths

  • verbose (boolean, default=True) – If True, print output

Returns

  • result (torch.Tensor of shape=(n_samples, k)) – k most likely outputs

  • confidence (torch.Tensor of shape=(n_samples, k)) – Confidence levels for each output
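Selecting the k most likely outputs from a probability distribution amounts to a top-k selection. The sketch below shows the idea for a single sample using plain Python lists. The helper name top_k is hypothetical, and the sketch does not claim to mirror how predict() is implemented. For tensors, torch.topk provides a comparable operation.

```python
def top_k(probabilities, k=1):
    """Return (indices, confidences) of the k largest entries, most likely first."""
    ranked = sorted(range(len(probabilities)), key=lambda i: probabilities[i], reverse=True)
    best = ranked[:k]
    return best, [probabilities[i] for i in best]

# Distribution over four possible events
distribution = [0.05, 0.60, 0.10, 0.25]
print(top_k(distribution, k=2))  # ([1, 3], [0.6, 0.25])
```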

In addition to regular prediction, Tiresias introduces online prediction. In this implementation, the network predicts outputs for given inputs and compares them to what actually occurred. If the prediction does not match the actual output event, we update the neural network before predicting the next events. This is done using the method predict_online().

Tiresias.predict_online(X, y, k=1, epochs=10, batch_size=32, learning_rate=0.0001, criterion=torch.nn.NLLLoss, optimizer=torch.optim.SGD, variable=False, verbose=True, **kwargs)

Predict samples in X and update the network only if the prediction does not match y

Parameters
  • X (torch.Tensor) – Tensor to predict/train with

  • y (torch.Tensor) – Target tensor

  • k (int, default=1) – Number of output items to generate

  • epochs (int, default=10) – Number of epochs to train with

  • batch_size (int, default=32) – Default batch size to use for training

  • learning_rate (float, default=0.0001) – Learning rate to use for optimizer

  • criterion (nn.Loss, default=nn.NLLLoss) – Loss function to use

  • optimizer (optim.Optimizer, default=optim.SGD) – Optimizer to use for training

  • variable (boolean, default=False) – If True, accept inputs of variable length

  • verbose (boolean, default=True) – If True, prints training progress

Returns

  • result (torch.Tensor of shape=(n_samples, k)) – k most likely outputs

  • confidence (torch.Tensor of shape=(n_samples, k)) – Confidence levels for each output
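The control flow of online prediction (predict, compare with what actually occurred, update only on a mismatch) can be sketched independently of the neural network. In the sketch below, ToyModel is a deliberately simple stand-in that counts which event followed the last context event. It is not part of Tiresias, and the update step here replaces the retraining that predict_online() performs with its epochs, batch_size and learning_rate parameters.

```python
from collections import Counter, defaultdict

class ToyModel:
    """Stand-in model: predicts the event seen most often after the last context event."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def predict(self, context):
        seen = self.counts[context[-1]]
        return seen.most_common(1)[0][0] if seen else None

    def update(self, context, actual):
        self.counts[context[-1]][actual] += 1

def predict_online(model, contexts, actuals):
    """Predict each sample in order; update the model only after a wrong prediction."""
    predictions, updates = [], 0
    for context, actual in zip(contexts, actuals):
        predicted = model.predict(context)
        predictions.append(predicted)
        if predicted != actual:  # Mismatch: learn from this sample before continuing
            model.update(context, actual)
            updates += 1
    return predictions, updates

contexts = [[1, 2], [1, 2], [3, 2], [1, 4]]
actuals  = [5, 5, 5, 6]
print(predict_online(ToyModel(), contexts, actuals))  # ([None, 5, 5, None], 2)
```

The first and last samples are predicted wrongly, because the model has not yet seen their last context event, so the model is updated exactly twice.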

Contributors

This page lists all the contributors to this project. If you want to be involved in maintaining code or adding new features, please email t(dot)s(dot)vanede(at)utwente(dot)nl.

Code

  • Thijs van Ede

Academic Contributors

  • Thijs van Ede

  • Hojjat Aghakhani

  • Noah Spahn

  • Riccardo Bortolameotti

  • Marco Cova

  • Andrea Continella

  • Maarten van Steen

  • Andreas Peter

  • Christopher Kruegel

  • Giovanni Vigna

License

MIT License

Copyright (c) 2021 Thijs van Ede

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Citing

To cite Tiresias please use the following publications:

van Ede, T., Aghakhani, H., Spahn, N., Bortolameotti, R., Cova, M., Continella, A., van Steen, M., Peter, A., Kruegel, C. & Vigna, G. (2022, May). DeepCASE: Semi-Supervised Contextual Analysis of Security Events. In 2022 Proceedings of the IEEE Symposium on Security and Privacy (S&P). IEEE. [PDF DeepCASE]

Shen, Y., Mariconti, E., Vervier, P. A., & Stringhini, G. (2018). Tiresias: Predicting security events through deep learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security (CCS) (pp. 592-605). [PDF Tiresias]

Bibtex

DeepCASE

@inproceedings{vanede2020deepcase,
  title={{DeepCASE: Semi-Supervised Contextual Analysis of Security Events}},
  author={van Ede, Thijs and Aghakhani, Hojjat and Spahn, Noah and Bortolameotti, Riccardo and Cova, Marco and Continella, Andrea and van Steen, Maarten and Peter, Andreas and Kruegel, Christopher and Vigna, Giovanni},
  booktitle={Proceedings of the IEEE Symposium on Security and Privacy (S&P)},
  year={2022},
  organization={IEEE}
}

Tiresias

@inproceedings{shen2018tiresias,
  title={Tiresias: Predicting security events through deep learning},
  author={Shen, Yun and Mariconti, Enrico and Vervier, Pierre Antoine and Stringhini, Gianluca},
  booktitle={Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security},
  pages={592--605},
  year={2018}
}