Fortra AI Use

Last updated: January 29, 2026

1. Guiding Usage Principles

Modern AI and Machine Learning (ML) technology drives the evolution of Fortra's platform-based solutions. It is used for advanced threat detection and provides enhanced protection of customer data, users, and brands.

The relevant AI terms are defined below. We then describe our guiding usage principles for each AI technology.

1.1 Terminology

The AI landscape is continually evolving. As a result, common terms in the field lack universally accepted definitions. Fortra uses the following definitions:

Artificial Intelligence (AI)

Refers to computer systems capable of performing complex tasks typically requiring human intelligence, such as reasoning, decision-making, and creative problem-solving.

Machine Learning (ML)

The study of computer algorithms that improve automatically through experience and the use of data. These algorithms extract and recognize patterns, learn implicit rules, and make predictions.

Data Science (DS)

At Fortra, DS is an umbrella term covering all “scientific” work with data, including the development of AI and ML solutions and systems.

AI/ML System

A system in this context is a collection of algorithms, models, and engineering components that together solve a specific DS problem, such as “email classification” or “detecting sensitive content in files”.

Deep Learning

A subfield of ML covering algorithms that use deep (multi-layer) neural network architectures. Successive layers extract higher-level features from the raw input. By applying many layers of transformations, the model performs representation learning (automatically finding an appropriate way to represent the data).
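
As a minimal illustration of the “many layers of transformations” idea (not a description of any Fortra model), a small network built with PyTorch might look like the following sketch:

```python
# Minimal sketch of a deep (multi-layer) network, assuming PyTorch is installed.
# The layer sizes and input data are illustrative only.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(32, 64),   # first layer: raw input features -> intermediate representation
    nn.ReLU(),
    nn.Linear(64, 64),   # deeper layers extract higher-level features
    nn.ReLU(),
    nn.Linear(64, 2),    # final layer: scores for two output classes
)

x = torch.randn(8, 32)   # a batch of 8 illustrative input vectors
logits = model(x)        # shape: (8, 2)
print(logits.shape)
```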

Large Language Models (LLMs)

An LLM is a deep neural network (typically using a Transformer architecture) pre-trained on large textual datasets. It treats text as a sequence of tokens and learns to predict (generate) the next token. However, a deployed LLM does not have to contain a generative component; instead, it can be “fine-tuned” for a specific downstream task such as text classification. We discuss such (non-generative) models under the “LLM Usage” section.
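
The following hypothetical sketch, which assumes the Hugging Face transformers library and uses a public checkpoint (not a Fortra model), illustrates the non-generative, fine-tuned use of an LLM:

```python
# Minimal sketch of a non-generative LLM used as a text classifier.
# Assumes the transformers and torch packages are installed; the checkpoint
# is a public example, not a Fortra model.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

name = "distilbert-base-uncased-finetuned-sst-2-english"  # illustrative public checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

inputs = tokenizer("Please review the attached invoice.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # fixed-size output: one score per class
predicted = model.config.id2label[logits.argmax(dim=-1).item()]
print(predicted)                             # a category label, not generated text
```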

Generative AI (GenAI)

Broadly speaking, ML systems that are capable of generating text, images, code, or other types of content, often in response to a prompt entered by a user.

In practice, this is a large (very deep) Transformer model trained on large amounts of data from a specific domain that, given a sequence of tokens (a prompt), generates new tokens from that domain.
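
A minimal sketch of prompt-driven generation, assuming the transformers library and the small public “gpt2” model (used purely as an example), might look like this:

```python
# Minimal sketch of prompt-driven text generation, assuming the transformers
# package is installed. "gpt2" is a small public model used purely for illustration.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Phishing emails often", max_new_tokens=20, num_return_sequences=1)
print(result[0]["generated_text"])  # the prompt followed by newly generated tokens
```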

Agentic AI

Autonomous systems capable of initiating tasks without direct human intervention, leveraging automated reasoning to make guided decisions for problem solving. Unlike traditional applications, agentic AI can behave in adaptive and unpredictable ways, introducing novel risks that conventional governance models cannot manage. However, not all agentic systems behave unpredictably. We elaborate on this point under the “Agentic AI Usage” section. 

1.2 ML Model Usage

Fortra develops and uses ML models across many products and services. These classical, non-generative models have the following properties:

  • They cannot be prompted.
  • They cannot leak training data.
  • Their input and output format is consistent and predictable (see the sketch after this list).
  • With very few exceptions, they are trained internally.
  • They are hosted internally.
  • In isolation, they do not pose any risk.
  • A customer cannot opt out of the use of these models, as they are the backbone of the ML capability of a given product/service and are required to deliver the security outcomes that customers expect from Fortra.
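
To make the fixed input/output point concrete, here is a hypothetical sketch of a classical, non-generative model, assuming scikit-learn and synthetic data:

```python
# Minimal sketch of a classical, non-generative ML model, assuming scikit-learn
# is installed. The features and labels are synthetic and purely illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# The model only accepts a fixed-length numeric feature vector and only returns
# a class label; there is no prompt and no free-form output.
print(model.predict(X[:1]))  # e.g. array([0]) or array([1])
```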

1.3 LLM Usage

Fortra uses (non-generative) Large Language Models (LLMs) across many products and services. There are two critical points that should address many usage concerns:

  • As defined above, an LLM is just a machine learning model. It cannot be prompted, and it cannot “leak” training data. It expects a particular type of input (e.g. text in sentence or paragraph form) and returns a specified type of output (e.g. a category or a class).
  • The main purpose for using an LLM is “language understanding”; the LLM converts our input text into an “embedding” or an “encoding” which is effectively a high-dimensional representation of our input. We then use that representation to solve a downstream task.
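
A minimal sketch of this “encode, then solve a downstream task” pattern, assuming the sentence-transformers and scikit-learn packages (the model name, texts, and labels are illustrative, not Fortra systems):

```python
# Minimal sketch: use an LLM encoder to embed text, then train a simple
# downstream classifier on the embeddings. Assumes sentence-transformers and
# scikit-learn are installed; the model name, texts, and labels are toy examples.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # public example model

texts = ["Your invoice is attached.", "Click here to reset your password now!"]
labels = [0, 1]                                     # 0 = benign, 1 = suspicious (toy labels)

embeddings = encoder.encode(texts)                  # high-dimensional representations
classifier = LogisticRegression().fit(embeddings, labels)

new_embedding = encoder.encode(["Verify your account immediately"])
print(classifier.predict(new_embedding))            # a class, not generated text
```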

1.4 Generative AI Usage

There are two broad categories of Generative AI (GenAI) model use cases at Fortra: internal and customer-facing. It is essential to distinguish between these two categories as the requirements, expectations, and concerns are very different.

Internal Use Cases

We define an internal use case as an application that the customer cannot interact with directly. The purpose of the application may vary, and it may have been built for various internal Fortra security roles, e.g., data scientists, threat researchers, or Security Operations Center (SOC) analysts.

We break down each use case based on the following aspects:

  • Data Type: The type of input data (public, customer, private/sensitive*)
  • Where is the Model Hosted? Internal, vendor, or public
  • Is the Model Fine-Tuned? Yes/No, and if so, how
  • Risk? What risk, if any, the application creates

Here are a few examples illustrating the risk:

  1. If the input data is public (e.g. URLs collected from a feed) there is no risk to the customer even if the model is publicly hosted.
  2. If the input data is non-sensitive customer data, and the model is hosted internally, there is no risk to the customer. Fortra would never use a publicly hosted model for customer data.
  3. If the input data is sensitive/private to the customer, and the model is hosted internally, there is still a risk of sensitive content leakage. In such cases, even if the design guarantees low risk, Fortra would provide the customer with an opt-out clause for the specific capability.

*By private/sensitive data we mean customer data that Fortra employees cannot review due to contractual restrictions.

Customer-Facing Solutions

Fortra is committed to the following development principles for any released or future applications:

  1. The generative model(s) powering the solution would be internally hosted.
  2. The solution would include guardrails against:
    1. Malicious input, e.g., prompt injection
    2. Misleading or harmful output
  3. The solution would include a component to sanitize the output (see the illustrative sketch after this list), ensuring:
    1. Fidelity of output, i.e., accuracy and soundness
    2. Model hallucinations are limited or avoided
  4. The solution would be rigorously tested over a specified period prior to general availability (GA).
  5. The system would be closely monitored upon GA:
    1. Logging of user interaction (where permissible)
    2. Periodic, manual, internal review of interactions
  6. If the generative model needs to be fine-tuned on sensitive customer data, we would ensure:
    1. Isolated tenant for each customer (applies to training, not just inference)
    2. Specific testing and additional guardrails against potential data leakage
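
To illustrate principles 2 and 3 above, the following hypothetical sketch shows what simple input guardrails and output sanitization could look like; the patterns, thresholds, and messages are invented and do not describe Fortra's actual implementation:

```python
# Hypothetical sketch of input guardrails and output sanitization for a
# GenAI-backed feature. The patterns, thresholds, and messages are invented
# for illustration and are not Fortra's actual implementation.
import re

INJECTION_PATTERNS = [
    r"ignore (all|any|previous) instructions",
    r"reveal (the )?system prompt",
]

def check_prompt(prompt: str) -> bool:
    """Return True if the prompt passes the (illustrative) input guardrails."""
    lowered = prompt.lower()
    return not any(re.search(p, lowered) for p in INJECTION_PATTERNS)

def sanitize_output(answer: str, max_length: int = 2000) -> str:
    """Apply illustrative output checks: strip markup and enforce a length cap."""
    cleaned = re.sub(r"<[^>]+>", "", answer)      # drop any embedded markup
    return cleaned[:max_length].strip()

if check_prompt("Summarize this alert for me"):
    raw_answer = "...model output..."             # placeholder for the model call
    print(sanitize_output(raw_answer))
else:
    print("Request rejected by input guardrails.")
```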

1.5 Agentic AI Usage

There is no consensus on the difference between a GenAI-powered system and an Agentic AI system. The most common distinction is that an Agentic AI system has access to additional components such as tools and apps that the GenAI model at the core of the system uses, whereas a GenAI system is a standalone model that receives a prompt and generates a response token by token.

The main concern arising from Agentic AI usage relates to solutions that integrate an Agent into a workflow in a way that allows the Agent to impact outcomes or make decisions. A simple example is an agent that books a trip for you based on your preferences. By using this agent, a customer exposes themselves to concrete risk (e.g., the agent books a trip to North Korea). Fortra believes that this is an unacceptable risk, and therefore:

  1. Fortra is developing several AI Agents (or Agentic Systems) that make use of tools and apps in a predictable way.
  2. Fortra does not have plans to develop AI Agents that would act “autonomously” within a system or make decisions (consequential or not) on behalf of users.
  3. Fortra may develop AI Agents that would act semi-autonomously subject to human review and ongoing monitoring and governance. Such agents do not take any actions without a human-in-the-loop.

1.6 Explainability

We design AI systems to provide meaningful explanations of outputs when appropriate, while recognizing that certain advanced models may not support full transparency into internal decision logic. We maintain documentation describing the intended purpose, data sources, and known constraints of AI systems within our services.

1.7 Opt Out

We recognize that customers may have varying preferences regarding the use of AI technologies. Where feasible, we provide configuration options that allow customers to limit or disable certain AI-assisted features.

However, some AI systems are embedded within the service to meet security, reliability, and compliance objectives. These systems are required to deliver the contracted service and are not subject to opt-out. Processing customer data to generate outputs during service operation does not constitute AI training.

Customer-specific data is not used to train models in a way that exposes or transfers proprietary information to the model. Any data used for training is subject to safeguards such as customer de-identification, aggregation, and minimization where appropriate.

Customers may contact us to understand which AI capabilities are optional and the operational impact of disabling them.

2. Customer Data

Fortra is committed to following industry best practices and adheres to the responsible and secure use of customer data.

We strictly adhere to all data terms and conditions associated with the relevant product. If the customer chooses not to share some aspects of their data, we do not access or use it in any form.

Data Usage Principles

The following principles apply to any Fortra ML system that does not contain a GenAI component:

  • As a general principle, we remove all references to a specific customer (e.g. organization name) from the data we feed into the models.* We make an exception to this principle if we are building a customer-specific model. (An illustrative de-identification sketch follows this list.)
  • All data (including customer data) is sanitized, processed, and analyzed prior to being used. We use a variety of techniques to compile the most effective training dataset from the millions of examples we observe per day.
  • Each model undergoes an extensive evaluation process in which we monitor performance over a substantial period, manually review and label model output, and ensure that the False Positive (FP) and False Negative (FN) rates are satisfactory. In other words, human review is a critical component of our process and prevents much of the unintended harm that could arise from using vast amounts of customer data.
  • We incorporate customer feedback into the model as data points that are then used to train subsequent model versions. In some cases, feedback is used to adjust (tune) parameters manually.

*This is NOT equivalent to true anonymization. Since aspects of the customer environment are very relevant for model training, we simply remove the link between the data point and identifying customer information.
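
As a hypothetical illustration of this de-identification step (the identifiers, patterns, and sample text are invented; this is not Fortra's actual pipeline):

```python
# Hypothetical sketch of removing customer-identifying references before a
# record is added to a training set. The identifiers and regexes are invented;
# as noted above, this is pseudonymization, not true anonymization.
import re

CUSTOMER_IDENTIFIERS = ["Example Corp", "examplecorp.com"]  # illustrative values

def strip_customer_references(text: str) -> str:
    # Mask email addresses first (illustrative pattern only).
    text = re.sub(r"[\w.+-]+@[\w-]+\.[\w.]+", "[EMAIL]", text)
    # Then remove known customer identifiers such as the organization name.
    for identifier in CUSTOMER_IDENTIFIERS:
        text = re.sub(re.escape(identifier), "[CUSTOMER]", text, flags=re.IGNORECASE)
    return text

sample = "Ticket from jane.doe@examplecorp.com about the Example Corp VPN outage."
print(strip_customer_references(sample))
# -> "Ticket from [EMAIL] about the [CUSTOMER] VPN outage."
```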

Data Usage – GenAI Systems

The following principles apply to any Fortra ML system that contains a GenAI component:

  • Fortra would never use a publicly hosted GenAI model for customer data.
  • If the input customer data to a component is sensitive/private, Fortra would provide the customer with an opt-out clause for the specific capability.
  • If the generative model needs to be fine-tuned on sensitive customer data, we would ensure:
    • Isolated tenant for each customer (applies to training, not just inference)
    • Specific testing and additional guardrails against potential data leakage

3. Secure System Development Life Cycle (SSDLC)

Fortra is committed to robust, thorough, and secure ML system development and deployment processes. Our development life cycle covers model training, testing, evaluation, and monitoring, as well as deployment standards and performance testing.

Development Principles

We adhere to the following general principles:

  • We do not compromise on the quality of the models and systems we release. Each model/system is reviewed by multiple data scientists and must complete a thorough evaluation process.
  • We seek to apply the appropriate ML solution to each use case.
  • We aim to develop models and frameworks that are reusable and extensible.
  • Model development draws heavily on the open-source ecosystem.
  • We give preference to industry-vetted tools over experimental projects.
  • Custom-built components are thoroughly tested internally.

Deployment Principles

Fortra has defined standardized CloudOps processes and tooling for CI/CD, associated deployment architecture, and underlying infrastructure. The deployment of any solution, including our ML systems, aims to comply with these standards:

  • Mature CI/CD strategy
  • Common monitoring for all services
  • Database backups and security
  • Secret management
  • Encryption strategy and techniques (e.g., data at rest encryption)

Our ML systems are rigorously tested to ensure that each system performs as expected in production:

  • It can handle the required throughput, including traffic peaks
  • It does not time out
  • Its output is validated

Model Training

The model training process is dependent on the type of model and data used. However, we ensure that all models are robustly trained by following these general principles:

  • For each model, we create training, validation, and test datasets which are as closely representative of the real (production) data as possible.
  • To select the best model, we do extensive hyper-parameter tuning and selection.
  • We use appropriate metrics for the particular distribution of the data.
  • The training process and resulting model are reproducible and carefully documented.
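
A minimal sketch of these principles (representative splits, hyperparameter selection, and an appropriate metric), assuming scikit-learn and synthetic data, might look like this:

```python
# Illustrative sketch of train/test splitting and hyperparameter selection,
# assuming scikit-learn is installed. The data is synthetic and the parameter
# grid is arbitrary; this is not a Fortra training pipeline.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=2000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)  # imbalanced, like many security datasets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Hyperparameter search with cross-validation on the training split, scored
# with F1 because plain accuracy is misleading on imbalanced data.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [2, 3]},
    scoring="f1",
    cv=3,
)
search.fit(X_train, y_train)

# Final check on the held-out test split.
print(search.best_params_, f1_score(y_test, search.predict(X_test)))
```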

Model Testing and Evaluation

Each model we release is rigorously tested and evaluated. The model testing process is iterative in nature:

  • We review model performance on the test dataset we created during training.
  • If unsatisfactory, we improve the model and/or training dataset and repeat the review.

The evaluation process includes:

  • Monitoring performance over a significant period, typically weeks, without impacting system output.
  • Manual (human) review of model output by analysts, threat researchers, and/or data scientists; we always review multiple samples.
  • Verification that the False Positive (FP) and False Negative (FN) rates (where applicable) are satisfactory.
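
A minimal sketch of the FP/FN rate check, using toy labels purely for illustration:

```python
# Illustrative computation of False Positive and False Negative rates from a
# confusion matrix, assuming scikit-learn is installed. The labels are toy data.
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]   # 1 = threat, 0 = benign (toy labels)
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 0]   # model output on the same samples

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
fp_rate = fp / (fp + tn)   # benign samples incorrectly flagged
fn_rate = fn / (fn + tp)   # threats the model missed
print(f"FP rate: {fp_rate:.2f}, FN rate: {fn_rate:.2f}")
```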

System Monitoring

Monitoring an ML system can be separated into three monitoring tasks:

  1. Monitoring the infrastructure and each component thereof.
  2. Monitoring the system, i.e., treating it as a single model, taking all the constituent model dependencies into account.
  3. Monitoring each individual model included in the system.

Our system (2) and individual model (3) monitoring tasks include:

  • We track the overall performance of each individual model, e.g. the total number of detections a model makes per time period (a minimal illustration follows this list).
  • If a particular model shows a clear deviation from its expected behavior, we review samples of the data manually (expert review).
  • If a model is consistently “misbehaving”, we mark the model for re-development/re-training and, in the short term, introduce patches to correct the behavior.
  • Where appropriate, we generate periodic reports highlighting the detections each model made.
  • When a customer brings a False Positive (wrong detection) or a False Negative (missed detection) to our attention, we make an effort to identify the cause of the mistake to the extent possible. Upon request, we share a detailed explanation with the customer.
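
A hypothetical sketch of the volume-tracking and deviation check described in the first two points above (the counts and threshold are invented):

```python
# Hypothetical sketch of monitoring a model's detection volume and flagging a
# clear deviation from its recent behavior. Counts and the threshold are invented.
from statistics import mean, stdev

daily_detections = [102, 98, 110, 95, 105, 99, 240]  # last value is today's count
history, today = daily_detections[:-1], daily_detections[-1]

baseline, spread = mean(history), stdev(history)
if abs(today - baseline) > 3 * spread:               # simple deviation rule
    print(f"Deviation detected: {today} vs baseline {baseline:.0f}; queue expert review")
else:
    print("Detection volume within expected range")
```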

4. FAQ

Where can I find product-specific information?

Please reach out to your account manager or [email protected] for more detailed information.

What do you mean by private/sensitive customer data?

By private/sensitive customer data we mean data that Fortra employees cannot review due to contractual restrictions. For example, email body content.

Can you provide some examples of classical ML models?

Typical examples of classical ML models are Random Forests, XGBoost, and Support Vector Machines (SVMs). They are commonly used for tasks such as classification, regression, and anomaly detection.

What is the difference between a vendor and a publicly hosted model?

From a security standpoint, the core difference is where the data goes and who controls the infrastructure and guardrails.

Publicly hosted models (e.g., DeepSeek public service) expose the user to significant risk:

  • Your prompts are sent to an environment you do not control, often shared with other tenants.
  • The provider controls storage and can transmit your data across borders.
  • The provider may retain logs, reuse data for training, and operate under foreign jurisdiction and legal obligations.

Vendor-hosted models (e.g., AWS Bedrock) reduce much of this risk. The main remaining concerns include misconfiguration, insecure workflow integration, and failure to ensure tenant isolation.