Last updated: April 16, 2026
Boldon James Classifier by Fortra is an enterprise discovery and classification engine that scans, labels and synchronizes existing data across file servers, SharePoint, OneDrive, Exchange and endpoints, with native connectors to Microsoft Purview and Digital Guardian. Primary audience: CISOs and information security advisers at Dutch organizations with 1,000 to 10,000 FTE under NIS2, ISO 27001, BIO or DORA who need to clear a historical backlog of unlabeled data. Discovery-driven, not creator-driven.
What: a discovery engine that scans unlabeled files on file servers, SharePoint, OneDrive and Exchange, classifies them against 300+ data types, and labels them automatically or as suggestions for review, through a rule engine combining regex, lexicon, proximity and ML.
Who: organizations with a historical data backlog under NIS2 article 21, ISO 27001 Annex A 8.2, BIO classification, or DORA article 6.
Where: on file servers, SharePoint Online and on-premises, OneDrive, Exchange mailboxes and endpoints. Labels synchronize with Microsoft Purview, Digital Guardian and Clearswift through OOXML metadata and SMTP X-headers.
When: after a data breach, in the run-up to a NIS2 or ISO audit, during migration to Microsoft 365, or when a Titus rollout does not cover the backlog of unlabeled legacy data.
Cost indication: per user per year, plus capacity modules for SharePoint, Exchange and endpoints. Concrete figures on request after a sizing call.
Timeline: 30-day POC on a scoped dataset, then 4 to 6 months of full rollout including delta scanning and Purview synchronization.
Boldon James Classifier is a server and endpoint product that traverses existing data, inspects content, and applies a classification label to every file that matches a rule. The category is discovery-driven or data-centric classification. A central scan engine crawls file servers, SharePoint sites, OneDrive accounts and Exchange mailboxes. Per file a rule set runs that combines: regular expressions for structured patterns such as IBAN or BSN, lexicons for keyword lists, proximity conditions that weigh the relation between terms, and an optional ML model that recognizes document categories.
The product line was originally developed by Boldon James in Farnham, United Kingdom, acquired by HelpSystems in 2020, and today forms part of Fortra. The full portfolio is called Boldon James Classifier and includes File Classifier, SharePoint Classifier, Exchange Classifier, OneDrive Classifier and the Classifier Administration Server. In the Netherlands the product is delivered and supported as part of the Fortra portfolio; implementation is handled by Neo Security.
Boldon James delivers three things that a standalone script scan or a default Purview discovery run does not. First, a library of more than 300 predefined data types, including Dutch patterns such as BSN, BIC, IBAN, vehicle registrations, parliamentary document numbers, healthcare codes and AGB codes, and international equivalents such as PESEL for Polish personal data and NINO for British employee data. Second, a rule engine that combines regex, lexicon, proximity and ML in composite conditions, which keeps false positives manageable. Third, an audit log of every scan, every match and every label action, readable by Splunk, Microsoft Sentinel and any other SIEM.
Boldon James is not a DLP and not a rights management tool. It produces the label that your DLP, CASB and MFT act on further down the chain. That makes it the complementary piece to Titus on the creation side: Titus covers what is born today, Boldon James clears what was already there. The two share the same schema and the same OOXML metadata fields, so administrators maintain one taxonomy instead of two parallel worlds.
Primary audience: CISOs, information security advisers and enterprise architects at Dutch organizations with 1,000 to 10,000 FTE that have accumulated unlabeled documents on file shares, SharePoint and in personal mailboxes for years. Those organizations no longer know their own data in detail. They know personal data, contract data, financial records and sensitive project files are in there, but when the Autoriteit Persoonsgegevens asks about a specific processing activity, they cannot run a targeted query. They formally comply with GDPR article 30 (record of processing activities), but only after weeks of forensic search per incident.
Secondary audience: data protection officers, compliance officers and audit leads at semi-public institutions (healthcare, municipalities, independent administrative bodies, benefit agencies) that have a BIO classification obligation on every document they produce or receive. For them Boldon James is the technical layer under a policy rule that has stayed on paper. Departementaal vertrouwelijk, staatsgeheim and huishoudelijk are no longer adjectives on a memo; they become machine-readable values on every file.
Tertiary audience: financial entities under DORA article 6, which has required explicit classification of ICT assets and data since 17 January 2025. That obligation applies to all historical data, not only to files created from a given date onward. Discovery-driven classification is the only scalable way to catch up on that obligation across a ten or hundred terabyte archive without spinning up a project plan per folder structure.
Boldon James fits less well for three types of organization. For small SMB environments under 100 FTE, the operational overhead of a central scan infrastructure outweighs the benefit. For organizations whose entire dataset lives in a structured database rather than in files, a DAM or DAG tool with native database inspection is more effective. And for organizations that have neither a schema nor a classification policy in place: a scanner with no labels to hand out only produces a list of matches. The order is schema first, discovery second.
Sectors where Boldon James is most common in the Netherlands: central government and executive agencies under the Baseline Informatiebeveiliging Overheid, financial services under DORA, hospitals and health insurers with NEN 7510 and GDPR records, industrial organizations with intellectual property in decades of engineering archives, and law firms with notarial and legal case files.
Boldon James runs on three topological layers: a central administration server, one or more scan nodes, and optional endpoint agents. The administration server is Windows Server with SQL Server as backend. It hosts the classification schema, the rule engine, the audit console and the connector configurations. Scan nodes are separate Windows Server machines that perform the actual crawling and inspection. An enterprise setup typically runs two to four scan nodes behind a coordinator that distributes work packages.
File Classifier scans classic file servers over SMB. The scanner runs under a service account with read rights on the target directories and respects ACLs without changing them. SharePoint Classifier works through the SharePoint API for both on-premises farms and SharePoint Online. The connector authenticates with an Azure AD app registration, using application permissions such as Sites.Read.All for full tenant coverage. OneDrive scanning uses the same Graph API through an app registration with delegated permissions. Exchange Classifier scans mailboxes through EWS or the Microsoft Graph mail endpoint, including attachments, calendar items and contacts.
Integration points we most often configure in Dutch environments: Microsoft Purview for bidirectional label synchronization, so that labels Boldon James hands out appear as Purview sensitivity labels and labels set manually in Purview are respected by Boldon James. Digital Guardian policy integration, where Boldon James labels act as policy triggers for endpoint DLP rules. Clearswift MIMEsweeper on the email gateway for outbound inspection based on labels and content. Splunk or Microsoft Sentinel for audit log ingest. Active Directory for identity context and group membership inside the rule engine. GoAnywhere MFT for label-driven routing of external file exchange.
The platform reads and writes standard metadata fields: custom properties in OOXML for Word, Excel and PowerPoint, XMP metadata for PDF, X-Classification headers in SMTP, and a dedicated attribute store for file types without a native metadata slot. Visual markings (watermark, header, footer) are optional yet often required under ISO 27001 Annex A 8.3, which explicitly prescribes readable classification for the recipient. For archive formats (ZIP, 7z, TAR) the scanner opens the container, inspects every entry and writes a container metadata file back without altering the original content.
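As a concrete illustration of the X-header transport mentioned above, here is a minimal sketch using Python's standard email library. The header name X-Classification comes from the text; the label value and addresses are made-up examples, not the product's actual schema:

```python
from email.message import EmailMessage

# Build a message the way any mail client or gateway-side tool would.
msg = EmailMessage()
msg["From"] = "sender@example.nl"
msg["To"] = "recipient@example.nl"
msg["Subject"] = "Q3 report"
msg.set_content("See attached.")

# The classification label travels as an SMTP X-header, which a gateway
# such as Clearswift can match on for outbound inspection.
msg["X-Classification"] = "Internal"

# A downstream gateway reads the same header back:
label = msg["X-Classification"]
print(label)  # Internal
```

The same principle applies to OOXML custom properties and PDF XMP: the label is carried in a standard metadata slot that other tools in the chain already know how to read.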
Concrete trigger events. After a data breach where the Autoriteit Persoonsgegevens asks for a full overview of which personal data are processed in which systems: without a labeled archive that question cannot be answered within a reasonable timeframe. In the run-up to a NIS2 audit where article 21 of directive 2022/2555 requires you to demonstrate risk management at data level, including classification of historical data.
During an ISO 27001 certification or recertification audit, where Annex A 8.2 (information classification) and A 8.3 (labeling of information) apply to the entire data estate, not only to new documents. During migration to Microsoft 365 where legacy file shares move to SharePoint Online and you do not want to migrate what you do not know. During a merger or acquisition where you must integrate the data estate of a new entity and need certainty that sensitive data remains properly protected. During a BIO assessment through ENSIA for decentralized government bodies, where an inventory of classified documents is part of the accountability report.
Organic triggers outside direct regulation: your Titus rollout handles new production but the old file server stays dark matter. Legal asks which contracts contain confidentiality clauses and you can only answer through a manual grep. Your DLP triggers on patterns but not on labels, so you cannot differentiate between an internal test document with a test BSN and a real HR record. You find that your GDPR article 30 record does not match where the data actually resides.
Guidance from national and European authorities reinforces the urgency. The Nationaal Cyber Security Centrum calls out inventory of data assets as baseline hygiene in its NIS2 guidance. ENISA describes discovery and classification of existing data as part of the cybersecurity framework underlying NIS2. Both advisory sources put the ball in the controller's court: you cannot protect what you do not know.
Honest opener: both alternatives overlap partly with what Boldon James does, and in some environments a combination or even an alternative is the right call. We lay out the differences without steering toward a preferred outcome.
Boldon James versus Titus. The two products answer different questions. Titus labels at creation: the user picks a sensitivity label at the moment an email is sent or a document is saved. Boldon James labels after the fact: a scanner traverses the existing data estate and applies a label based on rules, without user interaction. The two are not mutually exclusive; they complement each other. If you roll out Titus alone, the historical backlog stays unlabeled and your DLP is blind to what was already there. If you roll out Boldon James alone, new production depends on how well the scanner inspects content, which is less accurate than a user who knows what they are writing. The pragmatic answer is not guesswork on metadata but a combined approach in which Titus captures new data and Boldon James catches up on the backlog. Many Dutch customers run both in parallel on the same schema.
Boldon James versus Microsoft Purview Discovery. Microsoft Purview offers a native discovery function through content explorer and auto-labeling policies in the Purview compliance portal. For organizations that run fully on E5 licensing and within the Microsoft perimeter, it is a realistic option. Three differences matter in Dutch enterprise environments.
First, coverage outside Microsoft 365. Purview Discovery scans SharePoint Online, OneDrive, Exchange Online and Teams well, but legacy file servers, on-premises SharePoint farms, Exchange on-prem and endpoints only through additional licenses and connectors. Boldon James covers all five through native connectors and a single administration interface.
Second, rule complexity. Purview offers trainable classifiers and sensitive info types, but complex composite rules with proximity conditions and lexicon weighting require workarounds through KQL queries or custom regex sets per label. Boldon James has a graphical rule editor that treats composite conditions as a first-class concept, which keeps maintenance on a schema with dozens of categories tractable.
Third, data sovereignty. Purview classifies in the Microsoft cloud; content passes the Purview backend for inspection. For central government, healthcare and some financial institutions that is a contractual or legal obstacle. Boldon James runs fully on-premises or in a tenant the customer controls, without documents being inspected outside the organization perimeter.
Hybrid is more often the outcome than either-or. Many customers we work with use Boldon James for legacy file shares, on-premises SharePoint and Exchange, and let Purview Discovery handle the pure Microsoft 365 estate, with the synchronization connector as the bridge between both. That combines the strong points of both systems and avoids an either-or decision that would have to be revisited later.
Alternatives outside these two main options: Varonis DatAdvantage for permission-analysis-driven discovery, Spirion Sensitive Data Manager for pure pattern matching on personal data, and BigID for data intelligence in regulated industries. Each has its own center of gravity. In the Netherlands Boldon James has the broadest combination of native Dutch patterns (BSN, healthcare codes, AGB, parliamentary document numbers), BIO mapping, and a serviceable supplier structure with Dutch-speaking engineers.
The architecture in prose. A Dutch customer in healthcare with 4,500 FTE and 80 TB of unstructured data runs Boldon James on a Windows Server 2022 administration server with SQL Server 2019, mirrored to a second data center as a warm standby. Three scan nodes run in parallel: one for the file servers (around 60 TB), one for SharePoint Online and OneDrive via Graph API, and one for Exchange Online via EWS. The fourth layer, the endpoint agent, is rolled out through Microsoft Endpoint Manager across the 4,500 workstations in waves of 300.
Agent deployment on file servers uses a service account with read rights on the UNC paths to be scanned and write rights on a dedicated metadata staging share. The scanner does not touch the content of the files. Only the classification metadata is written to an alternate data stream or a custom property, depending on the file format. For write-unfriendly formats (legacy text files without a metadata slot) the administration server keeps an external index, keyed by a hash of the file.
The SharePoint and OneDrive connector uses an Azure AD app registration with Sites.Read.All, Files.Read.All and User.Read.All. The connector works delta-wise through the Graph Delta API: after the first full sweep, only new or changed items are inspected, typically tens to hundreds of items per minute at an organization of this size. The Exchange mailbox scanner runs a nightly delta on active mailboxes and a full weekend sweep on delegated shared mailboxes and archive mailboxes.
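The delta-wise traversal follows Microsoft Graph's standard delta pattern: follow @odata.nextLink pages until a response carries an @odata.deltaLink, and store that link for the next incremental run. A sketch with the HTTP layer injected as a function so the pagination logic stands on its own; drain_delta and the URLs are illustrative, not the connector's real internals:

```python
def drain_delta(fetch, delta_url):
    """Walk a Microsoft Graph delta feed.

    fetch(url) -> parsed JSON dict (the HTTP/auth layer is injected).
    Returns (changed_items, new_delta_link); the caller persists
    new_delta_link as the checkpoint for the next incremental sweep.
    """
    items = []
    url = delta_url
    while True:
        page = fetch(url)
        items.extend(page.get("value", []))
        if "@odata.nextLink" in page:
            url = page["@odata.nextLink"]   # more pages in this sweep
        else:
            return items, page["@odata.deltaLink"]  # sweep complete
```

After the first full sweep the stored deltaLink makes each nightly run return only new or changed items, which is what keeps the per-minute item counts in the passage above small.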
The data type library we activate for this customer includes BSN (with the 11-check), IBAN (with mod-97 validation), BIC, PESEL for Polish employee data, healthcare codes at AGB and UZI level, parliamentary document numbers for correspondence with the Ministry of Health, and vehicle registrations. On top of the 300+ predefined data types, the compliance administrator writes a dozen custom patterns for organization-specific identifiers such as client numbers and case references.
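The two validations named above are public algorithms and can be shown precisely. A sketch of the BSN 11-proef and the ISO 13616 mod-97 IBAN check in Python; the function names are ours, not the product's:

```python
def valid_bsn(bsn: str) -> bool:
    """Dutch BSN '11-proef': the weighted digit sum must be divisible by 11.
    Weights are 9 down to 2 for the first eight digits, -1 for the last."""
    if not (bsn.isdigit() and len(bsn) == 9):
        return False
    weights = [9, 8, 7, 6, 5, 4, 3, 2, -1]
    return sum(w * int(d) for w, d in zip(weights, bsn)) % 11 == 0

def valid_iban(iban: str) -> bool:
    """ISO 13616 mod-97 check: move the first four characters to the end,
    map letters A-Z to 10-35, and the resulting integer mod 97 must be 1."""
    s = iban.replace(" ", "").upper()
    if len(s) < 15 or not s.isalnum():
        return False
    rearranged = s[4:] + s[:4]
    digits = "".join(str(int(c, 36)) for c in rearranged)  # 'A' -> 10, ... 'Z' -> 35
    return int(digits) % 97 == 1
```

Checksum validation is what separates a real BSN or IBAN from any coincidental nine- or eighteen-character string, which is a large part of how a discovery engine keeps its false-positive rate down on structured patterns.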
The rule engine combines four building blocks. Regex recognizes structured patterns with a fixed shape. Lexicons hold keyword lists such as medication names, diagnostic codes or legal terms. Proximity rules require two terms to appear within a set character distance, for example a name combined with a medical attribute inside 200 characters. The optional ML model classifies document types (contract, report, note, email) based on structural features and word frequencies. A composite rule combines those building blocks with logical operators, so a match only validates when several signals coincide.
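A minimal sketch of such a composite rule, pairing one regex signal (a nine-digit BSN-shaped number) with a small medical lexicon and a 200-character proximity window. The pattern, word list and function are illustrative examples, not the shipped rule library:

```python
import re

# Illustrative building blocks (assumptions, not the product's data types):
BSN_RE = re.compile(r"\b\d{9}\b")                     # regex signal
MEDICAL_LEXICON = {"diagnose", "medicatie", "patient"}  # lexicon signal

def composite_match(text: str, max_distance: int = 200) -> bool:
    """Fire only when a BSN-shaped number AND a medical lexicon term
    occur within max_distance characters of each other (proximity)."""
    lowered = text.lower()
    for m in BSN_RE.finditer(text):
        window = lowered[max(0, m.start() - max_distance): m.end() + max_distance]
        if any(term in window for term in MEDICAL_LEXICON):
            return True
    return False
```

The AND-plus-proximity combination is the point: a bare nine-digit number in an order list does not fire, while the same number next to a medical term does.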
Delta scanning is the critical mode for a production environment. The first full scan of 80 TB at this customer takes roughly five to eight days across three parallel scan nodes, depending on the split between small and large files. After that the system runs in delta mode: only files with a changed modification timestamp or a changed hash come back around, typically 50 to 500 GB per day, processed within a nightly window. The delta calendar lives on the administration server; recovery after downtime resumes at the last confirmed checkpoint without loss.
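The delta mechanism can be sketched as a checkpoint of modification timestamps. A real scanner persists the checkpoint on the administration server and adds hash comparison, as the text notes; all names here are illustrative:

```python
from pathlib import Path

def delta_candidates(root: Path, checkpoint: dict[str, float]) -> list[Path]:
    """Return files whose modification timestamp moved past the stored
    checkpoint, updating the checkpoint in place. A production scanner
    would also compare content hashes to catch timestamp-preserving copies,
    and would persist the checkpoint so recovery resumes after downtime."""
    changed = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        mtime = path.stat().st_mtime
        if checkpoint.get(str(path)) != mtime:
            changed.append(path)
            checkpoint[str(path)] = mtime
    return changed
```

On the second run against an unchanged tree this returns nothing, which is what shrinks 80 TB of estate down to the 50 to 500 GB nightly delta described above.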
Purview label synchronization runs through a connector that picks up label changes on both sides every five minutes and propagates them. The Digital Guardian policy integration exports labels through an XML feed to the DG Management Console, where endpoint DLP rules react to changed labels within a quarter of an hour. Clearswift MIMEsweeper on the email gateway reads the same labels through X-Classification headers on outbound mail.
Failure modes we see in the field. ACL drift: a scan running under a service account with stale group memberships misses a subset of directories, so the scan is incomplete without that being visible. Remedy: monthly reconciliation between the service account and the AD group model, with an audit rule that reports inaccessible directories. Scan-storm IO impact: a parallel scan on four nodes against a saturated SAN triggers production incidents, especially for healthcare and financial workloads. Remedy: scan throttling at IOPS level, windowed scanning outside business hours, and an exclusion list for latency-sensitive shares.
False positives in legacy formats: old Word 97 documents, scanned PDFs without OCR and proprietary CAD formats produce text fragments that trigger regex rules on coincidental patterns. Remedy: an OCR preprocessor for PDF, format-specific exclusions for CAD, and a review loop where a compliance administrator confirms samples per pattern. Unicode and encoding in archives: a ZIP entry with ISO-8859-15 encoded file names breaks a scanner expecting UTF-8, so the archive is marked unscannable without anyone noticing. Remedy: a fallback encoding detector and a recurring report on unscannable items that lands in the compliance console.
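The fallback encoding detector named in the remedy can be sketched as an ordered list of decode attempts with a lossy last resort, so one odd archive entry does not mark the whole container unscannable. The encoding order is an assumption for illustration:

```python
def decode_zip_name(raw: bytes,
                    encodings=("utf-8", "iso-8859-15", "cp437")) -> str:
    """Try each encoding in order; fall back to a lossy UTF-8 decode so
    the scan can continue and the item lands on the unscannable report
    instead of silently breaking the archive pass."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace")
```

Note that cp437, the historical ZIP default, accepts any byte sequence, so it doubles as a terminal fallback before the lossy decode is ever reached.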
The timeline of a typical rollout. Week 1 to 4: schema definition, POC on a scoped dataset (typically 2 to 5 TB, one directory structure, one mailbox group), first rule set validation. Month 2 to 3: rollout of the scan infrastructure, first full sweep, Purview connector configuration, audit log connector into the SIEM. Month 4 to 6: delta mode in production, Digital Guardian policy integration, endpoint agent rollout in waves, false-positive tuning on real production data. Next in the chain after Boldon James is almost always Clearswift: every labeled file eventually travels an outbound channel, and the gateway must run deep content inspection against those labels, inbound and outbound.
For the underlying question of why classification matters at all, see the full regulatory deep dive. For a broader portfolio view across Titus, Clearswift and Vera, see the solutions page. For a technical intake or POC request, the contact page is the starting point.
Titus is creator-driven: the user labels content at creation time in Outlook, Office and SAP GUI. Boldon James is discovery-driven: a scanner crawls existing file servers, SharePoint, OneDrive and Exchange and labels files automatically based on 300+ data types. Many Dutch organizations deploy both. Titus covers new production. Boldon James clears the backlog and labels data the organization will never touch by hand.
Encrypted data is covered only indirectly. The product reads metadata and naming of encrypted containers, but cannot inspect content without a decryption key. In practice we configure an integration with the customer's key management system or Microsoft Purview rights management, so Boldon James can temporarily open the content under a service account with read rights. For BitLocker volumes on endpoints the endpoint agent scans after decryption, in user context.
An initial full scan indicatively takes 36 to 72 hours for 10 TB of mixed file-server data on a healthy storage backend with a Gigabit network link. Drivers: file-size distribution, number of small files, number of rule matches per document and CPU load on the scan node. After the first full scan, delta scanning takes over: only changed and new files, typically minutes per day. Schedule the initial scan on weekends or off-hours to avoid IO contention with production.
Every rule match is logged with context and a source snippet. A compliance administrator reviews samples in the built-in audit console and flags false positives per pattern. The system retunes the proximity and lexicon weighting on that feedback. For stubborn cases you write an exception rule based on file path, owner or application metadata. Enforcement only escalates once the false-positive ratio falls below a threshold agreed with the organisation. The exact threshold depends on the data types and the risk appetite of your security governance.
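The enforcement gate at the end of that loop reduces to a simple ratio check. The 5% default below is purely illustrative, since the text notes the real threshold is agreed per organization:

```python
def enforcement_ready(reviewed: int, false_positives: int,
                      threshold: float = 0.05) -> bool:
    """Gate the move from suggest mode to automatic enforcement on a
    reviewed sample: escalate only once the observed false-positive
    ratio drops below the agreed threshold. 0.05 is an example value."""
    if reviewed == 0:
        return False  # no review data yet: never enforce blind
    return false_positives / reviewed < threshold
```

Keeping the gate explicit makes the tuning loop auditable: the sample size, the FP count and the agreed threshold all land in the compliance record alongside the decision to enforce.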
Purview synchronization is supported natively: Boldon James Classifier has a connector that synchronizes labels with Microsoft Purview sensitivity labels. Both systems share the same metadata fields in OOXML and the same label taxonomy, provided you mirror the schema on both sides. Many customers use Boldon James for discovery and initial labeling and Purview for rights management and encryption. A POC validates that the synchronization runs without duplicate labels in your tenant.
BIO mapping is covered partially. The rule engine recognizes patterns such as BSN, healthcare codes, vehicle registrations and parliamentary document numbers, and maps them to BIO classifications up to departementaal vertrouwelijk. Boldon James automatically labels data that unambiguously falls under a rule. For edge cases and staatsgeheim levels a human review step remains mandatory, because the BIO requires an explicitly agreed classification and does not allow an automated assumption.
Endpoint scanning is supported. The Classifier endpoint agent runs on Windows 10 and 11 and scans local disks, USB volumes and network shares the user mounts. The agent works offline and synchronizes findings as soon as the workstation is online again. A more limited scanner is available for macOS. For Linux servers a command-line agent covers regulated workloads. Endpoint scanning runs in user context and respects the user's ACLs to protect privacy boundaries.
Per-user subscription per year for the core Classifier suite, plus capacity modules for SharePoint, Exchange and endpoint agents scaled against user counts. Support tier and scan volume drive the final price. Concrete figures come from your implementation partner after a sizing call. For central government and healthcare a multi-year agreement is common, with a seat-count ladder and a fixed support annex for BIO and NEN 7510 audits.
Regulatory sources: Regulation (EU) 2016/679 (GDPR), Directive (EU) 2022/2555 (NIS2), ISO/IEC 27001:2022.