Enterprise security teams responsible for preventing insider threats have mixed feelings about acquiring and analyzing internal data. Sure, that data contains a wealth of knowledge about the potential for risk from trusted employees, contractors, vendors and customers. But it also comes with a mountain of legal and organizational headaches, can be contradictory and often generates more questions than answers. No wonder most security programs prefer to rely on monitoring network logs.

But there’s a more methodical way for organizations to approach data acquisition and analysis: before diving into the arduous task of trying to work with the data they have, it’s better to first ask what problems they want to solve, and let the answers guide them toward obtaining the data they need.

One effective mechanism for carrying out this sequence is to build a model of the problem domain and then go find relevant data to apply to it. At Haystax, we collaborate with diverse subject-matter experts to build probabilistic models known as Bayesian inference networks, which excel at assessing probabilities in complex problem domains where the data is incomplete or even contradictory.

Our user behavior analytics (UBA) model, for example, was developed to detect individuals who show an inclination to commit or abet a variety of malicious insider acts, including: leaving a firm or agency with stolen files or selling the information illegally; committing fraud; sabotaging an organization’s reputation, IT systems or facilities; and committing acts of workplace violence or self-harm. It also can identify indicators of willful negligence (rule flouting, careless attitudes to security, etc.) and unwitting or accidental behavior (human error, fatigue, substance abuse, etc.) that could jeopardize an organization’s security.

The UBA model starts with a top-level hypothesis that an individual is trustworthy, followed by high-level concepts relating to personal trustworthiness such as reliability, stability, credibility and conscientiousness. It then breaks these concepts down into smaller and smaller sub-concepts until they become causal indicators that are measurable in data. Finally, it captures not only the relationships between each concept, but also the relative strength of each relationship.
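
To make that structure concrete, here is a minimal sketch of a three-layer network: a top-level trustworthiness hypothesis, two intermediate concepts and two measurable indicators. It is not the Constellation model. Every probability and indicator name below is a hypothetical value chosen for illustration, and inference is done by brute-force enumeration, which only works for very small networks.

```python
from itertools import product

# Hypothetical prior belief that an individual is trustworthy.
P_TRUSTWORTHY = 0.95

# P(concept is true | trustworthy?). How far the two numbers differ plays
# the role of the "relative strength" of each relationship.
P_CONCEPT = {
    "reliable": {True: 0.90, False: 0.30},
    "stable": {True: 0.85, False: 0.40},
}

# Each measurable indicator hangs off one concept:
# name -> (parent concept, P(indicator observed | parent concept is true?))
INDICATORS = {
    "poor_supervisor_review": ("reliable", {True: 0.05, False: 0.50}),
    "bankruptcy_record": ("stable", {True: 0.02, False: 0.25}),
}


def posterior_trustworthy(evidence):
    """Return P(trustworthy | evidence) by enumerating hidden concept states.

    `evidence` maps indicator names to True (observed) or False (confirmed
    absent). Indicators missing from the dict are treated as unknown.
    """
    concepts = list(P_CONCEPT)
    joint = {True: 0.0, False: 0.0}
    for trustworthy in (True, False):
        p_t = P_TRUSTWORTHY if trustworthy else 1.0 - P_TRUSTWORTHY
        for states in product((True, False), repeat=len(concepts)):
            assignment = dict(zip(concepts, states))
            p = p_t
            # Probability of this combination of concept states.
            for name, value in assignment.items():
                p_true = P_CONCEPT[name][trustworthy]
                p *= p_true if value else 1.0 - p_true
            # Probability of the evidence given those concept states.
            for indicator, observed in evidence.items():
                parent, cpt = INDICATORS[indicator]
                p_obs = cpt[assignment[parent]]
                p *= p_obs if observed else 1.0 - p_obs
            joint[trustworthy] += p
    return joint[True] / (joint[True] + joint[False])
```

With no evidence the function simply returns the prior. Adding an adverse indicator lowers the belief, and contradictory evidence (one indicator observed, another confirmed absent) pulls it partway back, which is the behavior that makes this kind of model tolerant of incomplete or conflicting data.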

Sitting at the core of Haystax’s Constellation Analytics Platform, this UBA model provides the structure our customers need to: 1) pinpoint which of their data sets can be most usefully applied to the model; 2) identify any critical data gaps they may have; and 3) ignore data that’s unlikely to be useful. Most importantly, it enables security teams to assess workplace risk in a holistic and predictive way as the individual’s adverse behaviors are starting to manifest themselves — rather than after a major adverse event has taken place.

Data relevant to insider threat mitigation can be categorized as financial, professional, legal and personal. Within these categories are two main data types, static and dynamic:

  • Static data is typically used for identifying major life events, and can establish a baseline for what ‘normal’ behavior looks like for that individual. This type of data isn’t updated frequently, so there may be longer periods of time with no new information.
  • Dynamic data is updated on the order of hours or days and is the source of detection for smaller, less obvious life and behavioral changes in an individual. For example, there may be a record of marriage (large life event; static data) and a recent vacation for two (smaller life event; dynamic data) indicating a healthy and stable home life.

Ideally, organizations have some of each data type to establish baselines and then maintain day-to-day situational awareness.

Another important part of the data identification and acquisition process is accessibility. Data falls into three levels of accessibility: public, organizational and protected/private.

  • Public data is readily available from open external sources.
  • Organizational data is managed internally by a company or government agency and can be obtained if a compelling case is made.
  • Protected/private data is mostly controlled by individuals or third-party entities and is difficult to access without their consent.
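
One way to keep track of these three dimensions (category, static versus dynamic, and accessibility) is a small catalog that can be queried for gaps. The sketch below is an assumption about how such a catalog might be organized, not a description of Constellation. The source names come from examples in this article, but the labels assigned to each one are illustrative guesses.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    FINANCIAL = "financial"
    PROFESSIONAL = "professional"
    LEGAL = "legal"
    PERSONAL = "personal"


class Update(Enum):
    STATIC = "static"
    DYNAMIC = "dynamic"


class Access(Enum):
    PUBLIC = "public"
    ORGANIZATIONAL = "organizational"
    PROTECTED = "protected/private"


@dataclass(frozen=True)
class DataSource:
    name: str
    category: Category
    update: Update
    access: Access


# Illustrative entries only; the classification of each source is assumed.
CATALOG = [
    DataSource("bankruptcy records", Category.FINANCIAL, Update.STATIC, Access.PUBLIC),
    DataSource("supervisor reviews", Category.PROFESSIONAL, Update.STATIC, Access.ORGANIZATIONAL),
    DataSource("badge records", Category.PROFESSIONAL, Update.DYNAMIC, Access.ORGANIZATIONAL),
    DataSource("social media posts", Category.PERSONAL, Update.DYNAMIC, Access.PUBLIC),
]


def coverage_gaps(catalog):
    """Return every (category, update) pair that has no data source at all."""
    covered = {(source.category, source.update) for source in catalog}
    missing = [
        (category, update)
        for category in Category
        for update in Update
        if (category, update) not in covered
    ]
    return sorted(missing, key=lambda pair: (pair[0].value, pair[1].value))


def obtainable_without_consent(catalog):
    """Sources that are public or organizational, i.e. not protected/private."""
    return [source for source in catalog if source.access is not Access.PROTECTED]
```

Calling `coverage_gaps(CATALOG)` on this toy catalog reports, for instance, that no legal data of either type is available yet, which is the sort of critical gap an organization would want to surface before acquiring more data.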

The table below contains a detailed list of data sources broken down by category and accessibility level, and by whether they are static or dynamic.


UBA industry analysts at Gartner have observed that incorporating unstructured information like performance appraisals, travel records and social media posts “can be extremely useful in helping discover and score risky user behavior,” because it provides far better context than structured data from networks and the like. (And with more and more network data being encrypted, pulling threat signals from network logs is in any case becoming increasingly challenging.)

There are dozens of behavioral indicators for which supporting data is available or obtainable, and which can be readily ingested, augmented, applied and analyzed within the Constellation UBA solution. Take the case of a senior-level insider who intends to steal a large volume of his company’s intellectual property (IP). The early risk indicators are that he comes into the office at an odd time (badge records), accesses a file directory he does not normally use (network data) and prints out a large document (printer logs). This activity alone would not trigger an alert in Constellation, as it could be that he was assigned a new project with a tight deadline by a different department.

But then data is obtained which reveals that he is experiencing financial or personal stress (public bankruptcy/divorce records), leading to degraded work performance (poor supervisor reviews) and several tense confrontations with colleagues (staff complaints), all of which will elevate him in Constellation to a moderate risk. Finally, he is caught posting a derisive rant about the company on social media (public data) and either contacting a competitor (email/phone logs) or booking a one-way ticket to another country (travel records). This activity elevates him to high-risk status in Constellation and he is put on a watch list, so that when rumors spread of pending departmental layoffs (HR plans) and the company detects him downloading large files to a thumb drive (DLP alert), the company’s security team is ready to act.
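
The escalation in this scenario can be sketched as a sequence of Bayesian updates in odds form, where each new piece of evidence multiplies the current odds of malicious intent by a likelihood ratio. This is a simplified stand-in, not how Constellation computes risk. The prior, the likelihood ratios and the thresholds are all hypothetical, and multiplying ratios this way assumes the pieces of evidence are conditionally independent.

```python
def update(probability, likelihood_ratio):
    """One Bayesian update in odds form: posterior odds = prior odds * LR."""
    odds = probability / (1.0 - probability) * likelihood_ratio
    return odds / (1.0 + odds)


def risk_level(probability, moderate=0.2, high=0.7):
    """Map a probability to a coarse label using hypothetical thresholds."""
    if probability >= high:
        return "high"
    if probability >= moderate:
        return "moderate"
    return "low"


# (evidence, likelihood ratio) in the order the scenario describes them.
# A ratio of 2.0 means the evidence is assumed to be twice as likely for a
# malicious insider as for a benign one. All values are made up.
EVENTS = [
    ("odd-hours badge entry", 1.5),
    ("unusual directory access", 2.0),
    ("large print job", 1.5),
    ("bankruptcy/divorce record", 2.0),
    ("poor supervisor reviews", 2.0),
    ("staff complaints", 2.0),
    ("derisive social media post", 3.0),
    ("contact with a competitor", 4.0),
]


def run_timeline(prior, events):
    """Apply each event in turn and record the running probability and label."""
    probability = prior
    history = []
    for name, likelihood_ratio in events:
        probability = update(probability, likelihood_ratio)
        history.append((name, probability, risk_level(probability)))
    return history


if __name__ == "__main__":
    for name, probability, level in run_timeline(0.01, EVENTS):
        print(f"{name:28s} {probability:6.3f}  {level}")
```

With these made-up numbers, the first three events leave the individual at low risk, the personal and professional stress indicators push him to moderate, and the final two events push him to high, mirroring the narrative above. No single weak signal decides the outcome; the accumulation does.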

October is National Cybersecurity Awareness Month in the US. In the months leading up to it, technology and security experts have increasingly come to the consensus view that while insider threats constitute one of the fastest-growing risks to the IT and physical security environments, organizations don’t have the analytical tools, or the data, to pinpoint their biggest threats in a timely way.

The reality is that most organizations today still try to detect their insider threats by analyzing log aggregation files, and not much else. Because they invariably end up with an excessive number of false positives and redundant alerts, their analysts often feel overwhelmed trying to triage their cases and waste precious time chasing down contextual information to verify what’s real and what is not.

By contrast, Haystax’s approach with its Constellation UBA solution is to apply a much larger volume and variety of data to a probabilistic behavioral risk model, which then continuously updates its ‘belief’ that each employee is trustworthy (or not). With Constellation — and a broader array of data sources — a security team can perform true cyber-risk management, avoiding alert overload and focusing instead on quickly and proactively identifying those individuals who are poised to do the most harm to the enterprise.

Hannah Hein is Insider Threat Project Manager at Haystax Technology.