MetaFlows: SC Magazine Innovators Hall of Fame

Our friends at SC Magazine have inducted us into the SC Magazine Innovators Hall of Fame. It is nice to be recognized for our innovations. Importantly, this recognition is purely based on their journalistic curiosity; we give them props for performing their reviews based on sound technical knowledge. We refuse to pay money for recognition. You might think we are old-fashioned, but this is how we roll at MetaFlows.

Their article also points out the importance of monitoring beyond the network perimeter using multi-session correlation. If you are not sure what multi-session correlation can do for you, it is best to put it to the test. You will be amazed at what you can find out about your network.

Read the article at SC Magazine’s Website

What’s Wrong with Sandboxing?

How Sand-Boxing Works

The latest and hottest trend in cyber-security is sand-boxing. Sand-boxing is virus detection on steroids. Instead of relying on prior knowledge about particular viruses, this technique emulates a user’s workstation with a sandbox and tracks anything that attempts to go out of the box or attempts to infect other machines. The process is straightforward:

  1. Get all potentially infectious content coming into your organization, and
  2. Emulate each piece of content as if it was executing on your hosts.

Limitations of Sand-Boxing

Sand-boxing has low false positive rates, but causes a lot of false negatives. In other words, when it tells you that something is bad, it is almost certainly bad. But it has the potential to miss a lot of bad things.

Architectural Limitations

This limitation has to do with step 1 above (get all dangerous content coming into your organization). Your defense perimeter is dissolving because of new network trends and applications:

  1. Mobile devices continuously come into and go out from your network.
  2. Peer-to-peer protocols (which go right through sand-boxing and firewall appliances) are becoming mainstream (Skype, BitTorrent, B2B applications).
  3. Services are being pushed to the cloud, out of the grasp of your sandbox.
  4. Virtual machines move around at the speed of light from one host to another.
  5. IPv6 and other emerging trends are facilitating end-to-end encrypted tunneling right through your perimeter.

So, if you do not have a perimeter, how do you know what is coming in? Well, you don’t! That is why sand-boxing (or pure virus detection) is limited in scope and cannot survive the evolution of malware.

Another architectural limitation has to do with cost. If you run a large network, executing and/or opening every piece of content before it is delivered requires a lot of CPU and will slow down your network. Sand-boxing can only scale to a certain size; beyond that it becomes unrealistic and expensive.

Algorithmic Limitations

This limitation has to do with step 2 above (emulate each piece of content as if it was executing on your hosts). Evasion is an information security term that refers to the ability of the bad guys to:

  1. Know how you are detecting them and
  2. Add subterfuges to defeat your specific security measures.

A sandbox can be detected. Once malware realizes that it is in a sandbox, it will switch to its best behavior so that the sandbox is happy. Only when the malware gets out of the sandbox and onto the actual target device will it do its damage.

A second algorithmic limitation is that not every system is the same. Sandboxing a particular version of Microsoft Windows (which is what commercial sandbox solutions do) leaves all your other devices (Linux, Apple, Android, etc.) completely open to attack.

How is MetaFlows Better?

MetaFlows is not an antivirus. We detect attempts to introduce a virus into your network AND/OR detect the presence of a virus. Think of it as a network-level sandbox that not only inspects individual pieces of content, but also keeps track of the behavior of all your devices over time. There is one thing a malicious host cannot evade: being malicious!

If it looks like a duck, swims like a duck, and quacks like a duck… it is a duck.

How does it work?

MetaFlows looks for classes of odd behavior from hosts on your network:

  1. Scanning behavior
  2. Being attacked on vulnerable ports
  3. Downloading dangerous content
  4. Communication with questionable sites or sites that are already known to be bad
  5. Scanning outward or doing a lot of DNS lookups

If we detect behavior from multiple event classes over a time period (ranging from minutes to hours), MetaFlows triggers an alert.
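As an illustration only (the event-class names, window size, and threshold below are hypothetical, not the actual MSS logic), multi-session correlation can be sketched as keeping a sliding window of event classes per host:

```python
from collections import defaultdict

WINDOW = 3600      # correlation window in seconds (minutes to hours)
THRESHOLD = 2      # distinct event classes needed to raise an alert

events = defaultdict(list)  # host -> [(timestamp, event_class)]

def record(host, event_class, ts):
    """Record an event; return True if the host crosses the alert threshold."""
    events[host].append((ts, event_class))
    # Drop events that have fallen out of the correlation window.
    events[host] = [(t, c) for t, c in events[host] if ts - t <= WINDOW]
    distinct = {c for _, c in events[host]}
    return len(distinct) >= THRESHOLD
```

A single scan or a single suspicious download stays below the threshold; it is the combination of distinct behavioral classes within the window that raises the alert.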

Here is a simple example:

  1. External host B performs a brute force attack to guess your password on port 22 on server A.
  2. One hour later there is a large transfer of data from server B to another server C (on your network).

Bang! That’s a hit for us. But a sandbox has no clue! By itself, a sandbox would not detect this behavior. The malware could “play nice” once it realizes that it is in a sandbox. The sandbox would then allow the malware to leave and get inside your network, where it could do substantial damage. But MetaFlows can keep an eye on software even after it leaves the sandbox.

The main advantage of a network-level sandbox is that it does NOT solely rely on inspecting content (like an antivirus) but instead detects malware in the act of being bad. So, if someone walks in through your front gate with an infected laptop, as soon as that laptop misbehaves, it will be flagged.

 

The best part is that MetaFlows works regardless of what devices are on your network – it solves the algorithmic limitations of sandboxes. Our behavioral event classes do not depend on the type of system: if an internal host is performing outbound scanning, we do not care if it is a Microsoft device or an Apple device. All we need to know is that it has engaged in malicious behavior.

 

Finally, our approach is much more scalable than a content sandbox. MetaFlows mitigates the architectural limitations of sandboxes by scaling to 10 Gbps links with standard off-the-shelf quad-CPU systems. The cost and power consumption are orders of magnitude lower.

Predictive Correlation — The Future of Cyber Security?

What is Predictive Correlation?

Research funded by the National Science Foundation has led to the development of a proprietary inter-domain correlation algorithm that is mathematically similar to Google’s PageRank algorithm. Event scores are autonomously obtained from a global network of honeypot sensors monitored by the MetaFlows Security System (MSS). The honeypots are virtual machines that masquerade as victims: they open up dangerous ports/applications and/or browse dangerous websites. As the honeypots are repeatedly infected, the MSS records both successful and unsuccessful hacker URLs, files, bad ports, and bad services.

When a honeypot has a security event that triggers a false positive, the alerts for those events are ranked negatively, thus providing insight into events that should be routinely ignored or turned off. Security events that trigger true positives are ranked positively, thus improving their visibility. This information is then propagated in real time to each of our subscribers’ sensors in the system to augment traditional correlation techniques. This additional inter-domain correlation is important because it adds operational awareness based on real-time intelligence.

How does it work?

As shown in the figure below, honeypots work behind the scenes, continuously mining global relevance data and flow intelligence (IP reputation) for threats that penetrate differing degrees of cyber-defenses on different types of systems. After this step, annotated data from all network sensors (whether the sensors are honeypots or not) are compared, and events are correlated with an algorithm similar to Google’s PageRank: X = b·s + a·W·X.

Diagram of MetaFlows event correlation system
Figure 1: Predictive Global Correlation
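The fixed point X = b·s + a·W·X can be solved by simple iteration, just as in PageRank. The matrix and scores below are toy values, not real MSS data; W stands in for a normalized sensor-similarity matrix and s for the intrinsic honeypot-derived scores:

```python
def rank(W, s, a=0.85, iters=200, tol=1e-9):
    """Iterate x <- b*s + a*W*x until convergence (b = 1 - a)."""
    b = 1.0 - a
    n = len(s)
    x = list(s)
    for _ in range(iters):
        x_new = [b * s[i] + a * sum(W[i][j] * x[j] for j in range(n))
                 for i in range(n)]
        if max(abs(u - v) for u, v in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x

# Toy example: three sensors, one strongly positive and one negative score.
W = [[0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5],
     [0.5, 0.5, 0.0]]
s = [1.0, 0.0, -0.5]
x = rank(W, s)
```

Because a < 1 and W is normalized, the iteration converges; the effect is that scores "flow" between similar sensors instead of being distributed uniformly to everyone.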

This process is designed to provide subscribers with intelligence data that takes into account the similarities and differences between the sources of the data. Due to space limitations we cannot explain the math and why it makes sense; however, our system builds on the work described in “Highly Predictive Blacklisting” by Jian Zhang, Phillip Porras, and Johannes Ullrich (SRI International and the SANS Institute), USENIX Security, August 2008 (we highly recommend that you read this article).

So What?

As a result of the algorithm, once a piece of intelligence reaches our system it is not equally distributed to all customers. Instead, it is mathematically weighted and routed to where it is most relevant, just as the first few web pages of a Google search yield the most relevant information for a particular search.

In addition to real-time intelligence on true positive security events (positive ranking), our system also provides information on security alerts that are irrelevant by demoting them and reducing false positive clutter. In other words, this system can propagate known false positives and known true positives among sensors using a mathematical model that maximizes prediction.

Graph of prediction power for MetaFlows ranking algorithm

The graph above quantifies the prediction power of the ranking algorithm. The experiment was carried out on the Snort event relevance data gathered between February 7th, 2010 and February 22nd, 2010. At the start of each day we performed the ranking operation over the previous day’s Snort event data and compared the predicted ranking values with the actual events gathered during that day from the sensors and honeypots. The simple prediction (blue line) is based on predicting that, for each sensor, the same event ranking is carried over from the previous day without running the algorithm (this is what people normally do today).

The Y axis is the hit ratio. The “hit ratio” is defined as the number of times the prediction matches the outcome in terms of the sign (positive or negative), divided by the number of non-zero rankings predicted.

  • We increment the hit counter if the prediction and the outcome have both positive rankings.
  • We increment the hit counter if the prediction and the outcome have both negative rankings.
  • We decrement the hit counter if the prediction and the outcome have opposite signs.
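The three rules above translate directly into code (a sketch; the ranking values are hypothetical):

```python
def hit_ratio(predicted, actual):
    """Signed hit ratio over the non-zero predicted rankings."""
    hits, nonzero = 0, 0
    for p, a in zip(predicted, actual):
        if p == 0:
            continue               # only non-zero predictions count
        nonzero += 1
        if (p > 0 and a > 0) or (p < 0 and a < 0):
            hits += 1              # prediction and outcome agree in sign
        elif p * a < 0:
            hits -= 1              # opposite signs
    return hits / nonzero if nonzero else 0.0
```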

The figure shows that the ranking prediction (orange line) is strictly superior to simple prediction by 141% to 350% (depending on the day). This might not seem too impressive on the surface but if you dig a little deeper this is what it means:

  • Assuming 5 minutes of human analysis time per incident, a system with no ranking would give you a hit rate that finds 1 actionable item for every 20-30 incident investigations (or 0.4 incidents per analyst hour).
  • A system with predictive ranking would let you find 1 actionable item every 6-7 incident investigations (2 incidents per analyst hour).

You can do the math in terms of cost savings: it’s huge! Most of the cost of network security systems is not the appliance or the software, but rather wasted analyst time!
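The arithmetic behind those numbers, assuming 5 minutes of analysis per investigated incident:

```python
MINUTES_PER_INCIDENT = 5
investigations_per_hour = 60 / MINUTES_PER_INCIDENT   # 12 investigations/hour

no_ranking = investigations_per_hour / 30    # 1 actionable item per ~30 -> 0.4/hour
with_ranking = investigations_per_hour / 6   # 1 actionable item per ~6  -> 2.0/hour

print(no_ranking, with_ranking, with_ranking / no_ranking)  # 0.4 2.0 5.0
```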

You Should Not Just Take our Word for it!

The cyber-security arena is packed with technologies that claim they have the best solutions. That is why we encourage users to take the time to evaluate our predictive correlation and run it side-by-side with existing solutions. The outcome is always surprisingly good.

Collaborate with An Audit Log

Audit Log

The MetaFlows Security System allows organizations to grant access to multiple users for online collaboration in sharing sensor data and intelligence. This is a big advantage because it helps distribute workloads across departments and at different levels of the incident response process. One issue customers brought up was the inability to know who took what action, and when. This is why we added the Account Audit Log feature. You can find it under Account -> Account Audit Log. With this new Audit Log, you can track most account actions, including:

  • Changes to contact information and subscription
  • All account access
  • Sensor restarts
  • Creating, changing, or deleting:
    • Sensors
    • Classifications
    • Snort Rules
    • Report Specifications

For every logged action, we track the user, time, and IP address from which these actions originated. We also provide extra details if available.
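For illustration, a single audit entry might carry fields like these (a hypothetical shape, not the actual MSS log format):

```python
from dataclasses import dataclass

@dataclass
class AuditRecord:
    user: str          # account that performed the action
    action: str        # e.g. "sensor_restart", "rule_change"
    source_ip: str     # IP address the action originated from
    timestamp: str     # ISO-8601 time of the action
    details: str = ""  # extra details, if available

rec = AuditRecord("alice", "sensor_restart", "203.0.113.7",
                  "2014-03-01T12:00:00Z", details="sensor-42")
```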

New Packet Logging and File Carving

Packet Logging and File Carving

Being able to go back and look at the payloads or files transmitted on a network is extremely useful for several reasons:

  1. If you do not have the payload, you cannot really prove malicious intent, and legally you are on the hook.
  2. Payloads/files are the ultimate forensic tool to decide whether a particular incident is a false positive or a true positive.
  3. In more advanced systems, payloads can also be used to find false negatives (things that should have caused a security event but did not).

Obviously, logging all data traversing a network is challenging because disk space is limited and disks are relatively slow.

The MetaFlows Security System Logging Approach

Our overall approach to overcoming these logging limitations is:

  • We store payloads/files that are associated with a specific security alert (using the time and the source/destination addresses and ports for identification).
  • When logging proactively (to also see payloads/files that do not involve a security alert), we keep the disk at 90% utilization or below a certain number of gigabytes by deleting the oldest logs.

This scheme gives you certainty of access if there is an incident and a time window to go back in time to look for certain things that might have been overlooked.
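The proactive-logging pruning policy can be sketched as a pure decision function (the file names and sizes below are made up for illustration):

```python
def files_to_delete(files, used_bytes, total_bytes, cap=0.90):
    """Pick the oldest log files to delete until disk use drops to the cap.

    files: list of (path, size_bytes, mtime) tuples.
    """
    doomed = []
    for path, size, _ in sorted(files, key=lambda f: f[2]):  # oldest first
        if used_bytes / total_bytes <= cap:
            break
        doomed.append(path)
        used_bytes -= size
    return doomed

# Disk 95% full: deleting the single oldest 10-unit file gets us to 85%.
logs = [("day1.pcap", 10, 1), ("day2.pcap", 10, 2), ("day3.pcap", 10, 3)]
```

Alert-associated payloads live outside this rotation, which is what gives you certainty of access for incidents.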

Recent Improvements

The logging and file carving system has been vastly improved by the following:

  1. We now index the packets based on IP addresses using a proprietary approach. Instead of looking for particular packets in one big bucket full of files, the files are divided into smaller buckets, each representing a subset of the addresses. This indexing scheme slows down packet logging a bit but makes looking for packets about 200 times faster!
  2. We added the ability to specify user-defined logging policies. Once a policy hits, the logging system prioritizes all packets for the matching policy and stores the Files/payloads in a separate high-priority repository which takes precedence over the normal logging. We will make a separate announcement on the policy specification because it is quite powerful and complex, and requires a dedicated post. For now, the only logging policy is to prioritize any packets involved in high priority events. In the future users will be able to customize more precise ad-hoc policies based on IP addresses, ports, and type of alerts.
The new carving system is backward compatible and automatically converts the existing packet logs stored on the sensor hard drive to the new indexing scheme. This process can take from a few minutes to days depending on your disk size. While this conversion takes place, queries on older logs may not return any data.
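The idea behind the indexing (not the proprietary scheme itself) is simply to make the bucket a deterministic function of the IP address, so a lookup only has to scan 1/N of the stored files:

```python
import zlib

N_BUCKETS = 256  # hypothetical bucket count

def bucket_for(ip, n_buckets=N_BUCKETS):
    """Map an IP address to a stable bucket index."""
    return zlib.crc32(ip.encode()) % n_buckets
```

A query for a given host then opens only the matching bucket; the speedup comes from skipping the other buckets entirely.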

Got IPv6?

IPv6 support

Many organizations are transitioning to IPv6 because it allows the address space to be managed more easily. One thing is for sure: hackers are on top of it; they are already serving Malware from IPv6-capable servers! It is therefore imperative that all security software be IPv6-capable in order to avoid glaring security holes.

When it comes to IPv6, most people put their blinders on. Most security policies simply ignore it because it is not mainstream. But you can be sure that whatever is being ignored can be used against you. IPv6 tunnels are proliferating and are usually not monitored at all. They can easily become a data exfiltration super-highway out of your network.

The MetaFlows Security System can now work on both IPv4 and IPv6, without gaps in your security.

A Very Expensive Lesson in Malware Protection

The attack on credit card numbers through Target has made many realize that network security, malware, and password protection need to be taken more seriously. According to the article below, the two major factors in this data breach were 1. undetected malware that was able to scan credit card numbers in real time, and 2. simple/default passwords that were never updated (especially not in accordance with PCI regulations). Both of these issues have seemingly easy fixes. For the malware, get something that uses not just signature-based but behavioral detection and gives analysts real-time analysis (oh hey, what do you know, the MetaFlows Security System does all that, and more!). For the passwords it is a bit trickier: this requires staff training and individual memories the size of an elephant's in order to remember the hundreds of passwords we use nowadays. But with some staff education on the importance of keeping passwords up to code, and perhaps some mnemonic tricks, the world can be a safer place.

IT Weaknesses Paved the Way for Target Hackers

The Next Layer of Defense is Here!

The researchers from the article below “…expect their findings to be beneficial to enterprises and other organizations in developing the next layer of defense.”

The next layer of defense is already here. The MetaFlows Security System uses behavioral analysis (along with the traditional signature detection) in order to catch even the stealthiest of Malware. It can even catch things that were in the network before it was deployed!
Read the TechNewsWorld article below to find out more about why, regardless of company size, having the most intelligent network protection is key. Then go to www.metaflows.com to find out how to get the most intelligent network protection.

The Never Ending Cycle of Prey and Predator: The Malware War!

Malware is not new, and yet it is ever-evolving. Companies need to strengthen security practices and tools in order to stay ahead, or at the least, stay in the game! With attacks and costs sharing a rising trajectory, information security should be at the top of every IT director’s list. Read about it from the perspective of a CSO:

Malware: War without End

Find out how the MetaFlows Security System is keeping steady in the war against Malware and defeating enemies with innovative and cost efficient technology!

And Now For Something Completely Technical: PF Ring 10 Gbps Snort IDS

You can always visit the MetaFlows Website for more information.

PF_RING based 10 Gbps Snort multiprocessing

Tested on CentOS 6 64bit using our custom PF_RING source

PF_RING load balances network traffic originating from an Ethernet interface by hashing the IP headers into N buckets. This allows it to spawn N instances of Snort, each processing a single bucket, achieving higher throughput through multiprocessing. In order to take full advantage of this, you need a multicore processor (like an i7 with 8 processing threads) or a dual- or quad-processor board that increases parallelism even further across multiple chips.
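To illustrate the hashing idea (PF_RING does this in the kernel driver in C; this is only a Python sketch), the key property is that both directions of a flow should land in the same bucket so each Snort instance sees complete conversations:

```python
import zlib

def flow_bucket(src_ip, dst_ip, n_buckets=8):
    """Symmetric hash of an IP pair: A->B and B->A land in the same bucket."""
    key = "|".join(sorted((src_ip, dst_ip)))
    return zlib.crc32(key.encode()) % n_buckets
```

Each of the N Snort instances then binds to one bucket, typically pinned to one hardware thread.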

In a related article we measured the performance of PF_RING with Snort inline at 1 Gbps on an i7 950. The results were impressive.

The big deal is that now you can build low-cost IDPS systems using standard off-the-shelf hardware.

You can purchase our purpose-built hardware with MetaFlows PF_RING pre-installed, giving you a low-cost, high-performance platform to run your custom PF_RING applications on. If you are interested in learning more, please contact us.

In this article we report on our experiment running Snort on a dual-processor board with a total of 24 hyperthreads (using the Intel X5670). Besides measuring Snort processing throughput while varying the number of rules, we also (1) changed the compiler used to compile Snort (GCC vs. ICC) and (2) compared PF_RING in NAPI mode (running 24 Snort processes in parallel) against PF_RING Direct NIC Access (DNA) technology (running 16 Snort processes in parallel).

PF_RING NAPI performs the hashing of the packets in software and has a traditional architecture where the packets are copied to user space by the driver. Snort is parallelized using 24 processes that are allowed to float on the 24 hardware threads while the interrupts are parallelized on 16 of the 24 hardware threads.

PF_RING DNA performs the hashing of the packets in hardware (using the Intel 82599 RSS functionality) and relies on 16 hardware queues. The DNA driver allows 16 instances of Snort to read packets directly from the hardware queues, thereby virtually eliminating system-level processing overhead. The limitations of DNA are that (1) it supports a maximum of 16x parallelism per 10G interface, (2) it only allows 1 process to attach to each hardware queue, and (3) it costs a bit of money or requires Silicom cards (well worth it). Limitation (2) is significant because it does not allow multiple processes to receive the same data. So, for example, if you run “tcpdump -i dna0”, you could not also run “snort -i dna0 -c config.snort -A console” at the same time; the second invocation would return an error.

GCC is the standard open source compiler that comes with CentOS 6 and virtually all other Unix systems. It is the foundation of open source and without it we would still be in the stone age (computationally).

ICC is an Intel proprietary compiler that goes much further in extracting instruction- and data-level parallelism of modern multicore processors such as the i7 and Xeons.

All results are excellent and show that you can build a 5-7 Gbps IDS using standard off-the-shelf machines and PF_RING. The system we used to perform these experiments is below:

The graph above shows the sustained Snort performance of 4 different configurations using a varying number of Emerging Threats Pro rules. As expected, the number of rules has a dramatic effect on performance for all configurations (the more rules, the lower the performance). In all cases, memory access contention is likely to be the main limiting factor.

Given our experience, we think that our setup is fairly representative of an academic institution. We have to admit that measuring Snort performance in the absolute is hard: no two networks are the same, and rule configurations vary even more widely. Nevertheless, the relative performance variations are important and of general interest. You can draw your own conclusions from the above graph; however, here are some interesting observations:

  • At the high end (6900 rules) ICC makes a big difference by increasing the throughput by ~1 Gbps (25%)
  • GCC is just as good at maintaining throughput around 5 Gbps
  • PF_RING DNA is always better than PF_RING NAPI.

We describe below how to reproduce these numbers on Linux CentOS 6. If you do not want to go through these steps, we also provide this functionality through our security system (the MSS), pre-packaged and ready to go. It would help us if you tried it and let us know what you think.