Cyber Defense Review

Big Data and Cybersecurity

By Dr. Aaron Brantly, SFC Jesse Frigo | September 15, 2015

Cyberspace and cybersecurity contain numerous problems in search of novel approaches that can facilitate dynamic, results-driven solution sets. Big data, if examined from a complex, multi-disciplinary perspective, offers a range of potential advantages to cyber offense and defense for public and private sector entities ranging from small businesses to the national security community. This post briefly highlights the foundations of a research push, still in its infancy, to assess the application of big data to national cybersecurity. While the focus is national cybersecurity writ large, the lessons learned are likely to matter to organizations and individuals as big data applications for cybersecurity become increasingly affordable.

Big data analysis as a concept is hard to pin down. Generally, it is considered to involve extremely large observational datasets generated through human or technical means in either structured or unstructured formats. Its defining characteristic is that, rather than sampling from a population as in conventional statistical analysis, big data analysis assumes that the data themselves are the population, or such a large proportion of it that the mechanisms of analysis differ. Rather than inferring from a sample to a population, the analyst examines the population itself for novel insights into some form of action or behavior, machine or human. Moreover, big data is a relatively modern concept. The ability to aggregate, store, process and subsequently analyze the data relies on the computational power of modern computing devices; such analysis cannot be conducted by hand or through simple observation.

Big data are large, complex and most commonly unstructured. IBM identifies four dimensions associated with big data: Volume (scale of data), Velocity (analysis of streaming data), Variety (different forms of data) and Veracity (uncertainty of data).[1] The insights afforded by big data are of value when attempting to understand or solve multiple problem sets. Often the data exhaust (the data unneeded for initial analysis) is where the gems for as-yet-unasked questions reside.[2] Because of its scale and broadly encompassing character, big data has the potential to facilitate answers to both known and unknown problems. Yet big data is not without problems of its own, in the form of misinterpretations of noise (i.e., error) resident within the data.

Big data is useful for multiple applications. First, novel applications of big data can help solve some cybersecurity problems. In particular, big data can help identify anomalous behavior patterns in network traffic or among human operators by bolstering the analytic and machine learning techniques already employed by intrusion detection systems (IDS). Second, big data collected both within cyberspace and from sensors in various areas of operations can provide insight into both the human and technical terrain of a given area. In his recent book Data and Goliath, Bruce Schneier examines how the cost of storing data has made it possible not just to collect large and diverse volumes of data, but also to store that data efficiently.[3] The data exhaust that every person generates is massive, and each click, purchase, facial scan, fingerprint and much more can help build innovative, tailored information environments for everything from the purchasing of goods and services to the tracking of transnational terrorists. Examples of big data use are growing in ubiquity. Whether it is text analysis using CAPTCHA codes[4] or search analysis of flu or dengue fever,[5] big data is present and exploitable.
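
To make the idea of flagging anomalous traffic concrete, the following is a minimal sketch in Python, not a description of how any particular IDS works. It assumes a hypothetical mapping of host addresses to bytes sent in some time window and uses a median-based (robust) z-score, one common outlier heuristic among many. The function names, the sample addresses and the 3.5 threshold are illustrative assumptions rather than anything taken from the sources cited here.

```python
from statistics import median


def robust_z_scores(values):
    """Return modified z-scores computed from the median and the
    median absolute deviation (MAD) instead of the mean and std dev."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # All values (or most of them) are identical; nothing stands out.
        return [0.0 for _ in values]
    # 0.6745 scales the MAD so scores are comparable to standard z-scores.
    return [0.6745 * (v - med) / mad for v in values]


def flag_anomalies(bytes_per_host, threshold=3.5):
    """Return the sorted list of hosts whose traffic volume deviates
    strongly from the rest. The threshold is an assumed rule of thumb."""
    hosts = list(bytes_per_host)
    scores = robust_z_scores([bytes_per_host[h] for h in hosts])
    return sorted(h for h, s in zip(hosts, scores) if abs(s) > threshold)


# Hypothetical observations: bytes sent per host in one time window.
traffic = {
    "10.0.0.1": 1200,
    "10.0.0.2": 1100,
    "10.0.0.3": 1300,
    "10.0.0.4": 1250,
    "10.0.0.5": 98000,
}
print(flag_anomalies(traffic))  # prints ['10.0.0.5']
```

A median-based score is used here because a single extreme host would inflate a mean and standard deviation enough to partially mask itself. A production system would likely model far more than raw volume.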

While big data is present and growing in ubiquity, the concerns associated with its use are pervasive. Privacy concerns about big data are being heard. In 2014 the President received a report on privacy and big data from the President's Council of Advisors on Science and Technology.[6] The report notes that the pervasiveness of data generation makes traditional notice and consent burdensome to individual users and instead recommends placing that burden on the organization. Yet here too privacy concerns arise when considering the types of organizations collecting and storing data. The report outlines priorities and indicates a recognition by the U.S. Government of the policy and legal challenges faced by both the public and the private sector with regard to the collection and analysis of data in large volumes. The field of study is growing, and the impact of the use of big data on the public consciousness and discourse is real and persistent.[7]

Yet despite the recognition that big data challenges privacy, its ability to effect positive change might revolutionize aspects of cybersecurity and military operations, reducing costs and increasing efficiencies for both the cyber warrior and the boots-on-the-ground soldier. Below is just one of the potential applications that offers benefits to national security and is already being worked on by multiple actors, public and private.

A Smart IDS

Cybersecurity failures are not solely technical or human problems. Instead, cybersecurity failures run the gamut from simple errors to a complex amalgam of human and computer interaction that results in undesirable outcomes. There is little doubt that, when functioning in the desired fashion, computer and human interactions can generate positive net benefits. Yet whether the error is physical, logical or human, and whether it is intentionally or unintentionally induced, the complexity of the problems can be overwhelming when they are considered in isolation.

Dumb IDS (constrained data-stream, anomaly-based or signature-based) that operate independently of data from other parts of an organization can collect and store traffic and detect anomalous patterns resident within a network.[8] These systems can offer extreme power and systemic security, yet as the needs, uses and goals of an organization change, their ability to adapt rapidly is limited to their purview of collection. Conventional IDS is a form of big data analysis, leveraging two, perhaps three, of the “V’s” of big data identified by IBM. IDS handles large volume and velocity (i.e., real-time streaming), but the variety and veracity of its data are limited in scope. By informing IDS with project, human resources, market, weather, political and other data from multiple disciplines that bear directly on the volume, type, origin and destination of data, it is possible to move from dumb (i.e., constrained-lens) to smart (i.e., multi-lens) dynamic analytic processes. Network security that moves beyond the network, reimagining how data are used by examining the causal mechanisms of failures behind the anomalous behavior identified in traditional IDS models, is complex. It requires modifications to existing cluster computing algorithms or the creation of entirely new ones, the development of new visualization strategies to convey complex information to operators, legal and privacy considerations to avoid illegal collection, and policy frameworks to facilitate reasoned, rule-based approaches to the collection and analysis of the information generated.
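
The move from a constrained lens to a multi-lens view can be sketched as follows. This is an illustrative toy, assuming a conventional IDS that already emits an alert with a base score between 0 and 1, plus two hypothetical context feeds: a human resources list of users currently on leave and a project record of destinations each user is expected to contact. The `Alert` fields, the feed structures and the adjustment weights are all assumptions made for this example, not part of any real product or of the research described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    user: str
    destination: str
    base_score: float  # score from the conventional ("dumb") IDS, 0.0 to 1.0


def contextual_score(alert, on_leave, project_destinations):
    """Adjust an IDS alert score using organizational context.

    on_leave: set of user names HR reports as away (hypothetical feed).
    project_destinations: dict mapping user -> set of destinations that
        the user's current projects make expected (hypothetical feed).
    The +0.3 and -0.4 weights are arbitrary illustrative values.
    """
    score = alert.base_score
    if alert.user in on_leave:
        # Activity from an account whose owner is away is more suspicious.
        score += 0.3
    if alert.destination in project_destinations.get(alert.user, set()):
        # Traffic to a known project partner is less suspicious.
        score -= 0.4
    # Keep the result inside the same 0.0 to 1.0 range as the base score.
    return max(0.0, min(1.0, score))


on_leave = {"bob"}
projects = {"alice": {"partner.example"}}

expected = Alert(user="alice", destination="partner.example", base_score=0.6)
suspicious = Alert(user="bob", destination="unknown.example", base_score=0.6)

print(round(contextual_score(expected, on_leave, projects), 2))
print(round(contextual_score(suspicious, on_leave, projects), 2))
```

Both alerts look identical to the network-only system, yet the added context pushes them in opposite directions. The privacy and legal questions raised above apply directly to feeds of this kind.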

Multi-disciplinary Big Data

This blog post is meant to hint at the numerous potential research agendas available for study within the emerging branch of analytics known as big data. Big data applications are not isolated to one use case, and the potential applications are limited only by the creativity of those wishing to use them for innovative solutions. By examining and leveraging big data from social (e.g., behavioral/cognitive, legal, policy, cultural, historical, geographical) and technical (e.g., computer science, engineering, physics) perspectives, the resulting research is likely to incorporate multiple viewpoints and yield more useful products and applications.



[1] “The Four V’s of Big Data.” Accessed September 9, 2015.

[2] Mayer-Schönberger, Viktor, and Kenneth Cukier. 2013. Big data: a revolution that will transform how we live, work, and think. Boston: Houghton Mifflin Harcourt.

[3] Schneier, Bruce. 2015. Data and Goliath: the hidden battles to collect your data and control your world. New York: W.W. Norton & Company.

[4] See: Agarwal, Shivam, “Utilizing Big Data in Identification and Correction of OCR Errors”.

[5] See:

[6] See:

[7] Brantly, Aaron F. “The Changed Conversation About Surveillance Online”.

[8] See: for a basic explanation of IDS.
