Dataset Replication Possibilities
Evaluation Framework For IDS Datasets
Notes for "An Evaluation Framework for Intrusion Detection Dataset"
(Amirhossein Gharib, Iman Sharafaldin, et al.)
Options for obtaining data to evaluate defence techniques:
- Replaying publicly available datasets captured from attacks.
- Generating traffic: not well suited to simulating attacks. Curl-Loader is an
open-source tool for generating artificial traffic.
Scott et al. presented three major criteria for a dataset:
- redundancy
- inherent unpredictability
- complexity or multivariate dependencies.
Shiravi et al. defined evaluation criteria with six aspects:
- realistic network
- realistic traffic
- labeled dataset
- total interaction capture
- complete capture
- diversity of attacks.
The eleven features defined by the framework
- Complete Network Configuration: Several attacks only reveal themselves in
a complete configuration of computers, servers, routers, and firewalls, so a
realistic configuration is necessary to capture the real effects of attacks.
- Complete Traffic: A sequence of packets from a source (a host, router, or
switch) to a destination, which may be another host, a multicast group, or a
broadcast domain.
- Labeled Dataset: Tagging and labeling the data.
- Complete Interactions: Having all information about network interactions,
such as those between internal LANs.
- Complete Capture: Traffic that is non-functional or unlabeled should not be
removed, since it matters when calculating the false-positive percentage of
an IDS.
- Available Protocols: Interactive traffic includes sessions that consist of
short request and response pairs, such as applications involving real-time
interactions with users. The dataset should contain both latency-sensitive and
non-real-time data.
- Attack Diversity: Almost self-explanatory: must contain a variety of
attacks.
- Anonymity: Privacy issues occur when both the IP addresses and the
payload are available. Removing the payload decreases the usefulness of the
dataset for systems like deep packet inspection (DPI).
- Heterogeneity: Different sources of information, such as operating system
logs, network equipment logs, and network traffic.
- Feature Set: Extract different features from different data sources, such
as logs and traffic, using feature extraction applications.
- Metadata: Include proper documentation about configuration, systems,
attack scenarios, and other vital information.
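The Complete Capture criterion is easier to see with numbers. The following is a minimal sketch, assuming each flow has a true label and an IDS prediction; the label strings and the toy data are hypothetical and do not come from the paper.

```python
def false_positive_rate(true_labels, predicted_labels, benign="BENIGN"):
    """FPR = FP / (FP + TN), computed over flows whose true label is benign."""
    false_pos = true_neg = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth != benign:
            continue  # attack flows do not enter the FPR
        if pred == benign:
            true_neg += 1
        else:
            false_pos += 1
    if false_pos + true_neg == 0:
        raise ValueError("no benign flows: FPR is undefined")
    return false_pos / (false_pos + true_neg)


# Hypothetical capture: four benign flows, one of which the IDS flags as DoS.
truth = ["BENIGN", "BENIGN", "BENIGN", "BENIGN", "DoS"]
pred = ["BENIGN", "DoS", "BENIGN", "BENIGN", "DoS"]
fpr_full = false_positive_rate(truth, pred)

# Same capture after dropping the awkward benign flow at index 1.
kept = [0, 2, 3, 4]
fpr_pruned = false_positive_rate([truth[i] for i in kept], [pred[i] for i in kept])
```

In this toy case the pruned capture reports a lower false-positive rate than the full one, which is the distortion the criterion warns about.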
Generating Reliable Dataset
Benign Profile (B-Profile)
This method is used to generate benign background traffic. B-Profile is
designed to extract the abstract behaviour of a group of human users. The method
uses machine learning models and statistical analysis techniques to capture the
abstract features.
The encapsulated features are distributions of:
- packet sizes of a protocol
- number of packets per flow
- certain patterns in the payload
- size of payload
- request time distribution of protocols.
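Two of the distributions listed above can be estimated with simple counting. This is a minimal sketch, assuming packets are available as `(flow_id, size)` pairs; that record layout and the sample values are hypothetical, and the paper's own extraction pipeline is not described in these notes.

```python
from collections import Counter


def empirical_distribution(values):
    """Map each observed value to its relative frequency."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in sorted(counts.items())}


def packets_per_flow(packets):
    """Count packets in each flow; packets are (flow_id, size) pairs."""
    return list(Counter(flow_id for flow_id, _ in packets).values())


# Hypothetical capture for one protocol: three flows, six packets.
packets = [
    ("f1", 60), ("f1", 1500),
    ("f2", 60), ("f2", 60), ("f2", 1500),
    ("f3", 60),
]

size_distribution = empirical_distribution(size for _, size in packets)
flow_length_distribution = empirical_distribution(packets_per_flow(packets))
```

A traffic generator could then sample from these dictionaries, for example with `random.choices`, to pick packet sizes and flow lengths.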
There are two steps for creating benign profiles:
- Individual Profiling: A rich dataset should contain events from HTTP,
HTTPS, FTP, SSH, and email protocols. These can be captured using Man In The
Middle (MITM) interception, network sniffing, and browser and email histories.
- Clustering: In the clustering step, individual user profiles are analyzed
against other users to create clusters of users with similar behaviour and
distributions. The authors found the best results with the XMeans algorithm,
using Dynamic Time Warping (DTW) as the distance measure for the similarity
between two given time-dependent sequences.
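For reference, the textbook form of DTW can be written in a few lines. This is a sketch of the classic dynamic-programming formulation with absolute difference as the local cost; the notes do not say which DTW variant or which per-user sequences the authors used, so both are assumptions here.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the best alignment cost of a[:i] against b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] also matches the previous b
                cost[i][j - 1],      # b[j-1] also matches the previous a
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


# Hypothetical request counts per time slot for three users.
user_a = [1, 2, 3]
user_b = [1, 2, 2, 3]  # same shape as user_a, stretched in time
user_c = [2, 3, 4]     # same shape as user_a, shifted in value
```

Because DTW allows one point to align with several, `user_a` and `user_b` are at distance zero even though their lengths differ, which is why it suits time-dependent behaviour profiles.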
To generate traffic, a random B-Profile is selected and a slightly modified
web-crawling mechanism is devised to demonstrate the browsing behaviour of users
for HTTP and HTTPS.

Attack Profiles (M-Profile)
TODO: Will describe later
CIC-IDS2017
This is an overview of the creation process of the CIC-IDS2017 dataset. This
comes from the paper "Towards Generating a New Intrusion Detection Dataset and
Intrusion Traffic Characterization" by Iman Sharafaldin, Arash Habibi Lashkari,
and Ali A. Ghorbani. I also use the details mentioned on the web page where the
dataset is available: https://www.unb.ca/cic/datasets/ids-2017.html.
The dataset captures network data in the form of PCAPs and then performs
network traffic analysis using the CICFlowMeter tool to create labelled flows
based on timestamp, source and destination IPs, source and destination ports,
protocols, and attacks.
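The labelling step can be pictured as matching each flow against a schedule of attacks. The sketch below is an illustration only, not the authors' labelling script: the `AttackWindow` type, the flow dictionary keys, the addresses, and the times are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AttackWindow:
    """One scheduled attack; every field value used below is made up."""
    label: str
    start: datetime
    end: datetime
    attacker_ip: str
    victim_ip: str
    victim_port: int


def involves(flow, window):
    """True if the flow runs between attacker and victim on the victim port."""
    target = (window.attacker_ip, window.victim_ip, window.victim_port)
    forward = (flow["src_ip"], flow["dst_ip"], flow["dst_port"]) == target
    reverse = (flow["dst_ip"], flow["src_ip"], flow["src_port"]) == target
    return forward or reverse


def label_flow(flow, windows, benign="BENIGN"):
    """Return the attack label only if time, hosts, and port all match."""
    for window in windows:
        in_time = window.start <= flow["timestamp"] <= window.end
        if in_time and involves(flow, window):
            return window.label
    return benign


windows = [
    AttackWindow("FTP-Patator", datetime(2017, 7, 4, 9, 20),
                 datetime(2017, 7, 4, 10, 20), "198.51.100.7", "192.0.2.50", 21),
]
attack_flow = {"timestamp": datetime(2017, 7, 4, 9, 30), "src_ip": "198.51.100.7",
               "src_port": 40000, "dst_ip": "192.0.2.50", "dst_port": 21}
bystander_flow = {"timestamp": datetime(2017, 7, 4, 9, 30), "src_ip": "192.0.2.12",
                  "src_port": 40001, "dst_ip": "192.0.2.50", "dst_port": 80}
```

Here `attack_flow` receives the attack label, while `bystander_flow`, which falls in the same time window but involves a different host and port, stays benign.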
This dataset claims to create realistic background traffic using a proposed
B-Profile system to profile the abstract behaviour of human interactions and
generate naturalistic benign background traffic.
Overview From Web Page
The dataset generation included 5 days of data collection:
- Day 1: Only benign traffic.
- Day 2: Brute-force, FTP-patator, SSH-Patator
- Day 3: DoS/DDoS, DoS slowloris, DoS Slowhttptest, DoS Hulk, DoS GoldenEye,
Heartbleed.
- Day 4: Web Attack Brute force, web attack XSS, web attack SQL injection,
Infiltration by dropbox download, infiltration by cool disk (Mac).
- Day 5: Botnet ARES, Port Scan, DDoS LOIT.
There were 14 victim machines (1 web server, 1 Ubuntu server, 4 Ubuntu machines,
5 Windows machines, and 1 Mac machine) being attacked by 4 attacker machines (1
Kali Linux and 3 Windows machines).
Overview From Paper
The authors claim that good datasets do not exist for evaluating the performance
of IDS techniques. Even the ones that become available are heavily anonymized
and do not reflect current trends.
The paper contributes a new dataset that covers all eleven necessary criteria
with common, up-to-date attacks. The rest of the paper analyzes the remaining
datasets and their own dataset.
The authors extract 80 traffic features from the dataset using the
CICFlowMeter tool.
Experiment
Two networks: the attack network and victim network. The victim network consists
of a firewall, router, switches, and most common operating systems along with an
agent that provides the benign behaviour on each machine. The attack network is
a completely separate network with its own router and switch and machines with
public IPs.
Benign Profile Agent (B-Profile)
This dataset uses the proposed B-Profile system from the paper "Towards a
Reliable Intrusion Detection Benchmark Dataset" (Sharafaldin et al., 2017) which
profiles human interactions and "generates naturalistic benign background
traffic".
The profile is generated from 25 users based on HTTP, HTTPS, FTP, SSH, and email
protocols.
Attack Vectors
The dataset uses the following attack vectors to attack the different machines:
- Brute Force Attacks
- Heartbleed Attack
- Botnet
- DoS Attacks
- DDoS Attacks
- Web Attacks
- Infiltration Attacks
CIC-IDS2018 On AWS
- Dataset based on the creation of user profiles which contain abstract
representations of events and behaviours seen on the network.
- Attack types: Brute-force, Heartbleed, Botnet, DoS, DDoS, Web attacks, and
infiltration of the network from inside.
- The attacking infrastructure includes 50 machines; the victim organization
has 5 departments and includes 420 machines and 30 servers.
- 80 features extracted from the captured traffic using CICFlowMeter-V3.
Source
Attack Approaches
- Infiltration of network from inside: malicious file, then backdoor executed,
followed by scanning the internal network for other vulnerable machines.
- HTTP DoS: Uses Slowloris, LOIC, and HOIC. Exploits open TCP connections by
sending valid but incomplete HTTP requests.
- Web Attacks: Damn Vulnerable Web App (DVWA) - attacks on vulnerabilities of
the website: SQL injection, command injection, and unrestricted file upload.
- Brute force attacks: weak username and password combinations, with the final
goal of acquiring an SSH and MySQL account by running a dictionary brute force
attack against the main server.
- Last updated attacks: Attacks based on famous vulnerabilities that can be
carried out during a specific window of time, sometimes affecting millions of
computers that take time to patch. Heartbleed is one such attack (2018).
Benign Traffic
B-Profile is designed to extract the abstract behaviour of a group of human
users. It tries to encapsulate network events produced by users with machine
learning and statistical analysis techniques.
Once B-Profiles are derived from users, an agent (CIC-BenignGenerator) or a
human operator can use them to generate realistic benign events on a network.
Process
For each day, raw data was recorded per machine, including the network traffic
(Pcaps) and event logs (Windows and Ubuntu event logs).
Problems With CIC-IDS2017 and CIC-IDS2018
The paper "Error prevalence in NIDS datasets: A case study on CIC-IDS-2017 and
CSE-CIC-IDS-2018" by Lisa Liu et al. details multiple problems with the
datasets.
Missed Attacks
Various malicious flows are missed in the original dataset. The dataset is
"severely imbalanced" in favor of benign traffic. The missed flows are caused by
various factors, ranging from imprecise attack time frames to incorrect attack
assignments.
The paper publishes a table of all the attacks missed by the original dataset.

Mislabelling
Here is an overview of flows that were mislabelled by the original datasets:
- empty payload (no traffic, just TCP start and finish) labelled as malicious.
- port/system closed (malicious traffic sent to a system that is down or
unavailable).
- attack startup/teardown artifacts (parts of attack traffic that aren't
distinguishable from regular traffic). For example, some attacks require
loading the front page before the start of the attack. Due to the absence of a
malicious payload in this phase, they appear semantically identical to a benign
user browsing the web-app, within the context of a single flow.
- Could this be used as a way to detect this form of attack?
- Maybe the startup/teardown sections need to be marked as benign even
though they may be correlated with a malicious attack.
- no malicious payload: this is a case where the payload exists but, due to
the flow timeout set by CICFlowMeter, the traffic is split so that the first
flow contains all the malicious content and the latter does not contain any.
- attack artifact: regular traffic between attacker and victim unrelated to
the attack. This is marked malicious.
- target system unresponsive: the target is unresponsive, potentially due to
being overwhelmed.
- time-based labelling: without accurate host and port filtering, time-based
labelling leads to traffic not involved in the attack being marked malicious.
- ambiguous class labels: if you remove the flow ID, source IP, source port,
destination IP, and timestamp, then there are duplicate rows which are labelled
differently.
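The ambiguous class label check is straightforward to reproduce on a flow table. The following is a minimal sketch, assuming rows are dictionaries keyed by column name; the column names and the sample rows are hypothetical and may not match the headers in the published CSV files.

```python
from collections import defaultdict

# Identifying columns to ignore; these names are assumptions, not the real headers.
ID_COLUMNS = frozenset({"Flow ID", "Source IP", "Source Port",
                        "Destination IP", "Timestamp"})


def find_ambiguous_rows(rows, label_column="Label", id_columns=ID_COLUMNS):
    """Group rows by their remaining features and keep groups with >1 label."""
    labels_by_features = defaultdict(set)
    for row in rows:
        features = tuple(sorted(
            (name, value) for name, value in row.items()
            if name not in id_columns and name != label_column
        ))
        labels_by_features[features].add(row[label_column])
    return {features: labels
            for features, labels in labels_by_features.items()
            if len(labels) > 1}


# Hypothetical rows: the first two differ only in identifiers and label.
rows = [
    {"Flow ID": "a", "Source IP": "198.51.100.7", "Source Port": 40000,
     "Destination IP": "192.0.2.50", "Timestamp": "t1",
     "Destination Port": 80, "Flow Duration": 3, "Label": "DoS Hulk"},
    {"Flow ID": "b", "Source IP": "192.0.2.12", "Source Port": 40001,
     "Destination IP": "192.0.2.50", "Timestamp": "t2",
     "Destination Port": 80, "Flow Duration": 3, "Label": "BENIGN"},
    {"Flow ID": "c", "Source IP": "192.0.2.13", "Source Port": 40002,
     "Destination IP": "192.0.2.50", "Timestamp": "t3",
     "Destination Port": 443, "Flow Duration": 9, "Label": "BENIGN"},
]
ambiguous = find_ambiguous_rows(rows)
```

Any group returned here is one that a classifier trained only on the remaining features cannot separate, whatever the model.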
There are a number of issues/changes with the CICFlowMeter tool that affect
flow integrity and attack characterization.
- packet time-stamping issue: Sometimes the SYN ACK packets arrive before
the SYN packets from the attacker. This is probably due to the operating
system being responsible for time-stamping. Therefore, dataset creators should
verify both directions of traffic when labelling flows.
- TCP segmentation offload: TCP segmentation offloading (TSO) leads to an IP
length of 0 in the header. Since packet headers are used to determine how
packets are dissected, these are put into a different flow. This affects
attacks with large payloads.
- attributes: some new attributes were added to better handle flows. The
paper lists four new attributes added to the CICFlowMeter tool.
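One way to guard against the time-stamping issue is to make flow handling independent of packet arrival order. The sketch below is an assumption about how that could look, not a description of the fix made to CICFlowMeter: the function names and the packet dictionary layout are hypothetical.

```python
def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Direction-independent key: both directions map to the same flow."""
    a, b = (src_ip, src_port), (dst_ip, dst_port)
    return (proto,) + (a + b if a <= b else b + a)


def initiator(packets):
    """Pick the connection initiator from TCP flags, not from arrival order."""
    for packet in packets:
        if packet["flags"] == {"SYN"}:
            return packet["src_ip"]
    for packet in packets:
        if packet["flags"] == {"SYN", "ACK"}:
            return packet["dst_ip"]  # a SYN ACK is sent back to the initiator
    return packets[0]["src_ip"] if packets else None


# Hypothetical capture in which the SYN ACK is time-stamped before the SYN.
packets = [
    {"src_ip": "192.0.2.50", "dst_ip": "198.51.100.7", "flags": {"SYN", "ACK"}},
    {"src_ip": "198.51.100.7", "dst_ip": "192.0.2.50", "flags": {"SYN"}},
]
forward = flow_key("198.51.100.7", 40000, "192.0.2.50", 80, "TCP")
backward = flow_key("192.0.2.50", 80, "198.51.100.7", 40000, "TCP")
```

With this approach the out-of-order SYN ACK still lands in the same flow, and the attacker is still identified as the side that opened the connection.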
Impact on Training
The authors test their new cleaned dataset against existing models to see the
differences between the datasets and to understand if the models were
over-fitting to the existing data.
Automated Detection Of Labelling Errors
The authors use their manually corrected dataset as ground truth to automate
the detection of labelling errors. They use Confident Learning and O2U-Net,
which have been used to detect labelling errors in the field of computer
vision.
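The core idea of Confident Learning can be sketched without any library. This is a simplified illustration, not the authors' pipeline: it assumes out-of-sample predicted probabilities are already available for every flow, uses two hypothetical classes (0 for benign, 1 for malicious), and omits the pruning and ranking steps of the full method. O2U-Net is not covered.

```python
def confident_label_issues(noisy_labels, pred_probs):
    """Return indices whose given label disagrees with a confident prediction."""
    n_classes = len(pred_probs[0])

    # Per-class threshold: mean predicted probability of class k over the
    # examples that currently carry label k.
    thresholds = []
    for k in range(n_classes):
        probs_k = [p[k] for y, p in zip(noisy_labels, pred_probs) if y == k]
        thresholds.append(sum(probs_k) / len(probs_k) if probs_k else float("inf"))

    issues = []
    for index, (label, probs) in enumerate(zip(noisy_labels, pred_probs)):
        confident = [k for k in range(n_classes) if probs[k] >= thresholds[k]]
        if not confident:
            continue  # the model is not confident about any class
        guess = max(confident, key=lambda k: probs[k])
        if guess != label:
            issues.append(index)
    return issues


# Hypothetical flows: index 2 is labelled benign (0) but looks malicious (1).
noisy_labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
    [0.2, 0.8], [0.1, 0.9], [0.3, 0.7],
]
suspects = confident_label_issues(noisy_labels, pred_probs)
```

In practice the probabilities would come from a cross-validated classifier, so that no flow is scored by a model that saw its own possibly wrong label.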