Big data is around us. However, it is common to hear from a lot of data scientists and researchers doing analytics that they need more data. How is that possible, and where does this eagerness to get more data come from?

Very often, data scientists need lots of data to train sophisticated machine-learning models. The same applies when using machine-learning algorithms for cybersecurity. Lots of data is needed in order to build classifiers that identify, among many different targets, malicious behavior and malware infections. In this context, the eagerness to get vast amounts of data comes from the need to have enough positive samples — such as data from real threats and malware infections — that can be used to train machine-learning classifiers.

To read this article in full or to leave a comment, please click here