Information Security Analytics || Security and Text Mining

Author: Talabis, Mark Ryan M.



Information Security Analytics. Copyright © 2015 Elsevier Inc. All rights reserved.

CHAPTER 6

SCENARIOS AND CHALLENGES IN SECURITY ANALYTICS WITH TEXT MINING

Massive amounts of unstructured data are being collected from online sources such as e-mails, call center transcripts, wikis, online bulletin boards, blogs, tweets, and Web pages. As noted in a previous chapter, large amounts of data are also being collected in semistructured form, such as log files containing information from servers and networks. Semistructured data sets are neither as free-form as the body of an e-mail nor as rigidly structured as the tables and columns of a relational database. Text mining analysis is useful for both unstructured and semistructured textual data.

There are many ways that text mining can be used for security analytics. E-mails can be analyzed to discover patterns in words and phrases that may indicate a phishing attack. Call center recordings can be converted to text, which can then be analyzed to find patterns and phrases that may indicate attempts to use a stolen identity, gain passwords to secure systems, or commit other fraudulent acts. Web sites can be scraped and analyzed to find trends in security-related themes, such as the latest botnet threats, malware, and other Internet hazards.

There has been a proliferation of new tools available to deal with the challenge of analyzing unstruc