All publications by year – Frank Breitinger

93 entries « ‹ 2 of 2 › »

51.	Clark, Devon R.; Meffert, Christopher; Baggili, Ibrahim; Breitinger, Frank DROP (DRone Open source Parser) your drone: Forensic analysis of the DJI Phantom III (Journal Article) In: Digital Investigation, vol. 22, Supplement, pp. S3 - S14, 2017, ISSN: 1742-2876. (Abstract \| Links \| BibTeX) @article{CMBB17, title = {DROP (DRone Open source Parser) your drone: Forensic analysis of the DJI Phantom III}, author = {Devon R. Clark and Christopher Meffert and Ibrahim Baggili and Frank Breitinger}, url = {http://www.sciencedirect.com/science/article/pii/S1742287617302001}, doi = {10.1016/j.diin.2017.06.013}, issn = {1742-2876}, year = {2017}, date = {2017-08-05}, journal = {Digital Investigation}, volume = {22, Supplement}, pages = {S3 - S14}, abstract = {Abstract The DJI Phantom III drone has already been used for malicious activities (to drop bombs, remote surveillance and plane watching) in 2016 and 2017. At the time of writing, DJI was the drone manufacturer with the largest market share. Our work presents the primary thorough forensic analysis of the DJI Phantom III drone, and the primary account for proprietary file structures stored by the examined drone. It also presents the forensically sound open source tool DRone Open source Parser (DROP) that parses proprietary DAT files extracted from the drone's nonvolatile internal storage. These DAT files are encrypted and encoded. The work also shares preliminary findings on TXT files, which are also proprietary, encrypted, encoded, files found on the mobile device controlling the drone. These files provided a slew of data such as GPS locations, battery, flight time, etc. By extracting data from the controlling mobile device, and the drone, we were able to correlate data and link the user to a specific device based on extracted metadata. Furthermore, results showed that the best mechanism to forensically acquire data from the tested drone is to manually extract the SD card by disassembling the drone. Our findings illustrated that the drone should not be turned on as turning it on changes data on the drone by creating a new DAT file, but may also delete stored data if the drone's internal storage is full.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Close Abstract The DJI Phantom III drone has already been used for malicious activities (to drop bombs, remote surveillance and plane watching) in 2016 and 2017. At the time of writing, DJI was the drone manufacturer with the largest market share. Our work presents the primary thorough forensic analysis of the DJI Phantom III drone, and the primary account for proprietary file structures stored by the examined drone. It also presents the forensically sound open source tool DRone Open source Parser (DROP) that parses proprietary DAT files extracted from the drone's nonvolatile internal storage. These DAT files are encrypted and encoded. The work also shares preliminary findings on TXT files, which are also proprietary, encrypted, encoded, files found on the mobile device controlling the drone. These files provided a slew of data such as GPS locations, battery, flight time, etc. By extracting data from the controlling mobile device, and the drone, we were able to correlate data and link the user to a specific device based on extracted metadata. Furthermore, results showed that the best mechanism to forensically acquire data from the tested drone is to manually extract the SD card by disassembling the drone. Our findings illustrated that the drone should not be turned on as turning it on changes data on the drone by creating a new DAT file, but may also delete stored data if the drone's internal storage is full. Close http://www.sciencedirect.com/science/article/pii/S1742287617302001 doi:10.1016/j.diin.2017.06.013 Close
52.	Zhang, Xiaolu; Baggili, Ibrahim; Breitinger, Frank Breaking into the vault: privacy, security and forensic analysis of android vault applications (Journal Article) In: Computers & Security, vol. 70, pp. 516 - 531, 2017, ISSN: 0167-4048. (Abstract \| Links \| BibTeX) @article{ZBB17, title = {Breaking into the vault: privacy, security and forensic analysis of android vault applications}, author = {Xiaolu Zhang and Ibrahim Baggili and Frank Breitinger}, url = {http://www.sciencedirect.com/science/article/pii/S0167404817301529}, doi = {10.1016/j.cose.2017.07.011}, issn = {0167-4048}, year = {2017}, date = {2017-08-02}, journal = {Computers & Security}, volume = {70}, pages = {516 - 531}, abstract = {Abstract In this work we share the first account for the forensic analysis, security and privacy of Android vault applications. Vaults are designed to be privacy enhancing as they allow users to hide personal data but may also be misused to hide incriminating files. Our work has already helped law enforcement in the state of Connecticut to reconstruct 66 incriminating images and 18 videos in a single criminal case. We present case studies and results from analyzing 18 Android vault applications (accounting for nearly 220 million downloads from the Google Play store) by reverse engineering them and examining the forensic artifacts they produce. Our results showed that Image 1 obfuscated their code and Image 2 applications used native libraries hindering the reverse engineering process of these applications. However, we still recovered data from the applications without root access to the Android device as we were able to ascertain hidden data on the device without rooting for Image 3 of the applications. Image 4 of the vault applications were found to not encrypt photos they stored, and Image 5 were found to not encrypt videos. Image 6 of the applications were found to store passwords in cleartext. We were able to also implement a swap attack on Image 7 applications where we achieved unauthorized access to the data by swapping the files that contained the password with a self-created one. In some cases, our findings illustrate unfavorable security implementations of privacy enhancing applications, but also showcase practical mechanisms for investigators to gain access to data of evidentiary value. In essence, we broke into the vaults.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Close Abstract In this work we share the first account for the forensic analysis, security and privacy of Android vault applications. Vaults are designed to be privacy enhancing as they allow users to hide personal data but may also be misused to hide incriminating files. Our work has already helped law enforcement in the state of Connecticut to reconstruct 66 incriminating images and 18 videos in a single criminal case. We present case studies and results from analyzing 18 Android vault applications (accounting for nearly 220 million downloads from the Google Play store) by reverse engineering them and examining the forensic artifacts they produce. Our results showed that Image 1 obfuscated their code and Image 2 applications used native libraries hindering the reverse engineering process of these applications. However, we still recovered data from the applications without root access to the Android device as we were able to ascertain hidden data on the device without rooting for Image 3 of the applications. Image 4 of the vault applications were found to not encrypt photos they stored, and Image 5 were found to not encrypt videos. Image 6 of the applications were found to store passwords in cleartext. We were able to also implement a swap attack on Image 7 applications where we achieved unauthorized access to the data by swapping the files that contained the password with a self-created one. In some cases, our findings illustrate unfavorable security implementations of privacy enhancing applications, but also showcase practical mechanisms for investigators to gain access to data of evidentiary value. In essence, we broke into the vaults. Close http://www.sciencedirect.com/science/article/pii/S0167404817301529 doi:10.1016/j.cose.2017.07.011 Close
53.	Moore, Jason; Baggili, Ibrahim; Breitinger, Frank Find Me If You Can: Mobile GPS Mapping Applications Forensics Analysis & SNAVP The Open Source, Modular, Extensible Parser (Journal Article) In: Journal of Digital Forensics, Security and Law (JDFSL), vol. 12, no. 1, pp. 7, 2017. (Links \| BibTeX) @article{MBB17, title = {Find Me If You Can: Mobile GPS Mapping Applications Forensics Analysis & SNAVP The Open Source, Modular, Extensible Parser}, author = {Jason Moore and Ibrahim Baggili and Frank Breitinger}, url = {https://doi.org/10.15394/jdfsl.2017.1414}, doi = {10.15394/jdfsl.2017.1414}, year = {2017}, date = {2017-06-13}, journal = {Journal of Digital Forensics, Security and Law (JDFSL)}, volume = {12}, number = {1}, pages = {7}, keywords = {}, pubstate = {published}, tppubtype = {article} } Close https://doi.org/10.15394/jdfsl.2017.1414 doi:10.15394/jdfsl.2017.1414 Close
54.	Harichandran, Vikram S.; Breitinger, Frank; Baggili, Ibrahim Bytewise Approximate Matching: The Good, The Bad, and The Unknown (Journal Article) In: Journal of Digital Forensics, Security and Law, vol. 11, no. 2, pp. 59–78, 2016. (Abstract \| Links \| BibTeX) @article{HBB16, title = {Bytewise Approximate Matching: The Good, The Bad, and The Unknown}, author = {Vikram S. Harichandran and Frank Breitinger and Ibrahim Baggili}, doi = {10.15394/jdfsl.2016.1379}, year = {2016}, date = {2016-12-26}, journal = {Journal of Digital Forensics, Security and Law}, volume = {11}, number = {2}, pages = {59–78}, abstract = {Hash functions are established and well-known in digital forensics, where they are commonly used for proving integrity and file identification (i.e., hash all files on a seized device and compare the fingerprints against a reference database). However, with respect to the latter operation, an active adversary can easily overcome this approach because traditional hashes are designed to be sensitive to altering an input; output will significantly change if a single bit is flipped. Therefore, researchers developed approximate matching, which is a rather new, less prominent area but was conceived as a more robust counterpart to traditional hashing. Since the conception of approximate matching, the community has constructed numerous algorithms, extensions, and additional applications for this technology, and are still working on novel concepts to improve the status quo. In this survey article, we conduct a high-level review of the existing literature from a non-technical perspective and summarize the existing body of knowledge in approximate matching, with special focus on bytewise algorithms. Our contribution allows researchers and practitioners to receive an overview of the state of the art of approximate matching so that they may understand the capabilities and challenges of the field. Simply, we present the terminology, use cases, classification, requirements, testing methods, algorithms, applications, and a list of primary and secondary literature.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Close Hash functions are established and well-known in digital forensics, where they are commonly used for proving integrity and file identification (i.e., hash all files on a seized device and compare the fingerprints against a reference database). However, with respect to the latter operation, an active adversary can easily overcome this approach because traditional hashes are designed to be sensitive to altering an input; output will significantly change if a single bit is flipped. Therefore, researchers developed approximate matching, which is a rather new, less prominent area but was conceived as a more robust counterpart to traditional hashing. Since the conception of approximate matching, the community has constructed numerous algorithms, extensions, and additional applications for this technology, and are still working on novel concepts to improve the status quo. In this survey article, we conduct a high-level review of the existing literature from a non-technical perspective and summarize the existing body of knowledge in approximate matching, with special focus on bytewise algorithms. Our contribution allows researchers and practitioners to receive an overview of the state of the art of approximate matching so that they may understand the capabilities and challenges of the field. Simply, we present the terminology, use cases, classification, requirements, testing methods, algorithms, applications, and a list of primary and secondary literature. Close doi:10.15394/jdfsl.2016.1379 Close
55.	Jeong, Doowon; Breitinger, Frank; Kang, Hari; Lee, Sangjin Towards Syntactic Approximate Matching-A Pre-Processing Experiment (Journal Article) In: The Journal of Digital Forensics, Security and Law: JDFSL, vol. 11, no. 2, pp. 97–110, 2016. (Abstract \| Links \| BibTeX) @article{jeong2016towards, title = {Towards Syntactic Approximate Matching-A Pre-Processing Experiment}, author = {Doowon Jeong and Frank Breitinger and Hari Kang and Sangjin Lee}, url = {https://doi.org/10.15394/jdfsl.2016.1381}, doi = {10.15394/jdfsl.2016.1381}, year = {2016}, date = {2016-12-26}, journal = {The Journal of Digital Forensics, Security and Law: JDFSL}, volume = {11}, number = {2}, pages = {97–110}, publisher = {Association of Digital Forensics, Security and Law}, abstract = {Over the past few years, the popularity of approximate matching algorithms (a.k.a. fuzzy hashing) has increased. Especially within the area of bytewise approximate matching, several algorithms were published, tested, and improved. It has been shown that these algorithms are powerful, however they are sometimes too precise for real world investigations. That is, even very small commonalities (e.g., in the header of a file) can cause a match. While this is a desired property, it may also lead to unwanted results. In this paper, we show that by using simple pre-processing, we significantly can influence the outcome. Although our test set is based on text-based file types (cause of an easy processing), this technique can be used for other, well-documented types as well. Our results show that it can be beneficial to focus on the content of files only (depending on the use-case). While for this experiment we utilized text files, Additionally, we present a small, self-created dataset that can be used in the future for approximate matching algorithms since it is labeled (we know which files are similar and how).}, keywords = {}, pubstate = {published}, tppubtype = {article} } Close Over the past few years, the popularity of approximate matching algorithms (a.k.a. fuzzy hashing) has increased. Especially within the area of bytewise approximate matching, several algorithms were published, tested, and improved. It has been shown that these algorithms are powerful, however they are sometimes too precise for real world investigations. That is, even very small commonalities (e.g., in the header of a file) can cause a match. While this is a desired property, it may also lead to unwanted results. In this paper, we show that by using simple pre-processing, we significantly can influence the outcome. Although our test set is based on text-based file types (cause of an easy processing), this technique can be used for other, well-documented types as well. Our results show that it can be beneficial to focus on the content of files only (depending on the use-case). While for this experiment we utilized text files, Additionally, we present a small, self-created dataset that can be used in the future for approximate matching algorithms since it is labeled (we know which files are similar and how). Close https://doi.org/10.15394/jdfsl.2016.1381 doi:10.15394/jdfsl.2016.1381 Close
56.	Al-khateeb, Samer; Conlan, Kevin J.; Baggili, Nitin Agarwal And Ibrahim; Breitinger, Frank Exploring Deviant Hacker Networks (DHN) On Social Media Platforms (Journal Article) In: Journal of Digital Forensics, Security and Law, vol. 11, no. 2, pp. 7–20, 2016. (Abstract \| Links \| BibTeX) @article{SCA16, title = {Exploring Deviant Hacker Networks (DHN) On Social Media Platforms}, author = {Samer Al-khateeb and Kevin J. Conlan and Nitin Agarwal And Ibrahim Baggili and Frank Breitinger}, doi = {10.15394/jdfsl.2016.1375}, year = {2016}, date = {2016-12-26}, journal = {Journal of Digital Forensics, Security and Law}, volume = {11}, number = {2}, pages = {7–20}, abstract = {Online Social Networks (OSNs) have grown exponentially over the past decade. The initial use of social media for benign purposes (e.g., to socialize with friends, browse pictures and photographs, and communicate with family members overseas) has now transitioned to include malicious activities (e.g., cybercrime, cyberterrorism, and cyberwarfare). These nefarious uses of OSNs poses a significant threat to society, and thus requires research attention. In this exploratory work, we study activities of one deviant groups: hacker groups on social media, which we term Deviant Hacker Networks (DHN). We investigated the connection between different DHNs on Twitter: how they are connected, identified the powerful nodes, which nodes sourced information, and which nodes act as "bridges" between different network components. From this, we were able to identify and articulate specific examples of DHNs communicating with each other, with the goal of committing some form of deviant act online. In our work, we also attempted to bridge the gap between the empirical study of OSNs and cyber forensics, as the growth of OSNs is now bringing these two domains together, due to OSNs continuously generating vast amounts of evidentiary data.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Close Online Social Networks (OSNs) have grown exponentially over the past decade. The initial use of social media for benign purposes (e.g., to socialize with friends, browse pictures and photographs, and communicate with family members overseas) has now transitioned to include malicious activities (e.g., cybercrime, cyberterrorism, and cyberwarfare). These nefarious uses of OSNs poses a significant threat to society, and thus requires research attention. In this exploratory work, we study activities of one deviant groups: hacker groups on social media, which we term Deviant Hacker Networks (DHN). We investigated the connection between different DHNs on Twitter: how they are connected, identified the powerful nodes, which nodes sourced information, and which nodes act as "bridges" between different network components. From this, we were able to identify and articulate specific examples of DHNs communicating with each other, with the goal of committing some form of deviant act online. In our work, we also attempted to bridge the gap between the empirical study of OSNs and cyber forensics, as the growth of OSNs is now bringing these two domains together, due to OSNs continuously generating vast amounts of evidentiary data. Close doi:10.15394/jdfsl.2016.1375 Close
57.	Ricci, Joseph; Baggili, Ibrahim; Breitinger, Frank Watch What You Wear: Smartwatches and Sluggish Security (Book Section) In: Marrington, Andrew; Kerr, Don; Gammack, John (Ed.): Managing Security Issues and the Hidden Dangers of Wearable Technologies, pp. 47, IGI Global, 2016. (Abstract \| Links \| BibTeX) @incollection{RBB16, title = {Watch What You Wear: Smartwatches and Sluggish Security}, author = {Joseph Ricci and Ibrahim Baggili and Frank Breitinger}, editor = {Andrew Marrington and Don Kerr and John Gammack}, doi = {10.4018/978-1-5225-1016-1.ch003}, year = {2016}, date = {2016-09-01}, booktitle = {Managing Security Issues and the Hidden Dangers of Wearable Technologies}, journal = {Managing Security Issues and the Hidden Dangers of Wearable Technologies}, pages = {47}, publisher = {IGI Global}, abstract = {There is no doubt that the form factor of devices continues to shrink as evidenced by smartphones and most recently smartwatches. The adoption rate of small computing devices is staggering and needs stronger attention from the cybersecurity and digital forensics communities. In this chapter, we dissect smartwatches. We first present a historical roadmap of smartwatches. We then explore the smartwatch marketplace and outline existing smartwatch hardware, operating systems and software. Next we elaborate on the uses of smartwatches and then discuss the security and forensic implications of smartwatches by reviewing the relevant literature. Lastly, we outline future research directions in smartwatch security and forensics.}, keywords = {}, pubstate = {published}, tppubtype = {incollection} } Close There is no doubt that the form factor of devices continues to shrink as evidenced by smartphones and most recently smartwatches. The adoption rate of small computing devices is staggering and needs stronger attention from the cybersecurity and digital forensics communities. In this chapter, we dissect smartwatches. We first present a historical roadmap of smartwatches. We then explore the smartwatch marketplace and outline existing smartwatch hardware, operating systems and software. Next we elaborate on the uses of smartwatches and then discuss the security and forensic implications of smartwatches by reviewing the relevant literature. Lastly, we outline future research directions in smartwatch security and forensics. Close doi:10.4018/978-1-5225-1016-1.ch003 Close
58.	Meffert, Christopher S.; Baggili, Ibrahim; Breitinger, Frank Deleting collected digital evidence by exploiting a widely adopted hardware write blocker (Journal Article) In: Digital Investigation, vol. 18, pp. 87–96, 2016, ISSN: 1742-2876. (Abstract \| Links \| BibTeX) @article{MBB16, title = {Deleting collected digital evidence by exploiting a widely adopted hardware write blocker}, author = {Christopher S. Meffert and Ibrahim Baggili and Frank Breitinger}, url = {http://www.sciencedirect.com/science/article/pii/S1742287616300354}, doi = {10.1016/j.diin.2016.04.004}, issn = {1742-2876}, year = {2016}, date = {2016-08-07}, journal = {Digital Investigation}, volume = {18}, pages = {87–96}, abstract = {In this primary work we call for the importance of integrating security testing into the process of testing digital forensic tools. We postulate that digital forensic tools are increasing in features (such as network imaging), becoming networkable, and are being proposed as forensic cloud services. This raises the need for testing the security of these tools, especially since digital evidence integrity is of paramount importance. At the time of conducting this work, little to no published anti-forensic research had focused on attacks against the forensic tools/process. We used the TD3, a popular, validated, touch screen disk duplicator and hardware write blocker with networking capabilities and designed an attack that corrupted the integrity of the destination drive (drive with the duplicated evidence) without the user's knowledge. By also modifying and repackaging the firmware update, we illustrated that a potential adversary is capable of leveraging a phishing attack scenario in order to fake digital forensic practitioners into updating the device with a malicious operating system. The same attack scenario may also be practiced by a disgruntled insider. The results also raise the question of whether security standards should be drafted and adopted by digital forensic tool makers.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Close In this primary work we call for the importance of integrating security testing into the process of testing digital forensic tools. We postulate that digital forensic tools are increasing in features (such as network imaging), becoming networkable, and are being proposed as forensic cloud services. This raises the need for testing the security of these tools, especially since digital evidence integrity is of paramount importance. At the time of conducting this work, little to no published anti-forensic research had focused on attacks against the forensic tools/process. We used the TD3, a popular, validated, touch screen disk duplicator and hardware write blocker with networking capabilities and designed an attack that corrupted the integrity of the destination drive (drive with the duplicated evidence) without the user's knowledge. By also modifying and repackaging the firmware update, we illustrated that a potential adversary is capable of leveraging a phishing attack scenario in order to fake digital forensic practitioners into updating the device with a malicious operating system. The same attack scenario may also be practiced by a disgruntled insider. The results also raise the question of whether security standards should be drafted and adopted by digital forensic tool makers. Close http://www.sciencedirect.com/science/article/pii/S1742287616300354 doi:10.1016/j.diin.2016.04.004 Close
59.	Harichandran, Vikram S.; Walnycky, Daniel; Baggili, Ibrahim; Breitinger, Frank CuFA: A more formal definition for digital forensic artifacts (Journal Article) In: Digital Investigation, vol. 18, pp. 125–137, 2016, ISSN: 1742-2876. (Abstract \| Links \| BibTeX) @article{HWB16, title = {CuFA: A more formal definition for digital forensic artifacts}, author = {Vikram S. Harichandran and Daniel Walnycky and Ibrahim Baggili and Frank Breitinger}, url = {http://www.sciencedirect.com/science/article/pii/S1742287616300366}, doi = {10.1016/j.diin.2016.04.005}, issn = {1742-2876}, year = {2016}, date = {2016-08-07}, journal = {Digital Investigation}, volume = {18}, pages = {125–137}, abstract = {The term ``artifact'' currently does not have a formal definition within the domain of cyber/digital forensics, resulting in a lack of standardized reporting, linguistic understanding between professionals, and efficiency. In this paper we propose a new definition based on a survey we conducted, literature usage, prior definitions of the word itself, and similarities with archival science. This definition includes required fields that all artifacts must have and encompasses the notion of curation. Thus, we propose using a new term – curated forensic artifact (CuFA) – to address items which have been cleared for entry into a CuFA database (one implementation, the Artifact Genome Project, abbreviated as AGP, is under development and briefly outlined). An ontological model encapsulates these required fields while utilizing a lower-level taxonomic schema. We use the Cyber Observable eXpression (CybOX) project due to its rising popularity and rigorous classifications of forensic objects. Additionally, we suggest some improvements on its integration into our model and identify higher-level location categories to illustrate tracing an object from creation through investigative leads. Finally, a step-wise procedure for researching and logging CuFAs is devised to accompany the model.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Close The term ``artifact'' currently does not have a formal definition within the domain of cyber/digital forensics, resulting in a lack of standardized reporting, linguistic understanding between professionals, and efficiency. In this paper we propose a new definition based on a survey we conducted, literature usage, prior definitions of the word itself, and similarities with archival science. This definition includes required fields that all artifacts must have and encompasses the notion of curation. Thus, we propose using a new term – curated forensic artifact (CuFA) – to address items which have been cleared for entry into a CuFA database (one implementation, the Artifact Genome Project, abbreviated as AGP, is under development and briefly outlined). An ontological model encapsulates these required fields while utilizing a lower-level taxonomic schema. We use the Cyber Observable eXpression (CybOX) project due to its rising popularity and rigorous classifications of forensic objects. Additionally, we suggest some improvements on its integration into our model and identify higher-level location categories to illustrate tracing an object from creation through investigative leads. Finally, a step-wise procedure for researching and logging CuFAs is devised to accompany the model. Close http://www.sciencedirect.com/science/article/pii/S1742287616300366 doi:10.1016/j.diin.2016.04.005 Close
60.	Conlan, Kevin; Baggili, Ibrahim; Breitinger, Frank Anti-forensics: Furthering digital forensic science through a new extended, granular taxonomy (Journal Article) In: Digital Investigation, vol. 18, pp. 66–75, 2016, ISSN: 1742-2876. (Abstract \| Links \| BibTeX) @article{CBB16, title = {Anti-forensics: Furthering digital forensic science through a new extended, granular taxonomy}, author = {Kevin Conlan and Ibrahim Baggili and Frank Breitinger}, url = {http://www.sciencedirect.com/science/article/pii/S1742287616300378}, doi = {10.1016/j.diin.2016.04.006}, issn = {1742-2876}, year = {2016}, date = {2016-08-07}, journal = {Digital Investigation}, volume = {18}, pages = {66–75}, abstract = {Anti-forensic tools, techniques and methods are becoming a formidable obstacle for the digital forensic community. Thus, new research initiatives and strategies must be formulated to address this growing problem. In this work we first collect and categorize 308 anti-digital forensic tools to survey the field. We then devise an extended anti-forensic taxonomy to the one proposed by Rogers (2006) in order to create a more comprehensive taxonomy and facilitate linguistic standardization. Our work also takes into consideration anti-forensic activity which utilizes tools that were not originally designed for anti-forensic purposes, but can still be used with malicious intent. This category was labeled as Possible indications of anti-forensic activity, as certain software, scenarios, and digital artifacts could indicate anti-forensic activity on a system. We also publicly share our data sets, which includes categorical data on 308 collected anti-forensic tools, as well as 2780 unique hash values related to the installation files of 191 publicly available anti-forensic tools. As part of our analysis, the collected hash set was ran against the National Institute of Standards and Technology's 2016 National Software Reference Library, and only 423 matches were found out of the 2780 hashes. Our findings indicate a need for future endeavors in creating and maintaining exhaustive anti-forensic hash data sets.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Close Anti-forensic tools, techniques and methods are becoming a formidable obstacle for the digital forensic community. Thus, new research initiatives and strategies must be formulated to address this growing problem. In this work we first collect and categorize 308 anti-digital forensic tools to survey the field. We then devise an extended anti-forensic taxonomy to the one proposed by Rogers (2006) in order to create a more comprehensive taxonomy and facilitate linguistic standardization. Our work also takes into consideration anti-forensic activity which utilizes tools that were not originally designed for anti-forensic purposes, but can still be used with malicious intent. This category was labeled as Possible indications of anti-forensic activity, as certain software, scenarios, and digital artifacts could indicate anti-forensic activity on a system. We also publicly share our data sets, which includes categorical data on 308 collected anti-forensic tools, as well as 2780 unique hash values related to the installation files of 191 publicly available anti-forensic tools. As part of our analysis, the collected hash set was ran against the National Institute of Standards and Technology's 2016 National Software Reference Library, and only 423 matches were found out of the 2780 hashes. Our findings indicate a need for future endeavors in creating and maintaining exhaustive anti-forensic hash data sets. Close http://www.sciencedirect.com/science/article/pii/S1742287616300378 doi:10.1016/j.diin.2016.04.006 Close
61.	Zhang, Xiaolu; Breitinger, Frank; Baggili, Ibrahim Rapid Android Parser for Investigating DEX files (RAPID) (Journal Article) In: Digital Investigation, vol. 17, pp. 28–39, 2016, ISSN: 1742-2876. (Abstract \| Links \| BibTeX) @article{ZBB16, title = {Rapid Android Parser for Investigating DEX files (RAPID)}, author = {Xiaolu Zhang and Frank Breitinger and Ibrahim Baggili}, url = {http://www.sciencedirect.com/science/article/pii/S1742287616300305}, doi = {10.1016/j.diin.2016.03.002}, issn = {1742-2876}, year = {2016}, date = {2016-03-25}, journal = {Digital Investigation}, volume = {17}, pages = {28–39}, abstract = {Abstract Android malware is a well-known challenging problem and many researchers/vendors/practitioners have tried to address this issue through application analysis techniques. In order to analyze Android applications, tools decompress APK files and extract relevant data from the Dalvik EXecutable (DEX) files. To acquire the data, investigators either use decompiled intermediate code generated by existing tools, e.g., Baksmali or Dex2jar or write their own parsers/dissemblers. Thus, they either need additional time because of decompiling the application into an intermediate representation and then parsing text files, or they reinvent the wheel by implementing their own parsers. In this article, we present Rapid Android Parser for Investigating DEX files (RAPID) which is an open source and easy-to-use JAVA library for parsing DEX files. RAPID comes with well-documented APIs which allow users to query data directly from the DEX binary files. Our experiments reveal that RAPID outperforms existing approaches in terms of runtime efficiency, provides better reliability (does not crash) and can support dynamic analysis by finding critical offsets. Notably, the processing time for our sample set of 22.35 GB was only 1.5 h with RAPID while the traditional approaches needed about 23 h (parsing and querying).}, keywords = {}, pubstate = {published}, tppubtype = {article} } Close Abstract Android malware is a well-known challenging problem and many researchers/vendors/practitioners have tried to address this issue through application analysis techniques. In order to analyze Android applications, tools decompress APK files and extract relevant data from the Dalvik EXecutable (DEX) files. To acquire the data, investigators either use decompiled intermediate code generated by existing tools, e.g., Baksmali or Dex2jar or write their own parsers/dissemblers. Thus, they either need additional time because of decompiling the application into an intermediate representation and then parsing text files, or they reinvent the wheel by implementing their own parsers. In this article, we present Rapid Android Parser for Investigating DEX files (RAPID) which is an open source and easy-to-use JAVA library for parsing DEX files. RAPID comes with well-documented APIs which allow users to query data directly from the DEX binary files. Our experiments reveal that RAPID outperforms existing approaches in terms of runtime efficiency, provides better reliability (does not crash) and can support dynamic analysis by finding critical offsets. Notably, the processing time for our sample set of 22.35 GB was only 1.5 h with RAPID while the traditional approaches needed about 23 h (parsing and querying). Close http://www.sciencedirect.com/science/article/pii/S1742287616300305 doi:10.1016/j.diin.2016.03.002 Close
62.	Gupta, Vikas; Breitinger, Frank How Cuckoo Filter Can Improve Existing Approximate Matching Techniques (Proceedings Article) In: James, Joshua I.; Breitinger, Frank (Ed.): Digital Forensics and Cyber Crime, pp. 39-52, Springer International Publishing, 2015, ISBN: 978-3-319-25511-8, (bf Best Paper Award). (Abstract \| Links \| BibTeX) @inproceedings{GB15, title = {How Cuckoo Filter Can Improve Existing Approximate Matching Techniques}, author = {Vikas Gupta and Frank Breitinger}, editor = {Joshua I. James and Frank Breitinger}, url = {http://dx.doi.org/10.1007/978-3-319-25512-5_4}, doi = {10.1007/978-3-319-25512-5_4}, isbn = {978-3-319-25511-8}, year = {2015}, date = {2015-12-25}, booktitle = {Digital Forensics and Cyber Crime}, volume = {157}, pages = {39-52}, publisher = {Springer International Publishing}, series = {Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering}, abstract = {In recent years, approximate matching algorithms have become an important component in digital forensic research and have been adopted in some other working areas as well. Currently there are several approaches but especially sdhash and mrsh-v2 attract the attention of the community because of their good overall performance (runtime, compression and detection rates). Although both approaches have a quite different proceeding, their final output (the similarity digest) is very similar as both utilize Bloom filters. This data structure was presented in 1970 and thus has been around for a while. Recently, a new data structure was proposed and claimed to be faster and have a smaller memory footprint than Bloom filter – Cuckoo filter. In this paper we analyze the feasibility of Cuckoo filter for approximate matching algorithms and present a prototype implementation called mrsh-cf which is based on a special version of mrsh-v2 called mrsh-net. We demonstrate that by using Cuckoo filter there is a runtime improvement of approximately 37% and also a significantly better false positive rate. The memory footprint of mrsh-cf is 8 times smaller than mrsh-net, while the compression rate is twice than Bloom filter based fingerprint.}, note = {bf Best Paper Award}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Close In recent years, approximate matching algorithms have become an important component in digital forensic research and have been adopted in some other working areas as well. Currently there are several approaches but especially sdhash and mrsh-v2 attract the attention of the community because of their good overall performance (runtime, compression and detection rates). Although both approaches have a quite different proceeding, their final output (the similarity digest) is very similar as both utilize Bloom filters. This data structure was presented in 1970 and thus has been around for a while. Recently, a new data structure was proposed and claimed to be faster and have a smaller memory footprint than Bloom filter – Cuckoo filter. In this paper we analyze the feasibility of Cuckoo filter for approximate matching algorithms and present a prototype implementation called mrsh-cf which is based on a special version of mrsh-v2 called mrsh-net. We demonstrate that by using Cuckoo filter there is a runtime improvement of approximately 37% and also a significantly better false positive rate. The memory footprint of mrsh-cf is 8 times smaller than mrsh-net, while the compression rate is twice than Bloom filter based fingerprint. Close http://dx.doi.org/10.1007/978-3-319-25512-5_4 doi:10.1007/978-3-319-25512-5_4 Close
63.	Harichandran, Vikram S.; Breitinger, Frank; Baggili, Ibrahim; Marrington, Andrew A cyber forensics needs analysis survey: Revisiting the domain's needs a decade later (Journal Article) In: Computers & Security, vol. 57, pp. 1–13, 2015, ISSN: 0167-4048. (Abstract \| Links \| BibTeX) @article{HBB15, title = {A cyber forensics needs analysis survey: Revisiting the domain's needs a decade later}, author = {Vikram S. Harichandran and Frank Breitinger and Ibrahim Baggili and Andrew Marrington}, url = {http://www.sciencedirect.com/science/article/pii/S0167404815001595}, doi = {10.1016/j.cose.2015.10.007}, issn = {0167-4048}, year = {2015}, date = {2015-11-10}, journal = {Computers & Security}, volume = {57}, pages = {1–13}, abstract = {Abstract The number of successful cyber attacks continues to increase, threatening financial and personal security worldwide. Cyber/digital forensics is undergoing a paradigm shift in which evidence is frequently massive in size, demands live acquisition, and may be insufficient to convict a criminal residing in another legal jurisdiction. This paper presents the findings of the first broad needs analysis survey in cyber forensics in nearly a decade, aimed at obtaining an updated consensus of professional attitudes in order to optimize resource allocation and to prioritize problems and possible solutions more efficiently. Results from the 99 respondents gave compelling testimony that the following will be necessary in the future: (1) better education/training/certification (opportunities, standardization, and skill-sets); (2) support for cloud and mobile forensics; (3) backing for and improvement of open-source tools (4) research on encryption, malware, and trail obfuscation; (5) revised laws (specific, up-to-date, and which protect user privacy); (6) better communication, especially between/with law enforcement (including establishing new frameworks to mitigate problematic communication); (7) more personnel and funding.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Close Abstract The number of successful cyber attacks continues to increase, threatening financial and personal security worldwide. Cyber/digital forensics is undergoing a paradigm shift in which evidence is frequently massive in size, demands live acquisition, and may be insufficient to convict a criminal residing in another legal jurisdiction. This paper presents the findings of the first broad needs analysis survey in cyber forensics in nearly a decade, aimed at obtaining an updated consensus of professional attitudes in order to optimize resource allocation and to prioritize problems and possible solutions more efficiently. Results from the 99 respondents gave compelling testimony that the following will be necessary in the future: (1) better education/training/certification (opportunities, standardization, and skill-sets); (2) support for cloud and mobile forensics; (3) backing for and improvement of open-source tools (4) research on encryption, malware, and trail obfuscation; (5) revised laws (specific, up-to-date, and which protect user privacy); (6) better communication, especially between/with law enforcement (including establishing new frameworks to mitigate problematic communication); (7) more personnel and funding. Close http://www.sciencedirect.com/science/article/pii/S0167404815001595 doi:10.1016/j.cose.2015.10.007 Close
64.	Karpisek, Filip; Baggili, Ibrahim; Breitinger, Frank WhatsApp network forensics: Decrypting and understanding the WhatsApp call signaling messages (Journal Article) In: Digital Investigation, vol. 15, pp. 110–118, 2015, ISSN: 1742-2876. (Abstract \| Links \| BibTeX) @article{KBB15, title = {WhatsApp network forensics: Decrypting and understanding the WhatsApp call signaling messages}, author = {Filip Karpisek and Ibrahim Baggili and Frank Breitinger}, url = {http://www.sciencedirect.com/science/article/pii/S1742287615000985}, doi = {10.1016/j.diin.2015.09.002}, issn = {1742-2876}, year = {2015}, date = {2015-10-10}, journal = {Digital Investigation}, volume = {15}, pages = {110–118}, abstract = {Abstract WhatsApp is a widely adopted mobile messaging application with over 800 million users. Recently, a calling feature was added to the application and no comprehensive digital forensic analysis has been performed with regards to this feature at the time of writing this paper. In this work, we describe how we were able to decrypt the network traffic and obtain forensic artifacts that relate to this new calling feature which included the: a) WhatsApp phone numbers, b) WhatsApp server IPs, c) WhatsApp audio codec (Opus), d) WhatsApp call duration, and e) WhatsApp's call termination. We explain the methods and tools used to decrypt the traffic as well as thoroughly elaborate on our findings with respect to the WhatsApp signaling messages. Furthermore, we also provide the community with a tool that helps in the visualization of the WhatsApp protocol messages.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Close Abstract WhatsApp is a widely adopted mobile messaging application with over 800 million users. Recently, a calling feature was added to the application and no comprehensive digital forensic analysis has been performed with regards to this feature at the time of writing this paper. In this work, we describe how we were able to decrypt the network traffic and obtain forensic artifacts that relate to this new calling feature which included the: a) WhatsApp phone numbers, b) WhatsApp server IPs, c) WhatsApp audio codec (Opus), d) WhatsApp call duration, and e) WhatsApp's call termination. We explain the methods and tools used to decrypt the traffic as well as thoroughly elaborate on our findings with respect to the WhatsApp signaling messages. Furthermore, we also provide the community with a tool that helps in the visualization of the WhatsApp protocol messages. Close http://www.sciencedirect.com/science/article/pii/S1742287615000985 doi:10.1016/j.diin.2015.09.002 Close
65.	James, Joshua I.; Breitinger, Frank (Ed.) Digital Forensics and Cyber Crime: 7th International Conference, ICDF2C 2015, Seoul, South Korea, October 6-8, 2015, Revised Selected Papers (Book) Springer, 2015, ISBN: 978-3-319-25511-8. (Links \| BibTeX) @book{JB15, title = {Digital Forensics and Cyber Crime: 7th International Conference, ICDF2C 2015, Seoul, South Korea, October 6-8, 2015, Revised Selected Papers}, editor = {Joshua I. James and Frank Breitinger}, url = {http://dx.doi.org/10.1007/978-3-319-25512-5}, doi = {10.1007/978-3-319-25512-5}, isbn = {978-3-319-25511-8}, year = {2015}, date = {2015-10-08}, volume = {157}, publisher = {Springer}, series = {Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering}, keywords = {}, pubstate = {published}, tppubtype = {book} } Close http://dx.doi.org/10.1007/978-3-319-25512-5 doi:10.1007/978-3-319-25512-5 Close
66.	Baggili, Ibrahim; Oduru, Jeff; Anthony, Kyle; Breitinger, Frank; McGee, Glenn Watch What You Wear: Preliminary Forensic Analysis of Smart Watches (Proceedings Article) In: Availability, Reliability and Security (ARES), 2015 10th International Conference on, pp. 303-311, 2015. (Abstract \| Links \| BibTeX) @inproceedings{BOA15, title = {Watch What You Wear: Preliminary Forensic Analysis of Smart Watches}, author = {Ibrahim Baggili and Jeff Oduru and Kyle Anthony and Frank Breitinger and Glenn McGee}, doi = {10.1109/ARES.2015.39}, year = {2015}, date = {2015-08-27}, booktitle = {Availability, Reliability and Security (ARES), 2015 10th International Conference on}, pages = {303-311}, abstract = {This work presents preliminary forensic analysis of two popular smart watches, the Samsung Gear 2 Neo and LG G. These wearable computing devices have the form factor of watches and sync with smart phones to display notifications, track footsteps and record voice messages. We posit that as smart watches are adopted by more users, the potential for them becoming a haven for digital evidence will increase thus providing utility for this preliminary work. In our work, we examined the forensic artifacts that are left on a Samsung Galaxy S4 Active phone that was used to sync with the Samsung Gear 2 Neo watch and the LG G watch. We further outline a methodology for physically acquiring data from the watches after gaining root access to them. Our results show that we can recover a swath of digital evidence directly form the watches when compared to the data on the phone that is synced with the watches. Furthermore, to root the LG G watch, the watch has to be reset to its factory settings which is alarming because the process may delete data of forensic relevance. Although this method is forensically intrusive, it may be used for acquiring data from already rooted LG watches. It is our observation that the data at the core of the functionality of at least the two tested smart watches, messages, health and fitness data, e-mails, contacts, events and notifications are accessible directly from the acquired images of the watches, which affirms our claim that the forensic value of evidence from smart watches is worthy of further study and should be investigated both at a high level and with greater specificity and granularity.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Close This work presents preliminary forensic analysis of two popular smart watches, the Samsung Gear 2 Neo and LG G. These wearable computing devices have the form factor of watches and sync with smart phones to display notifications, track footsteps and record voice messages. We posit that as smart watches are adopted by more users, the potential for them becoming a haven for digital evidence will increase thus providing utility for this preliminary work. In our work, we examined the forensic artifacts that are left on a Samsung Galaxy S4 Active phone that was used to sync with the Samsung Gear 2 Neo watch and the LG G watch. We further outline a methodology for physically acquiring data from the watches after gaining root access to them. Our results show that we can recover a swath of digital evidence directly form the watches when compared to the data on the phone that is synced with the watches. Furthermore, to root the LG G watch, the watch has to be reset to its factory settings which is alarming because the process may delete data of forensic relevance. Although this method is forensically intrusive, it may be used for acquiring data from already rooted LG watches. It is our observation that the data at the core of the functionality of at least the two tested smart watches, messages, health and fitness data, e-mails, contacts, events and notifications are accessible directly from the acquired images of the watches, which affirms our claim that the forensic value of evidence from smart watches is worthy of further study and should be investigated both at a high level and with greater specificity and granularity. Close doi:10.1109/ARES.2015.39 Close
67.	Walnycky, Daniel; Baggili, Ibrahim; Marrington, Andrew; Moore, Jason; Breitinger, Frank Network and device forensic analysis of Android social-messaging applications (Journal Article) In: Digital Investigation, vol. 14, Supplement 1, pp. 77–84, 2015, ISSN: 1742-2876, (The Proceedings of the Fifteenth Annual DFRWS Conference). (Abstract \| Links \| BibTeX) @article{WBM15, title = {Network and device forensic analysis of Android social-messaging applications}, author = {Daniel Walnycky and Ibrahim Baggili and Andrew Marrington and Jason Moore and Frank Breitinger}, url = {http://www.sciencedirect.com/science/article/pii/S1742287615000547}, doi = {10.1016/j.diin.2015.05.009}, issn = {1742-2876}, year = {2015}, date = {2015-08-09}, journal = {Digital Investigation}, volume = {14, Supplement 1}, pages = {77–84}, abstract = {Abstract In this research we forensically acquire and analyze the device-stored data and network traffic of 20 popular instant messaging applications for Android. We were able to reconstruct some or the entire message content from 16 of the 20 applications tested, which reflects poorly on the security and privacy measures employed by these applications but may be construed positively for evidence collection purposes by digital forensic practitioners. This work shows which features of these instant messaging applications leave evidentiary traces allowing for suspect data to be reconstructed or partially reconstructed, and whether network forensics or device forensics permits the reconstruction of that activity. We show that in most cases we were able to reconstruct or intercept data such as: passwords, screenshots taken by applications, pictures, videos, audio sent, messages sent, sketches, profile pictures and more.}, note = {The Proceedings of the Fifteenth Annual DFRWS Conference}, keywords = {}, pubstate = {published}, tppubtype = {article} } Close Abstract In this research we forensically acquire and analyze the device-stored data and network traffic of 20 popular instant messaging applications for Android. We were able to reconstruct some or the entire message content from 16 of the 20 applications tested, which reflects poorly on the security and privacy measures employed by these applications but may be construed positively for evidence collection purposes by digital forensic practitioners. This work shows which features of these instant messaging applications leave evidentiary traces allowing for suspect data to be reconstructed or partially reconstructed, and whether network forensics or device forensics permits the reconstruction of that activity. We show that in most cases we were able to reconstruct or intercept data such as: passwords, screenshots taken by applications, pictures, videos, audio sent, messages sent, sketches, profile pictures and more. Close http://www.sciencedirect.com/science/article/pii/S1742287615000547 doi:10.1016/j.diin.2015.05.009 Close
68.	Rathgeb, Christian; Breitinger, Frank; Baier, Harald; Busch, Christoph Towards Bloom filter-based indexing of iris biometric data (Proceedings Article) In: Biometrics (ICB), 2015 International Conference on, pp. 422–429, IEEE 2015, (bf Siew-Sngiem Best Poster Award). (Abstract \| Links \| BibTeX) @inproceedings{7139105, title = {Towards Bloom filter-based indexing of iris biometric data}, author = {Christian Rathgeb and Frank Breitinger and Harald Baier and Christoph Busch}, doi = {10.1109/ICB.2015.7139105}, year = {2015}, date = {2015-05-22}, booktitle = {Biometrics (ICB), 2015 International Conference on}, pages = {422–429}, organization = {IEEE}, abstract = {Conventional biometric identification systems require exhaustive 1 : N comparisons in order to identify a bio- metric probe, i.e. comparison time frequently dominates the overall computational workload. Biometric database indexing represents a challenging task since biometric data does not exhibit any natural sorting order. In this paper we present a preliminary study on the feasibility of applying Bloom filters for the purpose of iris biometric database indexing. It is shown that, by constructing a binary tree data structure of Bloom filters extracted from binary iris biometric templates (iris-codes), the search space can be reduced to O(log N ). In experiments, which are carried out on a medium-sized database of N = 256 subjects, biometric performance (accuracy) is maintained for different conventional identification systems. Further, perspectives on how to employ the proposed scheme on large-scale databases are given.}, note = {bf Siew-Sngiem Best Poster Award}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Close Conventional biometric identification systems require exhaustive 1 : N comparisons in order to identify a bio- metric probe, i.e. comparison time frequently dominates the overall computational workload. Biometric database indexing represents a challenging task since biometric data does not exhibit any natural sorting order. In this paper we present a preliminary study on the feasibility of applying Bloom filters for the purpose of iris biometric database indexing. It is shown that, by constructing a binary tree data structure of Bloom filters extracted from binary iris biometric templates (iris-codes), the search space can be reduced to O(log N ). In experiments, which are carried out on a medium-sized database of N = 256 subjects, biometric performance (accuracy) is maintained for different conventional identification systems. Further, perspectives on how to employ the proposed scheme on large-scale databases are given. Close doi:10.1109/ICB.2015.7139105 Close
69.	Satyendra, Gurjar; Baggili, Ibrahim; Breitinger, Frank; Fischer, Alice An empirical comparison of widely adopted hash functions in digital forensics: does the programming language and operating system make a difference? (Proceedings Article) In: Proceedings of the Conference on Digital Forensics, Security and Law, pp. 57–68, 2015. (Abstract \| Links \| BibTeX) @inproceedings{SBBF15, title = {An empirical comparison of widely adopted hash functions in digital forensics: does the programming language and operating system make a difference?}, author = {Gurjar Satyendra and Ibrahim Baggili and Frank Breitinger and Alice Fischer}, url = {https://commons.erau.edu/adfsl/2015/tuesday/6/}, year = {2015}, date = {2015-05-19}, booktitle = {Proceedings of the Conference on Digital Forensics, Security and Law}, pages = {57–68}, abstract = {Hash functions are widespread in computer sciences and have a wide range of applications such as ensuring integrity in cryptographic protocols, structuring database entries (hash tables) or identifying known files in forensic investigations. Besides their cryptographic requirements, a fundamental property of hash functions is efficient and easy computation which is especially important in digital forensics due to the large amount of data that need to be processed in cases. In this paper, we correlate the runtime efficiency of common hashing algorithms (MD5, SHA-family) and their implementation. Our empirical comparison focuses on C-OpenSSL, Python, Ruby, Java on Windows and Linux and C and WinCrypto API on Windows. The purpose of this paper is to recommend appropriate programming languages and libraries for coding tools that include intensive hashing functionality. In each programming language, we compute the MD5, SHA-1, SHA-256 and SHA-512 digest on datasets from 2 MB to 1 GB. For each language, algorithm and data, we perform multiple runs and compute the average elapsed time. In our experiment, we observed that OpenSSL and languages utilizing OpenSSL (Python and Ruby) perform better across all the hashing algorithms and data sizes on Windows and Linux. However, on Windows, performance of Java (Oracle JDK) and C WinCrypto is comparable to OpenSSL and better for SHA-512.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Close Hash functions are widespread in computer sciences and have a wide range of applications such as ensuring integrity in cryptographic protocols, structuring database entries (hash tables) or identifying known files in forensic investigations. Besides their cryptographic requirements, a fundamental property of hash functions is efficient and easy computation which is especially important in digital forensics due to the large amount of data that need to be processed in cases. In this paper, we correlate the runtime efficiency of common hashing algorithms (MD5, SHA-family) and their implementation. Our empirical comparison focuses on C-OpenSSL, Python, Ruby, Java on Windows and Linux and C and WinCrypto API on Windows. The purpose of this paper is to recommend appropriate programming languages and libraries for coding tools that include intensive hashing functionality. In each programming language, we compute the MD5, SHA-1, SHA-256 and SHA-512 digest on datasets from 2 MB to 1 GB. For each language, algorithm and data, we perform multiple runs and compute the average elapsed time. In our experiment, we observed that OpenSSL and languages utilizing OpenSSL (Python and Ruby) perform better across all the hashing algorithms and data sizes on Windows and Linux. However, on Windows, performance of Java (Oracle JDK) and C WinCrypto is comparable to OpenSSL and better for SHA-512. Close https://commons.erau.edu/adfsl/2015/tuesday/6/ Close
70.	Baggili, Ibrahim; Breitinger, Frank Data Sources for Advancing Cyber Forensics: What the Social World Has to Offer (Proceedings Article) In: AAAI Spring Symposium Series, 2015. (Abstract \| Links \| BibTeX) @inproceedings{BB15, title = {Data Sources for Advancing Cyber Forensics: What the Social World Has to Offer}, author = {Ibrahim Baggili and Frank Breitinger}, url = {http://aaai.org/ocs/index.php/SSS/SSS15/paper/view/10227}, year = {2015}, date = {2015-03-12}, booktitle = {AAAI Spring Symposium Series}, abstract = {Cyber forensics is fairly new as a scientific discipline and deals with the acquisition, authentication and analysis of digital evidence. One of the biggest challenges in this domain has thus far been real data sources that are available for experimentation. Only a few data sources exist at the time writing of this paper. The authors in this paper deliberate how social media data sources may impact future directions in cyber forensics, and describe how these data sources may be used as new digital forensic artifacts in future investigations. The authors also deliberate how the scientific community may leverage publically accessible social media data to advance the state of the art in Cyber Forensics.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Close Cyber forensics is fairly new as a scientific discipline and deals with the acquisition, authentication and analysis of digital evidence. One of the biggest challenges in this domain has thus far been real data sources that are available for experimentation. Only a few data sources exist at the time writing of this paper. The authors in this paper deliberate how social media data sources may impact future directions in cyber forensics, and describe how these data sources may be used as new digital forensic artifacts in future investigations. The authors also deliberate how the scientific community may leverage publically accessible social media data to advance the state of the art in Cyber Forensics. Close http://aaai.org/ocs/index.php/SSS/SSS15/paper/view/10227 Close
71.	Breitinger, Frank; Liu, Huajian; Winter, Christian; Baier, Harald; Rybalchenko, Alexey; Steinebach, Martin Towards a Process Model for Hash Functions in Digital Forensics (Proceedings Article) In: Gladyshev, Pavel; Marrington, Andrew; Baggili, Ibrahim (Ed.): Digital Forensics and Cyber Crime, pp. 170-186, Springer International Publishing, 2014, ISBN: 978-3-319-14288-3. (Abstract \| Links \| BibTeX) @inproceedings{BLW14, title = {Towards a Process Model for Hash Functions in Digital Forensics}, author = {Frank Breitinger and Huajian Liu and Christian Winter and Harald Baier and Alexey Rybalchenko and Martin Steinebach}, editor = {Pavel Gladyshev and Andrew Marrington and Ibrahim Baggili}, url = {http://dx.doi.org/10.1007/978-3-319-14289-0_12}, doi = {10.1007/978-3-319-14289-0_12}, isbn = {978-3-319-14288-3}, year = {2014}, date = {2014-12-23}, booktitle = {Digital Forensics and Cyber Crime}, volume = {132}, pages = {170-186}, publisher = {Springer International Publishing}, series = {Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering}, abstract = {Handling forensic investigations gets more and more difficult as the amount of data one has to analyze is increasing continuously. A common approach for automated file identification are hash functions. The proceeding is quite simple: a tool hashes all files of a seized device and compares them against a database. Depending on the database, this allows to discard non-relevant (whitelisting) or detect suspicious files (blacklisting). One can distinguish three kinds of algorithms: (cryptographic) hash functions, bytewise approximate matching and semantic approximate matching (a.k.a perceptual hashing) where the main difference is the operation level. The latter one operates on the semantic level while both other approaches consider the byte-level. Hence, investigators have three different approaches at hand to analyze a device. First, this paper gives a comprehensive overview of existing approaches for bytewise and semantic approximate matching (for semantic we focus on images functions). Second, we compare implementations and summarize the strengths and weaknesses of all approaches. Third, we show how to integrate these functions based on a sample use case into one existing process model, the computer forensics field triage process model.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Close Handling forensic investigations gets more and more difficult as the amount of data one has to analyze is increasing continuously. A common approach for automated file identification are hash functions. The proceeding is quite simple: a tool hashes all files of a seized device and compares them against a database. Depending on the database, this allows to discard non-relevant (whitelisting) or detect suspicious files (blacklisting). One can distinguish three kinds of algorithms: (cryptographic) hash functions, bytewise approximate matching and semantic approximate matching (a.k.a perceptual hashing) where the main difference is the operation level. The latter one operates on the semantic level while both other approaches consider the byte-level. Hence, investigators have three different approaches at hand to analyze a device. First, this paper gives a comprehensive overview of existing approaches for bytewise and semantic approximate matching (for semantic we focus on images functions). Second, we compare implementations and summarize the strengths and weaknesses of all approaches. Third, we show how to integrate these functions based on a sample use case into one existing process model, the computer forensics field triage process model. Close http://dx.doi.org/10.1007/978-3-319-14289-0_12 doi:10.1007/978-3-319-14289-0_12 Close
72.	Rathgeb, Christian; Breitinger, Frank; Busch, Christoph; Baier, Harald On application of bloom filters to iris biometrics (Journal Article) In: IET Biometrics, vol. 3, no. 4, pp. 207-218, 2014, ISSN: 2047-4938. (Abstract \| Links \| BibTeX) @article{RBBB14, title = {On application of bloom filters to iris biometrics}, author = {Christian Rathgeb and Frank Breitinger and Christoph Busch and Harald Baier}, doi = {10.1049/iet-bmt.2013.0049}, issn = {2047-4938}, year = {2014}, date = {2014-12-18}, journal = {IET Biometrics}, volume = {3}, number = {4}, pages = {207-218}, abstract = {In this study, the application of adaptive Bloom filters to binary iris biometric feature vectors, that is, iris-codes, is proposed. Bloom filters, which have been established as a powerful tool in various fields of computer science, are applied in order to transform iris-codes to a rotation-invariant feature representation. Properties of the proposed Bloom filter-based transform concurrently enable (i) biometric template protection, (ii) compression of biometric data and (iii) acceleration of biometric identification, whereas at the same time no significant degradation of biometric performance is observed. According to these fields of application, detailed investigations are presented. Experiments are conducted on the CASIA-v3 iris database for different feature extraction algorithms. Confirming the soundness of the proposed approach, the application of adaptive Bloom filters achieves rotation-invariant cancelable templates maintaining biometric performance, a compression of templates down to 20-40% of original size and a reduction of bit-comparisons to less than 5% leading to a substantial speed-up of the biometric system in identification mode.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Close In this study, the application of adaptive Bloom filters to binary iris biometric feature vectors, that is, iris-codes, is proposed. Bloom filters, which have been established as a powerful tool in various fields of computer science, are applied in order to transform iris-codes to a rotation-invariant feature representation. Properties of the proposed Bloom filter-based transform concurrently enable (i) biometric template protection, (ii) compression of biometric data and (iii) acceleration of biometric identification, whereas at the same time no significant degradation of biometric performance is observed. According to these fields of application, detailed investigations are presented. Experiments are conducted on the CASIA-v3 iris database for different feature extraction algorithms. Confirming the soundness of the proposed approach, the application of adaptive Bloom filters achieves rotation-invariant cancelable templates maintaining biometric performance, a compression of templates down to 20-40% of original size and a reduction of bit-comparisons to less than 5% leading to a substantial speed-up of the biometric system in identification mode. Close doi:10.1049/iet-bmt.2013.0049 Close
73.	Breitinger, Frank; Rathgeb, Christian; Baier, Harald An Efficient Similarity Digests Database Lookup - A Logarithmic Divide & Conquer Approach (Journal Article) In: Journal of Digital Forensics, Security and Law (JDFSL), vol. 9, no. 2, pp. 155–166, 2014. (Abstract \| Links \| BibTeX) @article{BRB14, title = {An Efficient Similarity Digests Database Lookup - A Logarithmic Divide & Conquer Approach}, author = {Frank Breitinger and Christian Rathgeb and Harald Baier}, url = {https://doi.org/10.15394/jdfsl.2014.1178}, doi = {10.15394/jdfsl.2014.1178}, year = {2014}, date = {2014-09-01}, journal = {Journal of Digital Forensics, Security and Law (JDFSL)}, volume = {9}, number = {2}, pages = {155–166}, abstract = {Investigating seized devices within digital forensics represents a challenging task due to the increasing amount of data. Common procedures utilize automated file identification, which reduces the amount of data an investigator has to examine manually. In the past years the research field of approximate matching arises to detect similar data. However, if n denotes the number of similarity digests in a database, then the lookup for a single similarity digest is of complexity of O(n). This paper presents a concept to extend existing approximate matching algorithms, which reduces the lookup complexity from O(n) to O(log(n)). Our proposed approach is based on the well-known divide and conquer paradigm and builds a Bloom filter-based tree data structure in order to enable an efficient lookup of similarity digests. Further, it is demonstrated that the presented technique is highly scalable operating a trade-off between storage requirements and computational efficiency. We perform a theoretical assessment based on recently published results and reasonable magnitudes of input data, and show that the complexity reduction achieved by the proposed technique yields a 220-fold acceleration of look-up costs.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Close Investigating seized devices within digital forensics represents a challenging task due to the increasing amount of data. Common procedures utilize automated file identification, which reduces the amount of data an investigator has to examine manually. In the past years the research field of approximate matching arises to detect similar data. However, if n denotes the number of similarity digests in a database, then the lookup for a single similarity digest is of complexity of O(n). This paper presents a concept to extend existing approximate matching algorithms, which reduces the lookup complexity from O(n) to O(log(n)). Our proposed approach is based on the well-known divide and conquer paradigm and builds a Bloom filter-based tree data structure in order to enable an efficient lookup of similarity digests. Further, it is demonstrated that the presented technique is highly scalable operating a trade-off between storage requirements and computational efficiency. We perform a theoretical assessment based on recently published results and reasonable magnitudes of input data, and show that the complexity reduction achieved by the proposed technique yields a 220-fold acceleration of look-up costs. Close https://doi.org/10.15394/jdfsl.2014.1178 doi:10.15394/jdfsl.2014.1178 Close
74.	Breitinger, Frank; Stivaktakis, Georgios; Baier, Harald FRASH: A Framework to Test Algorithms of Similarity Hashing (Journal Article) In: Digit. Investig., vol. 10, pp. S50–S58, 2014, ISSN: 1742-2876. (Abstract \| Links \| BibTeX) @article{BSB13, title = {FRASH: A Framework to Test Algorithms of Similarity Hashing}, author = {Frank Breitinger and Georgios Stivaktakis and Harald Baier}, url = {http://dx.doi.org/10.1016/j.diin.2013.06.006}, doi = {10.1016/j.diin.2013.06.006}, issn = {1742-2876}, year = {2014}, date = {2014-08-03}, journal = {Digit. Investig.}, volume = {10}, pages = {S50–S58}, publisher = {Elsevier Science Publishers B. V.}, address = {Amsterdam, The Netherlands, The Netherlands}, abstract = {Automated input identification is a very challenging, but also important task. Within computer forensics this reduces the amount of data an investigator has to look at by hand. Besides identifying exact duplicates, which is mostly solved using cryptographic hash functions, it is necessary to cope with similar inputs (e.g., different versions of a file), embedded objects (e.g., a JPG within a Word document), and fragments (e.g., network packets), too. Over the recent years a couple of different similarity hashing algorithms were published. However, due to the absence of a definition and a test framework, it is hardly possible to evaluate and compare these approaches to establish them in the community. The paper at hand aims at providing an assessment methodology and a sample implementation called FRASH: a framework to test algorithms of similarity hashing. First, we describe common use cases of a similarity hashing algorithm to motivate our two test classes efficiency and sensitivity & robustness. Next, our open and freely available framework is briefly described. Finally, we apply FRASH to the well-known similarity hashing approaches ssdeep and sdhash to show their strengths and weaknesses.}, keywords = {}, pubstate = {published}, tppubtype = {article} } Close Automated input identification is a very challenging, but also important task. Within computer forensics this reduces the amount of data an investigator has to look at by hand. Besides identifying exact duplicates, which is mostly solved using cryptographic hash functions, it is necessary to cope with similar inputs (e.g., different versions of a file), embedded objects (e.g., a JPG within a Word document), and fragments (e.g., network packets), too. Over the recent years a couple of different similarity hashing algorithms were published. However, due to the absence of a definition and a test framework, it is hardly possible to evaluate and compare these approaches to establish them in the community. The paper at hand aims at providing an assessment methodology and a sample implementation called FRASH: a framework to test algorithms of similarity hashing. First, we describe common use cases of a similarity hashing algorithm to motivate our two test classes efficiency and sensitivity & robustness. Next, our open and freely available framework is briefly described. Finally, we apply FRASH to the well-known similarity hashing approaches ssdeep and sdhash to show their strengths and weaknesses. Close http://dx.doi.org/10.1016/j.diin.2013.06.006 doi:10.1016/j.diin.2013.06.006 Close
75.	Breitinger, Frank On the utility of bytewise approximate matching in computer science with a special focus on digital forensics investigations (PhD Thesis) Technical University Darmstadt, 2014. (Links \| BibTeX) @phdthesis{FB-DISS, title = {On the utility of bytewise approximate matching in computer science with a special focus on digital forensics investigations}, author = {Frank Breitinger}, url = {http://tuprints.ulb.tu-darmstadt.de/4055/}, year = {2014}, date = {2014-06-30}, school = {Technical University Darmstadt}, keywords = {}, pubstate = {published}, tppubtype = {phdthesis} } Close http://tuprints.ulb.tu-darmstadt.de/4055/ Close
76.	Breitinger, Frank; Stivaktakis, Georgios; Roussev, Vassil Evaluating Detection Error Trade-offs for Bytewise Approximate Matching Algorithms (Journal Article) In: Digital Investigation, vol. 11, no. 2, pp. 81–89, 2014, ISSN: 1742-2876, (bf Best Paper Award @ ICDF2C'13). (Abstract \| Links \| BibTeX) @article{BSR14, title = {Evaluating Detection Error Trade-offs for Bytewise Approximate Matching Algorithms}, author = {Frank Breitinger and Georgios Stivaktakis and Vassil Roussev}, url = {http://dx.doi.org/10.1016/j.diin.2014.05.002}, doi = {10.1016/j.diin.2014.05.002}, issn = {1742-2876}, year = {2014}, date = {2014-06-08}, journal = {Digital Investigation}, volume = {11}, number = {2}, pages = {81–89}, publisher = {Elsevier Science Publishers B. V.}, address = {Amsterdam, The Netherlands, The Netherlands}, abstract = {Bytewise approximate matching is a relatively new area within digital forensics, but its importance is growing quickly as practitioners are looking for fast methods to analyze the increasing amounts of data in forensic investigations. The essential idea is to complement the use of cryptographic hash functions to detect data objects with bytewise identical representation with the capability to find objects with bytewise similar representations. Unlike cryptographic hash functions, which have been studied and tested for a long time, approximate matching ones are still in their early development stages, and have been evaluated in a somewhat ad-hoc manner. Recently, the FRASH testing framework has been proposed as a vehicle for developing a set of standardized tests for approximate matching algorithms; the aim is to provide a useful guide for understanding and comparing the absolute and relative performance of different algorithms. The contribution of this work is twofold: a) expand FRASH with automated tests for quantifying approximate matching algorithm behavior with respect to precision and recall; and b) present a case study of two algorithms already in use-sdhash and ssdeep.}, note = {bf Best Paper Award @ ICDF2C'13}, keywords = {}, pubstate = {published}, tppubtype = {article} } Close Bytewise approximate matching is a relatively new area within digital forensics, but its importance is growing quickly as practitioners are looking for fast methods to analyze the increasing amounts of data in forensic investigations. The essential idea is to complement the use of cryptographic hash functions to detect data objects with bytewise identical representation with the capability to find objects with bytewise similar representations. Unlike cryptographic hash functions, which have been studied and tested for a long time, approximate matching ones are still in their early development stages, and have been evaluated in a somewhat ad-hoc manner. Recently, the FRASH testing framework has been proposed as a vehicle for developing a set of standardized tests for approximate matching algorithms; the aim is to provide a useful guide for understanding and comparing the absolute and relative performance of different algorithms. The contribution of this work is twofold: a) expand FRASH with automated tests for quantifying approximate matching algorithm behavior with respect to precision and recall; and b) present a case study of two algorithms already in use-sdhash and ssdeep. Close http://dx.doi.org/10.1016/j.diin.2014.05.002 doi:10.1016/j.diin.2014.05.002 Close
77.	Breitinger, Frank; Baggili, Ibrahim File Detection On Network Traffic Using Approximate Matching (Journal Article) In: Journal of Digital Forensics, Security and Law (JDFSL), vol. 9, no. 2, pp. 23–36, 2014, (bf Best Paper Award). (Abstract \| Links \| BibTeX) @article{BB14, title = {File Detection On Network Traffic Using Approximate Matching}, author = {Frank Breitinger and Ibrahim Baggili}, url = {https://doi.org/10.15394/jdfsl.2014.1168}, doi = {10.15394/jdfsl.2014.1168}, year = {2014}, date = {2014-05-22}, journal = {Journal of Digital Forensics, Security and Law (JDFSL)}, volume = {9}, number = {2}, pages = {23–36}, abstract = {In recent years, Internet technologies changed enormously and allow faster Internet connections, higher data rates and mobile usage. Hence, it is possible to send huge amounts of data / files easily which is often used by insiders or attackers to steal intellectual property. As a consequence, data leakage prevention systems (DLPS) have been developed which analyze network traffic and alert in case of a data leak. Although the overall concepts of the detection techniques are known, the systems are mostly closed and commercial. Within this paper we present a new technique for network traffic analysis based on approximate matching (a.k.a fuzzy hashing) which is very common in digital forensics to correlate similar files. This paper demonstrates how to optimize and apply them on single network packets. Our contribution is a straightforward concept which does not need a comprehensive configuration: hash the file and store the digest in the database. Within our experiments we obtained false positive rates between 10−4 and 10−5 and an algorithm throughput of over 650 Mbit/s.}, note = {bf Best Paper Award}, keywords = {}, pubstate = {published}, tppubtype = {article} } Close In recent years, Internet technologies changed enormously and allow faster Internet connections, higher data rates and mobile usage. Hence, it is possible to send huge amounts of data / files easily which is often used by insiders or attackers to steal intellectual property. As a consequence, data leakage prevention systems (DLPS) have been developed which analyze network traffic and alert in case of a data leak. Although the overall concepts of the detection techniques are known, the systems are mostly closed and commercial. Within this paper we present a new technique for network traffic analysis based on approximate matching (a.k.a fuzzy hashing) which is very common in digital forensics to correlate similar files. This paper demonstrates how to optimize and apply them on single network packets. Our contribution is a straightforward concept which does not need a comprehensive configuration: hash the file and store the digest in the database. Within our experiments we obtained false positive rates between 10−4 and 10−5 and an algorithm throughput of over 650 Mbit/s. Close https://doi.org/10.15394/jdfsl.2014.1168 doi:10.15394/jdfsl.2014.1168 Close
78.	Breitinger, Frank; Guttman, Barbara; McCarrin, Michael; Roussev, Vassil; White, Douglas Approximate Matching: Definition and Terminology (Technical Report) National Institute of Standards and Technologies 2014. (Links \| BibTeX) @techreport{AM-DEF, title = {Approximate Matching: Definition and Terminology}, author = {Frank Breitinger and Barbara Guttman and Michael McCarrin and Vassil Roussev and Douglas White}, doi = {10.6028/NIST.SP.800-168}, year = {2014}, date = {2014-05-01}, institution = {National Institute of Standards and Technologies}, keywords = {}, pubstate = {published}, tppubtype = {techreport} } Close doi:10.6028/NIST.SP.800-168 Close
79.	Breitinger, Frank; Roussev, Vassil Automated evaluation of approximate matching algorithms on real data (Journal Article) In: Digital Investigation, vol. 11, Supplement 1, no. 0, pp. S10 - S17, 2014, ISSN: 1742-2876, (Proceedings of the First Annual DFRWS Europe). (Abstract \| Links \| BibTeX) @article{BR14, title = {Automated evaluation of approximate matching algorithms on real data}, author = {Frank Breitinger and Vassil Roussev}, url = {http://www.sciencedirect.com/science/article/pii/S1742287614000073}, doi = {10.1016/j.diin.2014.03.002}, issn = {1742-2876}, year = {2014}, date = {2014-03-13}, journal = {Digital Investigation}, volume = {11, Supplement 1}, number = {0}, pages = {S10 - S17}, abstract = {Abstract Bytewise approximate matching is a relatively new area within digital forensics, but its importance is growing quickly as practitioners are looking for fast methods to screen and analyze the increasing amounts of data in forensic investigations. The essential idea is to complement the use of cryptographic hash functions to detect data objects with bytewise identical representation with the capability to find objects with bytewise similar representations. Unlike cryptographic hash functions, which have been studied and tested for a long time, approximate matching ones are still in their early development stages and evaluation methodology is still evolving. Broadly, prior approaches have used either a human in the loop to manually evaluate the goodness of similarity matches on real world data, or controlled (pseudo-random) data to perform automated evaluation. This work's contribution is to introduce automated approximate matching evaluation on real data by relating approximate matching results to the longest common substring (LCS). Specifically, we introduce a computationally efficient LCS approximation and use it to obtain ground truth on the t5 set. Using the results, we evaluate three existing approximate matching schemes relative to LCS and analyze their performance.}, note = {Proceedings of the First Annual DFRWS Europe}, keywords = {}, pubstate = {published}, tppubtype = {article} } Close Abstract Bytewise approximate matching is a relatively new area within digital forensics, but its importance is growing quickly as practitioners are looking for fast methods to screen and analyze the increasing amounts of data in forensic investigations. The essential idea is to complement the use of cryptographic hash functions to detect data objects with bytewise identical representation with the capability to find objects with bytewise similar representations. Unlike cryptographic hash functions, which have been studied and tested for a long time, approximate matching ones are still in their early development stages and evaluation methodology is still evolving. Broadly, prior approaches have used either a human in the loop to manually evaluate the goodness of similarity matches on real world data, or controlled (pseudo-random) data to perform automated evaluation. This work's contribution is to introduce automated approximate matching evaluation on real data by relating approximate matching results to the longest common substring (LCS). Specifically, we introduce a computationally efficient LCS approximation and use it to obtain ground truth on the t5 set. Using the results, we evaluate three existing approximate matching schemes relative to LCS and analyze their performance. Close http://www.sciencedirect.com/science/article/pii/S1742287614000073 doi:10.1016/j.diin.2014.03.002 Close
80.	Breitinger, Frank; Baier, Harald; White, Douglas On the database lookup problem of approximate matching (Journal Article) In: Digital Investigation, vol. 11, Supplement 1, no. 0, pp. S1–S9, 2014, ISSN: 1742-2876, (Proceedings of the First Annual DFRWS Europe). (Abstract \| Links \| BibTeX) @article{BBW14, title = {On the database lookup problem of approximate matching}, author = {Frank Breitinger and Harald Baier and Douglas White}, url = {http://www.sciencedirect.com/science/article/pii/S1742287614000061}, doi = {10.1016/j.diin.2014.03.001}, issn = {1742-2876}, year = {2014}, date = {2014-03-13}, journal = {Digital Investigation}, volume = {11, Supplement 1}, number = {0}, pages = {S1–S9}, abstract = {Abstract Investigating seized devices within digital forensics gets more and more difficult due to the increasing amount of data. Hence, a common procedure uses automated file identification which reduces the amount of data an investigator has to look at by hand. Besides identifying exact duplicates, which is mostly solved using cryptographic hash functions, it is also helpful to detect similar data by applying approximate matching. Let x denote the number of digests in a database, then the lookup for a single similarity digest has the complexity of O(x). In other words, the digest has to be compared against all digests in the database. In contrast, cryptographic hash values are stored within binary trees or hash tables and hence the lookup complexity of a single digest is O(log2(x)) or O(1), respectively. In this paper we present and evaluate a concept to extend existing approximate matching algorithms, which reduces the lookup complexity from O(x) to O(1). Therefore, instead of using multiple small Bloom filters (which is the common procedure), we demonstrate that a single, huge Bloom filter has a far better performance. Our evaluation demonstrates that current approximate matching algorithms are too slow (e.g., over 21 min to compare 4457 digests of a common file corpus against each other) while the improved version solves this challenge within seconds. Studying the precision and recall rates shows that our approach works as reliably as the original implementations. We obtain this benefit by accuracy–the comparison is now a file-against-set comparison and thus it is not possible to see which file in the database is matched.}, note = {Proceedings of the First Annual DFRWS Europe}, keywords = {}, pubstate = {published}, tppubtype = {article} } Close Abstract Investigating seized devices within digital forensics gets more and more difficult due to the increasing amount of data. Hence, a common procedure uses automated file identification which reduces the amount of data an investigator has to look at by hand. Besides identifying exact duplicates, which is mostly solved using cryptographic hash functions, it is also helpful to detect similar data by applying approximate matching. Let x denote the number of digests in a database, then the lookup for a single similarity digest has the complexity of O(x). In other words, the digest has to be compared against all digests in the database. In contrast, cryptographic hash values are stored within binary trees or hash tables and hence the lookup complexity of a single digest is O(log2(x)) or O(1), respectively. In this paper we present and evaluate a concept to extend existing approximate matching algorithms, which reduces the lookup complexity from O(x) to O(1). Therefore, instead of using multiple small Bloom filters (which is the common procedure), we demonstrate that a single, huge Bloom filter has a far better performance. Our evaluation demonstrates that current approximate matching algorithms are too slow (e.g., over 21 min to compare 4457 digests of a common file corpus against each other) while the improved version solves this challenge within seconds. Studying the precision and recall rates shows that our approach works as reliably as the original implementations. We obtain this benefit by accuracy–the comparison is now a file-against-set comparison and thus it is not possible to see which file in the database is matched. Close http://www.sciencedirect.com/science/article/pii/S1742287614000061 doi:10.1016/j.diin.2014.03.001 Close
81.	Breitinger, Frank; Winter, Christian; Yannikos, York; Fink, Tobias; Seefried, Michael Using Approximate Matching to Reduce the Volume of Digital Data (Proceedings Article) In: Peterson, Gilbert; Shenoi, Sujeet (Ed.): Advances in Digital Forensics X, pp. 149-163, Springer Berlin Heidelberg, 2014, ISBN: 978-3-662-44951-6. (Abstract \| Links \| BibTeX) @inproceedings{BWY14, title = {Using Approximate Matching to Reduce the Volume of Digital Data}, author = {Frank Breitinger and Christian Winter and York Yannikos and Tobias Fink and Michael Seefried}, editor = {Gilbert Peterson and Sujeet Shenoi}, url = {http://dx.doi.org/10.1007/978-3-662-44952-3_11}, doi = {10.1007/978-3-662-44952-3_11}, isbn = {978-3-662-44951-6}, year = {2014}, date = {2014-01-01}, booktitle = {Advances in Digital Forensics X}, volume = {433}, pages = {149-163}, publisher = {Springer Berlin Heidelberg}, series = {IFIP Advances in Information and Communication Technology}, abstract = {Digital forensic investigators frequently have to search for relevant files in massive digital corpora – a task often compared to finding a needle in a haystack. To address this challenge, investigators typically apply cryptographic hash functions to identify known files. However, cryptographic hashing only allows the detection of files that exactly match the known file hash values or fingerprints. This paper demonstrates the benefits of using approximate matching to locate relevant files. The experiments described in this paper used three test images of Windows XP, Windows 7 and Ubuntu 12.04 systems to evaluate fingerprint-based comparisons. The results reveal that approximate matching can improve file identification – in one case, increasing the identification rate from 1.82% to 23.76%.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Close Digital forensic investigators frequently have to search for relevant files in massive digital corpora – a task often compared to finding a needle in a haystack. To address this challenge, investigators typically apply cryptographic hash functions to identify known files. However, cryptographic hashing only allows the detection of files that exactly match the known file hash values or fingerprints. This paper demonstrates the benefits of using approximate matching to locate relevant files. The experiments described in this paper used three test images of Windows XP, Windows 7 and Ubuntu 12.04 systems to evaluate fingerprint-based comparisons. The results reveal that approximate matching can improve file identification – in one case, increasing the identification rate from 1.82% to 23.76%. Close http://dx.doi.org/10.1007/978-3-662-44952-3_11 doi:10.1007/978-3-662-44952-3_11 Close
82.	Breitinger, Frank; Ziroff, Georg; Lange, Steffen; Baier, Harald Similarity Hashing Based on Levenshtein Distances (Proceedings Article) In: Peterson, Gilbert; Shenoi, Sujeet (Ed.): Advances in Digital Forensics X, pp. 133-147, Springer Berlin Heidelberg, 2014, ISBN: 978-3-662-44951-6. (Abstract \| Links \| BibTeX) @inproceedings{BZLB14, title = {Similarity Hashing Based on Levenshtein Distances}, author = {Frank Breitinger and Georg Ziroff and Steffen Lange and Harald Baier}, editor = {Gilbert Peterson and Sujeet Shenoi}, url = {http://dx.doi.org/10.1007/978-3-662-44952-3_10}, doi = {10.1007/978-3-662-44952-3_10}, isbn = {978-3-662-44951-6}, year = {2014}, date = {2014-01-01}, booktitle = {Advances in Digital Forensics X}, volume = {433}, pages = {133-147}, publisher = {Springer Berlin Heidelberg}, series = {IFIP Advances in Information and Communication Technology}, abstract = {It is increasingly common in forensic investigations to use automated pre-processing techniques to reduce the massive volumes of data that are encountered. This is typically accomplished by comparing fingerprints (typically cryptographic hashes) of files against existing databases. In addition to finding exact matches of cryptographic hashes, it is necessary to find approximate matches corresponding to similar files, such as different versions of a given file. This paper presents a new stand-alone similarity hashing approach called saHash, which has a modular design and operates in linear time. saHash is almost as fast as SHA-1 and more efficient than other approaches for approximate matching. The similarity hashing algorithm uses four sub-hash functions, each producing its own hash value. The four sub-hashes are concatenated to produce the final hash value. This modularity enables sub-hash functions to be added or removed, e.g., if an exploit for a sub-hash function is discovered. Given the hash values of two byte sequences, saHash returns a lower bound on the number of Levenshtein operations between the two byte sequences as their similarity score. The robustness of saHash is verified by comparing it with other approximate matching approaches such as sdhash.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Close It is increasingly common in forensic investigations to use automated pre-processing techniques to reduce the massive volumes of data that are encountered. This is typically accomplished by comparing fingerprints (typically cryptographic hashes) of files against existing databases. In addition to finding exact matches of cryptographic hashes, it is necessary to find approximate matches corresponding to similar files, such as different versions of a given file. This paper presents a new stand-alone similarity hashing approach called saHash, which has a modular design and operates in linear time. saHash is almost as fast as SHA-1 and more efficient than other approaches for approximate matching. The similarity hashing algorithm uses four sub-hash functions, each producing its own hash value. The four sub-hashes are concatenated to produce the final hash value. This modularity enables sub-hash functions to be added or removed, e.g., if an exploit for a sub-hash function is discovered. Given the hash values of two byte sequences, saHash returns a lower bound on the number of Levenshtein operations between the two byte sequences as their similarity score. The robustness of saHash is verified by comparing it with other approximate matching approaches such as sdhash. Close http://dx.doi.org/10.1007/978-3-662-44952-3_10 doi:10.1007/978-3-662-44952-3_10 Close
83.	Breitinger, Frank; Baier, Harald Similarity Preserving Hashing: Eligible Properties and a New Algorithm MRSH-v2 (Proceedings Article) In: Rogers, Marcus; Seigfried-Spellar, KathrynC. (Ed.): Digital Forensics and Cyber Crime, pp. 167-182, Springer Berlin Heidelberg, 2013, ISBN: 978-3-642-39890-2. (Abstract \| Links \| BibTeX) @inproceedings{BB12d, title = {Similarity Preserving Hashing: Eligible Properties and a New Algorithm MRSH-v2}, author = {Frank Breitinger and Harald Baier}, editor = {Marcus Rogers and KathrynC. Seigfried-Spellar}, url = {http://dx.doi.org/10.1007/978-3-642-39891-9_11}, doi = {10.1007/978-3-642-39891-9_11}, isbn = {978-3-642-39890-2}, year = {2013}, date = {2013-11-01}, booktitle = {Digital Forensics and Cyber Crime}, volume = {114}, pages = {167-182}, publisher = {Springer Berlin Heidelberg}, series = {Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering}, abstract = {Hash functions are a widespread class of functions in computer science and used in several applications, e.g. in computer forensics to identify known files. One basic property of cryptographic hash func- tions is the avalanche effect that causes a significantly different output if an input is changed slightly. As some applications also need to identify similar files (e.g. spam/virus detection) this raised the need for similarity preserving hashing. In recent years, several approaches came up, all with different namings, properties, strengths and weaknesses which is due to a missing definition. Based on the properties and use cases of traditional hash functions this paper discusses a uniform naming and properties which is a first step towards a suitable definition of similarity preserving hashing. Additionally, we extend the algorithm MRSH for similarity preserving hashing to its successor MRSH-v2, which has three specialties. First, it fulfills all our proposed defining properties, second, it outperforms existing approaches especially with respect to run time performance and third it has two detections modes. The regular mode of MRSH-v2 is used to identify similar files whereas the f-mode is optimal for fragment detection, i.e. to identify similar parts of a file.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Close Hash functions are a widespread class of functions in computer science and used in several applications, e.g. in computer forensics to identify known files. One basic property of cryptographic hash func- tions is the avalanche effect that causes a significantly different output if an input is changed slightly. As some applications also need to identify similar files (e.g. spam/virus detection) this raised the need for similarity preserving hashing. In recent years, several approaches came up, all with different namings, properties, strengths and weaknesses which is due to a missing definition. Based on the properties and use cases of traditional hash functions this paper discusses a uniform naming and properties which is a first step towards a suitable definition of similarity preserving hashing. Additionally, we extend the algorithm MRSH for similarity preserving hashing to its successor MRSH-v2, which has three specialties. First, it fulfills all our proposed defining properties, second, it outperforms existing approaches especially with respect to run time performance and third it has two detections modes. The regular mode of MRSH-v2 is used to identify similar files whereas the f-mode is optimal for fragment detection, i.e. to identify similar parts of a file. Close http://dx.doi.org/10.1007/978-3-642-39891-9_11 doi:10.1007/978-3-642-39891-9_11 Close
84.	Rathgeb, Christian; Breitinger, Frank; Busch, Christoph Alignment-free cancelable iris biometric templates based on adaptive bloom filters (Proceedings Article) In: Biometrics (ICB), 2013 International Conference on, pp. 1-8, 2013. (Abstract \| Links \| BibTeX) @inproceedings{RBB13, title = {Alignment-free cancelable iris biometric templates based on adaptive bloom filters}, author = {Christian Rathgeb and Frank Breitinger and Christoph Busch}, url = {http://dx.doi.org/10.1109/ICB.2013.6612976}, doi = {10.1109/ICB.2013.6612976}, year = {2013}, date = {2013-09-30}, booktitle = {Biometrics (ICB), 2013 International Conference on}, pages = {1-8}, abstract = {Biometric characteristics are largely immutable, i.e. unprotected storage of biometric data provokes serious privacy threats, e.g. identity theft, limited re-newability, or cross-matching. In accordance with the ISO/IEC 24745 standard, technologies of cancelable biometrics offer solutions to biometric information protection by obscuring biometric signal in a non-invertible manner, while biometric comparisons are still feasible in the transformed domain. In the presented work alignment-free cancelable iris biometrics based on adaptive Bloom filters are proposed. Bloom filter-based representations of binary biometric templates (iris-codes) enable an efficient alignment-invariant biometric comparison while a successive mapping of parts of a binary biometric template to a Bloom filter represents an irreversible transform. In experiments, which are carried out on the CASIA - v 3 iris database, it is demonstrated that the proposed system maintains biometric performance for diverse iris recognition algorithms, protecting biometric templates at high security levels.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Close Biometric characteristics are largely immutable, i.e. unprotected storage of biometric data provokes serious privacy threats, e.g. identity theft, limited re-newability, or cross-matching. In accordance with the ISO/IEC 24745 standard, technologies of cancelable biometrics offer solutions to biometric information protection by obscuring biometric signal in a non-invertible manner, while biometric comparisons are still feasible in the transformed domain. In the presented work alignment-free cancelable iris biometrics based on adaptive Bloom filters are proposed. Bloom filter-based representations of binary biometric templates (iris-codes) enable an efficient alignment-invariant biometric comparison while a successive mapping of parts of a binary biometric template to a Bloom filter represents an irreversible transform. In experiments, which are carried out on the CASIA - v 3 iris database, it is demonstrated that the proposed system maintains biometric performance for diverse iris recognition algorithms, protecting biometric templates at high security levels. Close http://dx.doi.org/10.1109/ICB.2013.6612976 doi:10.1109/ICB.2013.6612976 Close
85.	Breitinger, Frank; Astebøl, Knut; Baier, Harald; Busch, Christoph mvHash-B - A New Approach for Similarity Preserving Hashing (Proceedings Article) In: IT Security Incident Management and IT Forensics (IMF), 2013 Seventh International Conference on, pp. 33-44, 2013. (Abstract \| Links \| BibTeX) @inproceedings{BABB13, title = {mvHash-B - A New Approach for Similarity Preserving Hashing}, author = {Frank Breitinger and Knut Astebøl and Harald Baier and Christoph Busch}, url = {http://dx.doi.org/10.1109/IMF.2013.18}, doi = {10.1109/IMF.2013.18}, year = {2013}, date = {2013-07-25}, booktitle = {IT Security Incident Management and IT Forensics (IMF), 2013 Seventh International Conference on}, pages = {33-44}, abstract = {The handling of hundreds of thousands of files is a major challenge in today's IT forensic investigations. In order to cope with this information overload, investigators use fingerprints (hash values) to identify known files automatically using blacklists or whitelists. Besides detecting exact duplicates it is helpful to locate similar files by using similarity preserving hashing (SPH), too. We present a new algorithm for similarity preserving hashing. It is based on the idea of majority voting in conjunction with run length encoding to compress the input data and uses Bloom filters to represent the fingerprint. It is therefore called mvHash-B. Our assessment shows that mvHash-B is superior to other SPHs with respect to run time efficiency: It is almost as fast as SHA-1 and thus faster than any other SPH algorithm. Additionally the hash value length is approximately 0.5% of the input length and hence outperforms most existing algorithms. Finally, we show that the robustness of mvHash-B against active manipulation is sufficient for practical purposes.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Close The handling of hundreds of thousands of files is a major challenge in today's IT forensic investigations. In order to cope with this information overload, investigators use fingerprints (hash values) to identify known files automatically using blacklists or whitelists. Besides detecting exact duplicates it is helpful to locate similar files by using similarity preserving hashing (SPH), too. We present a new algorithm for similarity preserving hashing. It is based on the idea of majority voting in conjunction with run length encoding to compress the input data and uses Bloom filters to represent the fingerprint. It is therefore called mvHash-B. Our assessment shows that mvHash-B is superior to other SPHs with respect to run time efficiency: It is almost as fast as SHA-1 and thus faster than any other SPH algorithm. Additionally the hash value length is approximately 0.5% of the input length and hence outperforms most existing algorithms. Finally, we show that the robustness of mvHash-B against active manipulation is sufficient for practical purposes. Close http://dx.doi.org/10.1109/IMF.2013.18 doi:10.1109/IMF.2013.18 Close
86.	Breitinger, Frank; Petrov, Kaloyan Reducing the Time Required for Hashing Operations (Proceedings Article) In: Peterson, Gilbert; Shenoi, Sujeet (Ed.): Advances in Digital Forensics IX, pp. 101-117, Springer Berlin Heidelberg, 2013, ISBN: 978-3-642-41147-2. (Abstract \| Links \| BibTeX) @inproceedings{BK13, title = {Reducing the Time Required for Hashing Operations}, author = {Frank Breitinger and Kaloyan Petrov}, editor = {Gilbert Peterson and Sujeet Shenoi}, url = {http://dx.doi.org/10.1007/978-3-642-41148-9_7}, doi = {10.1007/978-3-642-41148-9_7}, isbn = {978-3-642-41147-2}, year = {2013}, date = {2013-01-01}, booktitle = {Advances in Digital Forensics IX}, volume = {410}, pages = {101-117}, publisher = {Springer Berlin Heidelberg}, series = {IFIP Advances in Information and Communication Technology}, abstract = {Due to the increasingly massive amounts of data that need to be analyzed in digital forensic investigations, it is necessary to automatically recognize suspect files and filter out non-relevant files. To achieve this goal, digital forensic practitioners employ hashing algorithms to classify files into known-good, known-bad and unknown files. However, a typical personal computer may store hundreds of thousands of files and the task becomes extremely time-consuming. This paper attempts to address the problem using a framework that speeds up processing by using multiple threads. Unlike a typical multithreading approach, where the hashing algorithm is performed by multiple threads, the proposed framework incorporates a dedicated prefetcher thread that reads files from a device. Experimental results demonstrate a runtime efficiency of nearly 40% over single threading.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Close Due to the increasingly massive amounts of data that need to be analyzed in digital forensic investigations, it is necessary to automatically recognize suspect files and filter out non-relevant files. To achieve this goal, digital forensic practitioners employ hashing algorithms to classify files into known-good, known-bad and unknown files. However, a typical personal computer may store hundreds of thousands of files and the task becomes extremely time-consuming. This paper attempts to address the problem using a framework that speeds up processing by using multiple threads. Unlike a typical multithreading approach, where the hashing algorithm is performed by multiple threads, the proposed framework incorporates a dedicated prefetcher thread that reads files from a device. Experimental results demonstrate a runtime efficiency of nearly 40% over single threading. Close http://dx.doi.org/10.1007/978-3-642-41148-9_7 doi:10.1007/978-3-642-41148-9_7 Close
87.	Breitinger, Frank; Baier, Harald Performance Issues About Context-Triggered Piecewise Hashing (Proceedings Article) In: Gladyshev, Pavel; Rogers, MarcusK. (Ed.): Digital Forensics and Cyber Crime, pp. 141-155, Springer Berlin Heidelberg, 2012, ISBN: 978-3-642-35514-1. (Abstract \| Links \| BibTeX) @inproceedings{BB12a, title = {Performance Issues About Context-Triggered Piecewise Hashing}, author = {Frank Breitinger and Harald Baier}, editor = {Pavel Gladyshev and MarcusK. Rogers}, url = {http://dx.doi.org/10.1007/978-3-642-35515-8_12}, doi = {10.1007/978-3-642-35515-8_12}, isbn = {978-3-642-35514-1}, year = {2012}, date = {2012-12-01}, booktitle = {Digital Forensics and Cyber Crime}, volume = {88}, pages = {141-155}, publisher = {Springer Berlin Heidelberg}, series = {Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering}, abstract = {A hash function is a well-known method in computer science to map arbitrary large data to bit strings of a fixed short length. This property is used in computer forensics to identify known files on base of their hash value. As of today, in a pre-step process hash values of files are generated and stored in a database; typically a cryptographic hash function like MD5 or SHA-1 is used. Later the investigator computes hash values of files, which he finds on a storage medium, and performs look ups in his database. Due to security properties of cryptographic hash functions, they can not be used to identify similar files. Therefore Jesse Kornblum proposed a similarity preserving hash function to identify similar files. This paper discusses the efficiency of Kornblum's approach. We present some enhancements that increase the performance of his algorithm by 55% if applied to a real life scenario. Furthermore, we discuss some characteristics of a sample Windows XP system, which are relevant for the performance of Kornblum's approach.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Close A hash function is a well-known method in computer science to map arbitrary large data to bit strings of a fixed short length. This property is used in computer forensics to identify known files on base of their hash value. As of today, in a pre-step process hash values of files are generated and stored in a database; typically a cryptographic hash function like MD5 or SHA-1 is used. Later the investigator computes hash values of files, which he finds on a storage medium, and performs look ups in his database. Due to security properties of cryptographic hash functions, they can not be used to identify similar files. Therefore Jesse Kornblum proposed a similarity preserving hash function to identify similar files. This paper discusses the efficiency of Kornblum's approach. We present some enhancements that increase the performance of his algorithm by 55% if applied to a real life scenario. Furthermore, we discuss some characteristics of a sample Windows XP system, which are relevant for the performance of Kornblum's approach. Close http://dx.doi.org/10.1007/978-3-642-35515-8_12 doi:10.1007/978-3-642-35515-8_12 Close
88.	Breitinger, Frank; Baier, Harald Properties of a similarity preserving hash function and their realization in sdhash (Proceedings Article) In: Information Security for South Africa (ISSA), pp. 1-8, 2012. (Abstract \| Links \| BibTeX) @inproceedings{BB12c, title = {Properties of a similarity preserving hash function and their realization in sdhash}, author = {Frank Breitinger and Harald Baier}, url = {http://dx.doi.org/10.1109/ISSA.2012.6320445}, doi = {10.1109/ISSA.2012.6320445}, year = {2012}, date = {2012-10-04}, booktitle = {Information Security for South Africa (ISSA)}, pages = {1-8}, abstract = {Finding similarities between byte sequences is a complex task and necessary in many areas of computer science, e.g., to identify malicious files or spam. Instead of comparing files against each other, one may apply a similarity preserving compression function (hash function) first and do the comparison for the hashes. Although we have different approaches, there is no clear definition / specification or needed properties of such algorithms available. This paper presents four basic properties for similarity pre- serving hash functions that are partly related to the properties of cryptographic hash functions. Compression and ease of computation are borrowed from traditional hash functions and define the hash value length and the performance. As every byte is expected to influence the hash value, we introduce coverage. Similarity score describes the need for a comparison function for hash values. We shortly discuss these properties with respect to three existing approaches and finally have a detailed view on the promising approach sdhash. However, we uncovered some bugs and other peculiarities of the implementation of sdhash. Finally we conclude that sdhash has the potential to be a robust similarity preserving digest algorithm, but there are some points that need to be improved.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Close Finding similarities between byte sequences is a complex task and necessary in many areas of computer science, e.g., to identify malicious files or spam. Instead of comparing files against each other, one may apply a similarity preserving compression function (hash function) first and do the comparison for the hashes. Although we have different approaches, there is no clear definition / specification or needed properties of such algorithms available. This paper presents four basic properties for similarity pre- serving hash functions that are partly related to the properties of cryptographic hash functions. Compression and ease of computation are borrowed from traditional hash functions and define the hash value length and the performance. As every byte is expected to influence the hash value, we introduce coverage. Similarity score describes the need for a comparison function for hash values. We shortly discuss these properties with respect to three existing approaches and finally have a detailed view on the promising approach sdhash. However, we uncovered some bugs and other peculiarities of the implementation of sdhash. Finally we conclude that sdhash has the potential to be a robust similarity preserving digest algorithm, but there are some points that need to be improved. Close http://dx.doi.org/10.1109/ISSA.2012.6320445 doi:10.1109/ISSA.2012.6320445 Close
89.	Breitinger, Frank; Baier, Harald; Beckingham, Jesse Security and implementation analysis of the similarity digest sdhash (Proceedings Article) In: First International Baltic Conference on Network Security & Forensics (NeSeFo), 2012. (BibTeX) @inproceedings{BBB12, title = {Security and implementation analysis of the similarity digest sdhash}, author = {Frank Breitinger and Harald Baier and Jesse Beckingham}, year = {2012}, date = {2012-08-01}, booktitle = {First International Baltic Conference on Network Security & Forensics (NeSeFo)}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Close
90.	Breitinger, Frank; Baier, Harald A fuzzy hashing approach based on random sequences and hamming distance (Proceedings Article) In: Proceedings of the Conference on Digital Forensics, Security and Law, pp. 89–100, 2012. (Abstract \| Links \| BibTeX) @inproceedings{BB12b, title = {A fuzzy hashing approach based on random sequences and hamming distance}, author = {Frank Breitinger and Harald Baier}, url = {https://commons.erau.edu/cgi/viewcontent.cgi?article=1193&context=adfsl}, year = {2012}, date = {2012-05-01}, booktitle = {Proceedings of the Conference on Digital Forensics, Security and Law}, pages = {89–100}, abstract = {Hash functions are well-known methods in computer science to map arbitrary large input to bit strings of a fixed length that serve as unique input identifier/fingerprints. A key property of cryptographic hash functions is that even if only one bit of the input is changed the output behaves pseudo randomly and therefore similar files cannot be identified. However, in the area of computer forensics it is also necessary to find similar files (e.g. different versions of a file), wherefore we need a similarity preserving hash function also called fuzzy hash function. In this paper we present a new approach for fuzzy hashing called bbHash. It is based on the idea to `rebuild' an input as good as possible using a fixed set of randomly chosen byte sequences called building blocks of byte length l (e.g. l = 128). The proceeding is as follows: slide through the input byte-by-byte, read out the current input byte sequence of length l, and compute the Hamming distances of all building blocks against the current input byte sequence. Each building block with Hamming distance smaller than a certain threshold contributes the file's bbHash. We discuss (dis-)advantages of our bbHash to further fuzzy hash approaches. A key property of bbHash is that it is the first fuzzy hashing approach based on a comparison to external data structures.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Close Hash functions are well-known methods in computer science to map arbitrary large input to bit strings of a fixed length that serve as unique input identifier/fingerprints. A key property of cryptographic hash functions is that even if only one bit of the input is changed the output behaves pseudo randomly and therefore similar files cannot be identified. However, in the area of computer forensics it is also necessary to find similar files (e.g. different versions of a file), wherefore we need a similarity preserving hash function also called fuzzy hash function. In this paper we present a new approach for fuzzy hashing called bbHash. It is based on the idea to `rebuild' an input as good as possible using a fixed set of randomly chosen byte sequences called building blocks of byte length l (e.g. l = 128). The proceeding is as follows: slide through the input byte-by-byte, read out the current input byte sequence of length l, and compute the Hamming distances of all building blocks against the current input byte sequence. Each building block with Hamming distance smaller than a certain threshold contributes the file's bbHash. We discuss (dis-)advantages of our bbHash to further fuzzy hash approaches. A key property of bbHash is that it is the first fuzzy hashing approach based on a comparison to external data structures. Close https://commons.erau.edu/cgi/viewcontent.cgi?article=1193&context=adfsl Close
91.	Baier, Harald; Breitinger, Frank Security Aspects of Piecewise Hashing in Computer Forensics (Proceedings Article) In: IT Security Incident Management and IT Forensics (IMF), 2011 Sixth International Conference on, pp. 21-36, 2011. (Abstract \| Links \| BibTeX) @inproceedings{BB11, title = {Security Aspects of Piecewise Hashing in Computer Forensics}, author = {Harald Baier and Frank Breitinger}, url = {http://dx.doi.org/10.1109/IMF.2011.16}, doi = {10.1109/IMF.2011.16}, year = {2011}, date = {2011-06-17}, booktitle = {IT Security Incident Management and IT Forensics (IMF), 2011 Sixth International Conference on}, pages = {21-36}, abstract = {Although hash functions are a well-known method in computer science to map arbitrary large data to bit strings of a fixed length, their use in computer forensics is currently very limited. As of today, in a pre-step process hash values of files are generated and stored in a database, typically a cryptographic hash function like MD5 or SHA-1 is used. Later the investigator computes hash values of files, which he finds on a storage medium, and performs look ups in his database. This approach has several drawbacks, which have been sketched in the community, and some alternative approaches have been proposed. The most popular one is due to Jesse Kornblum, who transferred ideas from spam detection to computer forensics in order to identify similar files. However, his proposal lacks a thorough security analysis. It is therefore one aim of the paper at hand to present some possible attack vectors of an active adversary to bypass Kornblum's approach. Furthermore, we present a pseudo random number generator being both more efficient and more random compared to Kornblum's pseudo random number generator.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Close Although hash functions are a well-known method in computer science to map arbitrary large data to bit strings of a fixed length, their use in computer forensics is currently very limited. As of today, in a pre-step process hash values of files are generated and stored in a database, typically a cryptographic hash function like MD5 or SHA-1 is used. Later the investigator computes hash values of files, which he finds on a storage medium, and performs look ups in his database. This approach has several drawbacks, which have been sketched in the community, and some alternative approaches have been proposed. The most popular one is due to Jesse Kornblum, who transferred ideas from spam detection to computer forensics in order to identify similar files. However, his proposal lacks a thorough security analysis. It is therefore one aim of the paper at hand to present some possible attack vectors of an active adversary to bypass Kornblum's approach. Furthermore, we present a pseudo random number generator being both more efficient and more random compared to Kornblum's pseudo random number generator. Close http://dx.doi.org/10.1109/IMF.2011.16 doi:10.1109/IMF.2011.16 Close
92.	Breitinger, Frank Security Aspects of fuzzy hashing (Masters Thesis) University of Applied Sciences Darmstadt, 2011. (BibTeX) @mastersthesis{FB-master, title = {Security Aspects of fuzzy hashing}, author = {Frank Breitinger}, year = {2011}, date = {2011-03-01}, school = {University of Applied Sciences Darmstadt}, keywords = {}, pubstate = {published}, tppubtype = {mastersthesis} } Close
93.	Breitinger, Frank; Nickel, Claudia User Survey on Phone Security and Usage (Proceedings Article) In: Brömme, Arslan; Busch, Christoph (Ed.): BIOSIG, pp. 139-144, GI, 2010, ISBN: 978-3-88579-258-1. (Abstract \| Links \| BibTeX) @inproceedings{BN10, title = {User Survey on Phone Security and Usage}, author = {Frank Breitinger and Claudia Nickel}, editor = {Arslan Brömme and Christoph Busch}, url = {http://dblp.uni-trier.de/db/conf/biosig/biosig2010.html#BreitingerN10}, isbn = {978-3-88579-258-1}, year = {2010}, date = {2010-06-01}, booktitle = {BIOSIG}, volume = {164}, pages = {139-144}, publisher = {GI}, series = {LNI}, abstract = {Mobile phones are widely used nowadays and during the last years devel- oped from simple phones to small computers with an increasing number of features. These result in a wide variety of data stored on the devices which could be a high security risk in case of unauthorized access. A comprehensive user survey was con- ducted to get information about what data is really stored on the mobile devices, how it is currently protected and if biometric authentication methods could improve the cur- rent state. This paper states the results from about 550 users of mobile devices. The analysis revealed a very low securtiy level of the devices. This is partly due to a low security awareness of their owners and partly due to the low acceptance of the offered authentication method based on PIN. Further results like the experiences with mobile thefts and the willingness to use biometric authentication methods as alternative to PIN authentication are also stated.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} } Close Mobile phones are widely used nowadays and during the last years devel- oped from simple phones to small computers with an increasing number of features. These result in a wide variety of data stored on the devices which could be a high security risk in case of unauthorized access. A comprehensive user survey was con- ducted to get information about what data is really stored on the mobile devices, how it is currently protected and if biometric authentication methods could improve the cur- rent state. This paper states the results from about 550 users of mobile devices. The analysis revealed a very low securtiy level of the devices. This is partly due to a low security awareness of their owners and partly due to the low acceptance of the offered authentication method based on PIN. Further results like the experiences with mobile thefts and the willingness to use biometric authentication methods as alternative to PIN authentication are also stated. Close http://dblp.uni-trier.de/db/conf/biosig/biosig2010.html#BreitingerN10 Close

93 entries « ‹ 2 of 2 › »

Frank Breitinger

Chair for Cybersecurity, University of Augsburg (Germany)