Refereed Proceedings – Frank Breitinger

2025
	Dorai, Gokila; Rad, Pouria; Breitinger, Frank; Bardhan, Rajon; Ramasamy, Vijayalakshmi Mapping the Research Landscape - An Exploratory Analysis of AI Applications in Digital Forensics (Proceedings Article) In: Coppens, Bart; Volckaert, Bruno; Naessens, Vincent; De Sutter, Bjorn (Ed.): Availability, Reliability and Security, pp. 113–130, Springer Nature Switzerland, Cham, 2025, ISBN: 978-3-032-00635-6.
	@inproceedings{10.1007/978-3-032-00635-6_7,
	  title = {Mapping the Research Landscape - An Exploratory Analysis of AI Applications in Digital Forensics},
	  author = {Gokila Dorai and Pouria Rad and Frank Breitinger and Rajon Bardhan and Vijayalakshmi Ramasamy},
	  editor = {Coppens, Bart and Volckaert, Bruno and Naessens, Vincent and De Sutter, Bjorn},
	  url = {https://doi.org/10.1007/978-3-032-00635-6_7},
	  doi = {10.1007/978-3-032-00635-6_7},
	  isbn = {978-3-032-00635-6},
	  year = {2025},
	  date = {2025-08-09},
	  urldate = {2025-08-09},
	  booktitle = {Availability, Reliability and Security},
	  pages = {113--130},
	  publisher = {Springer Nature Switzerland},
	  address = {Cham},
	  abstract = {Artificial intelligence (AI) and machine learning (ML) have great potential to enhance digital forensic investigation, but progress is impeded by challenges in building datasets that meet technical accuracy and legal requirements. We herein compile findings from the latest scholarly literature to identify potential key aspects that are required for building forensic datasets that can effectively support AI-based investigative tools. We examine current practices in dataset building, ranging from representativeness of data, quality of annotation, chain-of-custody documentation, and metadata standardization, and consider their effects carefully on training robust AI models. Results point to key shortcomings that impede advanced AI implementations in digital forensics, which form a strong baseline for developing a standard workflow for building forensic datasets. This work, therefore, forms a stepping stone for future projects to enhance investigation capabilities through a better-structured and legally sound process of dataset building.},
	  keywords = {},
	  pubstate = {published},
	  tppubtype = {inproceedings}
	}
	Wickramasekara, Akila; Densmore, Alanna; Breitinger, Frank; Studiawan, Hudan; Scanlon, Mark AutoDFBench: A Framework for AI Generated Digital Forensic Code and Tool Testing and Evaluation (Proceedings Article) In: Proceedings of the Digital Forensics Doctoral Symposium, Association for Computing Machinery, Brno, CZ, 2025, ISBN: 9798400710766.
	@inproceedings{10.1145/3712716.3712718,
	  title = {AutoDFBench: A Framework for AI Generated Digital Forensic Code and Tool Testing and Evaluation},
	  author = {Akila Wickramasekara and Alanna Densmore and Frank Breitinger and Hudan Studiawan and Mark Scanlon},
	  url = {https://doi.org/10.1145/3712716.3712718},
	  doi = {10.1145/3712716.3712718},
	  isbn = {9798400710766},
	  year = {2025},
	  date = {2025-04-01},
	  urldate = {2025-01-01},
	  booktitle = {Proceedings of the Digital Forensics Doctoral Symposium},
	  publisher = {Association for Computing Machinery},
	  address = {Brno, CZ},
	  series = {DFDS '25},
	  abstract = {Generative AI (GenAI) and Large Language Models (LLMs) show great potential in various domains, including digital forensics. A notable use case of these technologies is automatic code generation, which can reasonably be expected to include digital forensic applications in the not-too-distant future. As with any digital forensic tool, these systems must undergo extensive testing and validation. However, manually evaluating outputs, including generated DF code, remains a challenge. AutoDFBench is an automated framework designed to address this by validating AI-generated code and tools against NIST’s Computer Forensics Tool Testing Program (CFTT) procedures and subsequently calculating an AutoDFBench benchmarking score. The framework operates in four phases: data preparation, API handling, code execution, and result recording with score calculation. It benchmarks generative AI systems, such as LLMs and automated code generation agents, for DF applications. This benchmark can support iterative development or serve as a comparison metric between GenAI DF systems. As a proof of concept, NIST’s forensic string search tests were used, involving more than 24,200 tests with five top-performing code generation LLMs. These tests validated the output of 121 cases, considering two levels of user expertise, two programming languages, and ten iterations per case with varying prompts. The results also highlight the significant limitations of the DF-specific solutions generated by generic LLMs.},
	  keywords = {},
	  pubstate = {published},
	  tppubtype = {inproceedings}
	}
	Vanini, Céline; Gruber, Jan; Hargreaves, Christopher; Benenson, Zinaida; Freiling, Felix; Breitinger, Frank Understanding Strategies and Challenges of Timestamp Tampering for Improved Digital Forensic Event Reconstruction (Proceedings Article) In: Proceedings of the Digital Forensics Doctoral Symposium, Association for Computing Machinery, Brno, CZ, 2025, ISBN: 9798400710766.
	@inproceedings{10.1145/3712716.3712727,
	  title = {Understanding Strategies and Challenges of Timestamp Tampering for Improved Digital Forensic Event Reconstruction},
	  author = {Céline Vanini and Jan Gruber and Christopher Hargreaves and Zinaida Benenson and Felix Freiling and Frank Breitinger},
	  url = {https://doi.org/10.1145/3712716.3712727},
	  doi = {10.1145/3712716.3712727},
	  isbn = {9798400710766},
	  year = {2025},
	  date = {2025-04-01},
	  urldate = {2025-01-01},
	  booktitle = {Proceedings of the Digital Forensics Doctoral Symposium},
	  publisher = {Association for Computing Machinery},
	  address = {Brno, CZ},
	  series = {DFDS '25},
	  abstract = {Timestamps play a pivotal role in digital forensic event reconstruction, but due to their non-essential nature, tampering or manipulation of timestamps is possible by users in multiple ways, even on running systems. This has a significant effect on the reliability of the results from applying a timeline analysis as part of an investigation. We investigate the problem of users tampering with timestamps on a running (“live”) system. While prior work has shown that digital evidence tampering is hard, we focus on the question of why this is so. By performing a qualitative user study with advanced university students, we derive factors that influence the reliability of successful tampering, such as the individual knowledge about temporal traces, and technical restrictions to change them. These insights help to assess the reliability of timestamps from individual artifacts that are used for event reconstruction and subsequently reduce the risk of misinterpretations.},
	  keywords = {},
	  pubstate = {published},
	  tppubtype = {inproceedings}
	}
	Michelet, Gaëtan; Breitinger, Frank Automation for digital forensics: Towards a classification model for the community (Proceedings Article) In: Proceedings of the Digital Forensics Doctoral Symposium, Association for Computing Machinery, Brno, CZ, 2025, ISBN: 9798400710766.
	@inproceedings{10.1145/3712716.3712725,
	  title = {Automation for digital forensics: Towards a classification model for the community},
	  author = {Gaëtan Michelet and Frank Breitinger},
	  url = {https://doi.org/10.1145/3712716.3712725},
	  doi = {10.1145/3712716.3712725},
	  isbn = {9798400710766},
	  year = {2025},
	  date = {2025-04-01},
	  booktitle = {Proceedings of the Digital Forensics Doctoral Symposium},
	  publisher = {Association for Computing Machinery},
	  address = {Brno, CZ},
	  series = {DFDS '25},
	  abstract = {The current state of automation in digital forensics remains insufficiently defined. While the complexity of automated tools and methods has evolved significantly (e.g., from basic parsers to the integration of advanced techniques), it remains challenging to pinpoint the field’s overall progress or compare methods. A first step towards a solution was the work ‘Automation for digital forensics: Towards a definition for the community’ which defines automation but cannot categorize various methods. This work aims to address this gap and presents a first classification model for automation for digital forensics. Therefore, we analyzed automation classification schemes from different disciplines (e.g., cars) and assessed various model possibilities as well as characteristics. We conclude that a 2-dimensional model with the axis ‘decision’ and ‘level of automation’ is most appropriate and provide an overview table with examples.},
	  keywords = {},
	  pubstate = {published},
	  tppubtype = {inproceedings}
	}
2023
	Ottmann, Jenny; Cengiz, Üsame; Breitinger, Frank; Freiling, Felix As if Time Had Stopped – Checking Memory Dumps for Quasi-Instantaneous Consistency (Proceedings Article) In: Proceedings of the Digital Forensics Research Conference USA (DFRWS USA), 2023.
	@inproceedings{OUBF2023,
	  title = {As if Time Had Stopped – Checking Memory Dumps for Quasi-Instantaneous Consistency},
	  author = {Jenny Ottmann and Üsame Cengiz and Frank Breitinger and Felix Freiling},
	  url = {https://dfrws.org/presentation/as-if-time-had-stopped-checking-memory-dumps-for-quasi-instantaneous-consistency/},
	  doi = {10.48550/arXiv.2307.12060},
	  year = {2023},
	  date = {2023-07-10},
	  booktitle = {Proceedings of the Digital Forensics Research Conference USA (DFRWS USA)},
	  abstract = {Memory dumps that are acquired while the system is running often contain inconsistencies like page smearing which hamper the analysis. One possibility to avoid inconsistencies is to pause the system during the acquisition and take an instantaneous memory dump. While this is possible for virtual machines, most systems cannot be frozen and thus the ideal dump can only be quasi-instantaneous, i.e., consistent despite the system running. In this article, we introduce a method allowing us to measure quasi-instantaneous consistency and show both, theoretically, and practically, that our method is valid but that in reality, dumps can be but usually are not quasi-instantaneously consistent. For the assessment, we run a pivot program enabling the evaluation of quasi-instantaneous consistency for its heap and allowing us to pinpoint where exactly inconsistencies occurred.},
	  keywords = {},
	  pubstate = {published},
	  tppubtype = {inproceedings}
	}
2022
	Coates, Peter; Breitinger, Frank Identifying document similarity using a fast estimation of the Levenshtein Distance based on compression and signatures (Proceedings Article) In: Proceedings of the Digital Forensics Research Conference Europe (DFRWS EU), 2022.
	@inproceedings{CB2022,
	  title = {Identifying document similarity using a fast estimation of the Levenshtein Distance based on compression and signatures},
	  author = {Peter Coates and Frank Breitinger},
	  url = {https://www.researchgate.net/publication/359961968_Identifying_document_similarity_using_a_fast_estimation_of_the_Levenshtein_Distance_based_on_compression_and_signatures},
	  doi = {10.48550/arXiv.2307.11496},
	  year = {2022},
	  date = {2022-03-31},
	  booktitle = {Proceedings of the Digital Forensics Research Conference Europe (DFRWS EU)},
	  abstract = {Identifying document similarity has many applications, e.g., source code analysis or plagiarism detection. However, identifying similarities is not trivial and can be time complex. For instance, the Levenshtein Distance is a common metric to define the similarity between two documents but has quadratic runtime which makes it impractical for large documents where large starts with a few hundred kilobytes. In this paper, we present a novel concept that allows estimating the Levenshtein Distance: the algorithm first compresses documents to signatures (similar to hash values) using a user-defined compression ratio. Signatures can then be compared against each other (some constrains apply) where the outcome is the estimated Levenshtein Distance. Our evaluation shows promising results in terms of runtime efficiency and accuracy. In addition, we introduce a significance score allowing examiners to set a threshold and identify related documents.},
	  keywords = {},
	  pubstate = {published},
	  tppubtype = {inproceedings}
	}
	Ottmann, Jenny; Breitinger, Frank; Freiling, Felix Defining Atomicity (and Integrity) for Snapshots of Storage in Forensic Computing (Proceedings Article) In: Proceedings of the Digital Forensics Research Conference Europe (DFRWS EU), 2022.
	@inproceedings{OBF2022,
	  title = {Defining Atomicity (and Integrity) for Snapshots of Storage in Forensic Computing},
	  author = {Jenny Ottmann and Frank Breitinger and Felix Freiling},
	  url = {https://www.researchgate.net/publication/359962048_Defining_Atomicity_and_Integrity_for_Snapshots_of_Storage_in_Forensic_Computing},
	  year = {2022},
	  date = {2022-03-31},
	  booktitle = {Proceedings of the Digital Forensics Research Conference Europe (DFRWS EU)},
	  abstract = {The acquisition of data from main memory or from hard disk storage is usually one of the first steps in a forensic investigation. We revisit the discussion on quality criteria for ``forensically sound'' acquisition of such storage and propose a new way to capture the intent to acquire an instantaneous snapshot from a single target system. The idea of our definition is to allow a certain flexibility into when individual portions of memory are acquired, but at the same time require being consistent with causality (i.e., cause/effect relations). Our concept is much stronger than the original notion of atomicity defined by Vömel and Freiling (2012) but still attainable using copy-on-write mechanisms. As a minor result, we also fix a conceptual problem within the original definition of integrity.},
	  keywords = {},
	  pubstate = {published},
	  tppubtype = {inproceedings}
	}
2019
	Moia, Vitor Hugo Galhardo; Breitinger, Frank; Henriques, Marco Aurélio Amaral Understanding the effects of removing common blocks on Approximate Matching scores under different scenarios for digital forensic investigations (Proceedings Article) In: XIX Brazilian Symposium on information and computational systems security, Brazilian Computer Society (SBC) São Paulo-SP, Brazil 2019, (Best Paper Award).
	@inproceedings{MBH19,
	  title = {Understanding the effects of removing common blocks on Approximate Matching scores under different scenarios for digital forensic investigations},
	  author = {Vitor Hugo Galhardo Moia and Frank Breitinger and Marco Aurélio Amaral Henriques},
	  url = {https://sol.sbc.org.br/index.php/sbseg/article/download/13966/13815},
	  year = {2019},
	  date = {2019-09-05},
	  booktitle = {XIX Brazilian Symposium on information and computational systems security},
	  organization = {Brazilian Computer Society (SBC) São Paulo-SP, Brazil},
	  abstract = {Finding similarity in digital forensics investigations can be assisted with the use of Approximate Matching (AM) functions. These algorithms create small and compact representations of objects (similar to hashes) which can be compared to identify similarity. However, often results are biased due to common blocks (data structures found in many different files regardless of content). In this paper, we evaluate the precision and recall metrics for AM functions when removing common blocks. In detail, we analyze how the similarity score changes and impacts different investigation scenarios. Results show that many irrelevant matches can be filtered out and that a new interpretation of the score allows a better similarity detection.},
	  note = {\bf Best Paper Award},
	  keywords = {},
	  pubstate = {published},
	  tppubtype = {inproceedings}
	}
	Wu, Tina; Breitinger, Frank; Baggili, Ibrahim IoT Ignorance is Digital Forensics Research Bliss: A Survey to Understand IoT Forensics Definitions, Challenges and Future Research Directions (Proceedings Article) In: Proceedings of the 14th International Conference on Availability, Reliability and Security, pp. 46:1–46:15, ACM, Canterbury, CA, United Kingdom, 2019, ISBN: 978-1-4503-7164-3.
	@inproceedings{WBB19,
	  title = {IoT Ignorance is Digital Forensics Research Bliss: A Survey to Understand IoT Forensics Definitions, Challenges and Future Research Directions},
	  author = {Tina Wu and Frank Breitinger and Ibrahim Baggili},
	  url = {http://doi.acm.org/10.1145/3339252.3340504},
	  doi = {10.1145/3339252.3340504},
	  isbn = {978-1-4503-7164-3},
	  year = {2019},
	  date = {2019-08-25},
	  booktitle = {Proceedings of the 14th International Conference on Availability, Reliability and Security},
	  pages = {46:1--46:15},
	  publisher = {ACM},
	  address = {Canterbury, CA, United Kingdom},
	  series = {ARES '19},
	  keywords = {},
	  pubstate = {published},
	  tppubtype = {inproceedings}
	}
	Przyborski, Kristen; Breitinger, Frank; Beck, Lauren; Harichandran, Ronald S. `Cyber World' as a Theme for a University-wide First-year Common Course (Proceedings Article) In: 2019 ASEE Annual Conference & Exposition, ASEE Conferences, Tampa, Florida, 2019, (https://peer.asee.org/31923).
	@inproceedings{Przyborski2019,
	  title = {`Cyber World' as a Theme for a University-wide First-year Common Course},
	  author = {Kristen Przyborski and Frank Breitinger and Lauren Beck and Ronald S. Harichandran},
	  doi = {10.18260/1-2--31923},
	  year = {2019},
	  date = {2019-06-01},
	  booktitle = {2019 ASEE Annual Conference \& Exposition},
	  publisher = {ASEE Conferences},
	  address = {Tampa, Florida},
	  note = {\url{https://peer.asee.org/31923}},
	  keywords = {},
	  pubstate = {published},
	  tppubtype = {inproceedings}
	}
2018
	Haigh, Trevor; Breitinger, Frank; Baggili, Ibrahim If I Had a Million Cryptos: Cryptowallet Application Analysis and a Trojan Proof-of-Concept (Proceedings Article) In: Breitinger, Frank; Baggili, Ibrahim (Ed.): Digital Forensics and Cyber Crime, pp. 45–65, Springer International Publishing, Cham, 2018, ISBN: 978-3-030-05487-8, (Best Paper Award).
	@inproceedings{HBB19,
	  title = {If I Had a Million Cryptos: Cryptowallet Application Analysis and a Trojan Proof-of-Concept},
	  author = {Trevor Haigh and Frank Breitinger and Ibrahim Baggili},
	  editor = {Frank Breitinger and Ibrahim Baggili},
	  doi = {10.1007/978-3-030-05487-8_3},
	  isbn = {978-3-030-05487-8},
	  year = {2018},
	  date = {2018-12-30},
	  booktitle = {Digital Forensics and Cyber Crime},
	  pages = {45--65},
	  publisher = {Springer International Publishing},
	  address = {Cham},
	  abstract = {Cryptocurrencies have gained wide adoption by enthusiasts and investors. In this work, we examine seven different Android cryptowallet applications for forensic artifacts, but we also assess their security against tampering and reverse engineering. Some of the biggest benefits of cryptocurrency is its security and relative anonymity. For this reason it is vital that wallet applications share the same properties. Our work, however, indicates that this is not the case. Five of the seven applications we tested do not implement basic security measures against reverse engineering. Three of the applications stored sensitive information, like wallet private keys, insecurely and one was able to be decrypted with some effort. One of the applications did not require root access to retrieve the data. We were also able to implement a proof-of-concept trojan which exemplifies how a malicious actor may exploit the lack of security in these applications and exfiltrate user data and cryptocurrency.},
	  note = {\bf Best Paper Award},
	  keywords = {},
	  pubstate = {published},
	  tppubtype = {inproceedings}
	}
	Schmicker, Robert; Breitinger, Frank; Baggili, Ibrahim AndroParse - An Android Feature Extraction Framework and Dataset (Proceedings Article) In: Breitinger, Frank; Baggili, Ibrahim (Ed.): Digital Forensics and Cyber Crime, pp. 66–88, Springer International Publishing, Cham, 2018, ISBN: 978-3-030-05487-8.
	@inproceedings{SBB19,
	  title = {AndroParse - An Android Feature Extraction Framework and Dataset},
	  author = {Robert Schmicker and Frank Breitinger and Ibrahim Baggili},
	  editor = {Frank Breitinger and Ibrahim Baggili},
	  doi = {10.1007/978-3-030-05487-8_4},
	  isbn = {978-3-030-05487-8},
	  year = {2018},
	  date = {2018-12-30},
	  booktitle = {Digital Forensics and Cyber Crime},
	  pages = {66--88},
	  publisher = {Springer International Publishing},
	  address = {Cham},
	  abstract = {Android malware has become a major challenge. As a consequence, practitioners and researchers spend a significant time analyzing Android applications (APK). A common procedure (especially for data scientists) is to extract features such as permissions, APIs or strings which can then be analyzed. Current state of the art tools have three major issues: (1) a single tool cannot extract all the significant features used by scientists and practitioners (2) Current tools are not designed to be extensible and (3) Existing parsers can be timely as they are not runtime efficient or scalable. Therefore, this work presents AndroParse which is an open-source Android parser written in Golang that currently extracts the four most common features: Permissions, APIs, Strings and Intents. AndroParse outputs JSON files as they can easily be used by most major programming languages. Constructing the parser allowed us to create an extensive feature dataset which can be accessed by our independent REST API. Our dataset currently has 67,703 benign and 46,683 malicious APK samples.},
	  keywords = {},
	  pubstate = {published},
	  tppubtype = {inproceedings}
	}
	Luciano, Laoise; Baggili, Ibrahim; Topor, Mateusz; Casey, Peter; Breitinger, Frank Digital Forensics in the Next Five Years (Proceedings Article) In: Proceedings of the 13th International Conference on Availability, Reliability and Security, pp. 46:1–46:14, ACM, Hamburg, Germany, 2018, ISBN: 978-1-4503-6448-5. @inproceedings{LBT18, title = {Digital Forensics in the Next Five Years}, author = {Laoise Luciano and Ibrahim Baggili and Mateusz Topor and Peter Casey and Frank Breitinger}, url = {http://doi.acm.org/10.1145/3230833.3232813}, doi = {10.1145/3230833.3232813}, isbn = {978-1-4503-6448-5}, year = {2018}, date = {2018-08-30}, booktitle = {Proceedings of the 13th International Conference on Availability, Reliability and Security}, pages = {46:1–46:14}, publisher = {ACM}, address = {Hamburg, Germany}, series = {ARES 2018}, abstract = {Cyber forensics has encountered major obstacles over the last decade and is at a crossroads. This paper presents data that was obtained during the National Workshop on Redefining Cyber Forensics (NWRCF) on May 23-24, 2017 supported by the National Science Foundation and organized by the University of New Haven. Qualitative and quantitative data were analyzed from twenty-four cyber forensics expert panel members. This work identified important themes that need to be addressed by the community, focusing on (1) where the domain currently is; (2) where it needs to go and; (3) steps needed to improve it. Furthermore, based on the results, we articulate (1) the biggest anticipated challenges the domain will face in the next five years; (2) the most important cyber forensics research opportunities in the next five years and; (3) the most important job-ready skills that need to be addressed by higher education curricula over the next five years. Lastly, we present the key issues and recommendations deliberated by the expert panel. Overall results indicated that a more active and coherent group needs to be formed in the cyber forensics community, with opportunities for continuous reassessment and improvement processes in place.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Liebler, Lorenz; Breitinger, Frank mrsh-mem: Approximate Matching on Raw Memory Dumps (Proceedings Article) In: 2018 11th International Conference on IT Security Incident Management & IT Forensics (IMF), pp. 47-64, 2018. @inproceedings{LB18, title = {mrsh-mem: Approximate Matching on Raw Memory Dumps}, author = {Lorenz Liebler and Frank Breitinger}, doi = {10.1109/IMF.2018.00011}, year = {2018}, date = {2018-05-09}, booktitle = {2018 11th International Conference on IT Security Incident Management \& IT Forensics (IMF)}, pages = {47-64}, abstract = {This paper presents the fusion of two subdomains of digital forensics: (1) raw memory analysis and (2) approximate matching. Specifically, this paper describes a prototype implementation named MRSH-MEM that allows to compare hard drive images as well as memory dumps and therefore can answer the question if a particular program (installed on a hard drive) is currently running / loaded in memory. To answer this question, we only require both dumps or access to a public repository which provides the binaries to be tested. For our prototype, we modified an existing approximate matching algorithm named MRSH-NET and combined it with approxis, an approximate disassembler. Recent literature claims that approximate matching techniques are slow and hardly applicable to the field of memory forensics. Especially legitimate changes to executables in memory caused by the loader itself prevent the application of current bytewise approximate matching techniques. Our approach lowers the impact of modified code in memory and shows a good computational performance. During our experiments, we show how an investigator can leverage meaningful insights by combining data gained from a hard disk image and raw memory dumps with a practicability runtime performance. Lastly, our current implementation will be integrable into the Volatility memory forensics framework and we introduce new possibilities for providing data driven cross validation functions. Our current proof of concept implementation supports Linux based raw memory dumps.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Lillis, David; Breitinger, Frank; Scanlon, Mark Expediting MRSH-v2 Approximate Matching with Hierarchical Bloom Filter Trees (Proceedings Article) In: Matoušek, Petr; Schmiedecker, Martin (Ed.): Digital Forensics and Cyber Crime, pp. 144–157, Springer International Publishing, Cham, 2018, ISBN: 978-3-319-73697-6, (Best Paper Award). @inproceedings{LBS18, title = {Expediting MRSH-v2 Approximate Matching with Hierarchical Bloom Filter Trees}, author = {David Lillis and Frank Breitinger and Mark Scanlon}, editor = {Petr Matoušek and Martin Schmiedecker}, doi = {10.1007/978-3-319-73697-6_11}, isbn = {978-3-319-73697-6}, year = {2018}, date = {2018-01-06}, booktitle = {Digital Forensics and Cyber Crime}, pages = {144–157}, publisher = {Springer International Publishing}, address = {Cham}, abstract = {Perhaps the most common task encountered by digital forensic investigators consists of searching through a seized device for pertinent data. Frequently, an investigator will be in possession of a collection of ``known-illegal'' files (e.g. a collection of child pornographic images) and will seek to find whether copies of these are stored on the seized drive. Traditional hash matching techniques can efficiently find files that precisely match. However, these will fail in the case of merged files, embedded files, partial files, or if a file has been changed in any way.}, note = {Best Paper Award}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Knieriem, Brandon; Zhang, Xiaolu; Levine, Philip; Breitinger, Frank; Baggili, Ibrahim An Overview of the Usage of Default Passwords (Proceedings Article) In: Matoušek, Petr; Schmiedecker, Martin (Ed.): Digital Forensics and Cyber Crime, pp. 195–203, Springer International Publishing, Cham, 2018, ISBN: 978-3-319-73697-6. @inproceedings{KZL18, title = {An Overview of the Usage of Default Passwords}, author = {Brandon Knieriem and Xiaolu Zhang and Philip Levine and Frank Breitinger and Ibrahim Baggili}, editor = {Petr Matoušek and Martin Schmiedecker}, doi = {10.1007/978-3-319-73697-6_15}, isbn = {978-3-319-73697-6}, year = {2018}, date = {2018-01-06}, booktitle = {Digital Forensics and Cyber Crime}, pages = {195–203}, publisher = {Springer International Publishing}, address = {Cham}, abstract = {The recent Mirai botnet attack demonstrated the danger of using default passwords and showed it is still a major problem. In this study we investigated several common applications and their password policies. Specifically, we analyzed if these applications: (1) have default passwords or (2) allow the user to set a weak password (i.e., they do not properly enforce a password policy). Our study shows that default passwords are still a significant problem: 61% of applications inspected initially used a default or blank password. When changing the password, 58% allowed a blank password, 35% allowed a weak password of 1 character.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
2017
	Meffert, Christopher; Clark, Devon; Baggili, Ibrahim; Breitinger, Frank Forensic State Acquisition from Internet of Things (FSAIoT): A General Framework and Practical Approach for IoT Forensics Through IoT Device State Acquisition (Proceedings Article) In: Proceedings of the 12th International Conference on Availability, Reliability and Security, pp. 56:1–56:11, ACM, Reggio Calabria, Italy, 2017, ISBN: 978-1-4503-5257-4. @inproceedings{MCBB17, title = {Forensic State Acquisition from Internet of Things (FSAIoT): A General Framework and Practical Approach for IoT Forensics Through IoT Device State Acquisition}, author = {Christopher Meffert and Devon Clark and Ibrahim Baggili and Frank Breitinger}, url = {http://doi.acm.org/10.1145/3098954.3104053}, doi = {10.1145/3098954.3104053}, isbn = {978-1-4503-5257-4}, year = {2017}, date = {2017-09-01}, booktitle = {Proceedings of the 12th International Conference on Availability, Reliability and Security}, pages = {56:1–56:11}, publisher = {ACM}, address = {Reggio Calabria, Italy}, series = {ARES '17}, abstract = {IoT device forensics is a difficult problem given that manufactured IoT devices are not standardized, many store little to no historical data, and are always connected; making them extremely volatile. The goal of this paper was to address these challenges by presenting a primary account for a general framework and practical approach we term Forensic State Acquisition from Internet of Things (FSAIoT). We argue that by leveraging the acquisition of the state of IoT devices (e.g. if an IoT lock is open or locked), it becomes possible to paint a clear picture of events that have occurred. To this end, FSAIoT consists of a centralized Forensic State Acquisition Controller (FSAC) employed in three state collection modes: controller to IoT device, controller to cloud, and controller to controller. We present a proof of concept implementation using openHAB – a device agnostic open source IoT device controller – and self-created scripts, to resemble a FSAC implementation. Our proof of concept employed an Insteon IP Camera as a controller to device test, an Insteon Hub as a controller to controller test, and a nest thermostat for a controller to cloud test. Our findings show that it is possible to practically pull forensically relevant state data from IoT devices. Future work and open research problems are shared.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
2015
	Gupta, Vikas; Breitinger, Frank How Cuckoo Filter Can Improve Existing Approximate Matching Techniques (Proceedings Article) In: James, Joshua I.; Breitinger, Frank (Ed.): Digital Forensics and Cyber Crime, pp. 39-52, Springer International Publishing, 2015, ISBN: 978-3-319-25511-8, (Best Paper Award). @inproceedings{GB15, title = {How Cuckoo Filter Can Improve Existing Approximate Matching Techniques}, author = {Vikas Gupta and Frank Breitinger}, editor = {Joshua I. James and Frank Breitinger}, url = {http://dx.doi.org/10.1007/978-3-319-25512-5_4}, doi = {10.1007/978-3-319-25512-5_4}, isbn = {978-3-319-25511-8}, year = {2015}, date = {2015-12-25}, booktitle = {Digital Forensics and Cyber Crime}, volume = {157}, pages = {39-52}, publisher = {Springer International Publishing}, series = {Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering}, abstract = {In recent years, approximate matching algorithms have become an important component in digital forensic research and have been adopted in some other working areas as well. Currently there are several approaches but especially sdhash and mrsh-v2 attract the attention of the community because of their good overall performance (runtime, compression and detection rates). Although both approaches have a quite different proceeding, their final output (the similarity digest) is very similar as both utilize Bloom filters. This data structure was presented in 1970 and thus has been around for a while. Recently, a new data structure was proposed and claimed to be faster and have a smaller memory footprint than Bloom filter – Cuckoo filter. In this paper we analyze the feasibility of Cuckoo filter for approximate matching algorithms and present a prototype implementation called mrsh-cf which is based on a special version of mrsh-v2 called mrsh-net. We demonstrate that by using Cuckoo filter there is a runtime improvement of approximately 37% and also a significantly better false positive rate. The memory footprint of mrsh-cf is 8 times smaller than mrsh-net, while the compression rate is twice than Bloom filter based fingerprint.}, note = {Best Paper Award}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Baggili, Ibrahim; Oduru, Jeff; Anthony, Kyle; Breitinger, Frank; McGee, Glenn Watch What You Wear: Preliminary Forensic Analysis of Smart Watches (Proceedings Article) In: Availability, Reliability and Security (ARES), 2015 10th International Conference on, pp. 303-311, 2015. @inproceedings{BOA15, title = {Watch What You Wear: Preliminary Forensic Analysis of Smart Watches}, author = {Ibrahim Baggili and Jeff Oduru and Kyle Anthony and Frank Breitinger and Glenn McGee}, doi = {10.1109/ARES.2015.39}, year = {2015}, date = {2015-08-27}, booktitle = {Availability, Reliability and Security (ARES), 2015 10th International Conference on}, pages = {303-311}, abstract = {This work presents preliminary forensic analysis of two popular smart watches, the Samsung Gear 2 Neo and LG G. These wearable computing devices have the form factor of watches and sync with smart phones to display notifications, track footsteps and record voice messages. We posit that as smart watches are adopted by more users, the potential for them becoming a haven for digital evidence will increase thus providing utility for this preliminary work. In our work, we examined the forensic artifacts that are left on a Samsung Galaxy S4 Active phone that was used to sync with the Samsung Gear 2 Neo watch and the LG G watch. We further outline a methodology for physically acquiring data from the watches after gaining root access to them. Our results show that we can recover a swath of digital evidence directly from the watches when compared to the data on the phone that is synced with the watches. Furthermore, to root the LG G watch, the watch has to be reset to its factory settings which is alarming because the process may delete data of forensic relevance. Although this method is forensically intrusive, it may be used for acquiring data from already rooted LG watches. It is our observation that the data at the core of the functionality of at least the two tested smart watches, messages, health and fitness data, e-mails, contacts, events and notifications are accessible directly from the acquired images of the watches, which affirms our claim that the forensic value of evidence from smart watches is worthy of further study and should be investigated both at a high level and with greater specificity and granularity.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Rathgeb, Christian; Breitinger, Frank; Baier, Harald; Busch, Christoph Towards Bloom filter-based indexing of iris biometric data (Proceedings Article) In: Biometrics (ICB), 2015 International Conference on, pp. 422–429, IEEE, 2015, (Siew-Sngiem Best Poster Award). @inproceedings{7139105, title = {Towards Bloom filter-based indexing of iris biometric data}, author = {Christian Rathgeb and Frank Breitinger and Harald Baier and Christoph Busch}, doi = {10.1109/ICB.2015.7139105}, year = {2015}, date = {2015-05-22}, booktitle = {Biometrics (ICB), 2015 International Conference on}, pages = {422–429}, organization = {IEEE}, abstract = {Conventional biometric identification systems require exhaustive 1 : N comparisons in order to identify a biometric probe, i.e. comparison time frequently dominates the overall computational workload. Biometric database indexing represents a challenging task since biometric data does not exhibit any natural sorting order. In this paper we present a preliminary study on the feasibility of applying Bloom filters for the purpose of iris biometric database indexing. It is shown that, by constructing a binary tree data structure of Bloom filters extracted from binary iris biometric templates (iris-codes), the search space can be reduced to O(log N). In experiments, which are carried out on a medium-sized database of N = 256 subjects, biometric performance (accuracy) is maintained for different conventional identification systems. Further, perspectives on how to employ the proposed scheme on large-scale databases are given.}, note = {Siew-Sngiem Best Poster Award}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Satyendra, Gurjar; Baggili, Ibrahim; Breitinger, Frank; Fischer, Alice An empirical comparison of widely adopted hash functions in digital forensics: does the programming language and operating system make a difference? (Proceedings Article) In: Proceedings of the Conference on Digital Forensics, Security and Law, pp. 57–68, 2015. @inproceedings{SBBF15, title = {An empirical comparison of widely adopted hash functions in digital forensics: does the programming language and operating system make a difference?}, author = {Gurjar Satyendra and Ibrahim Baggili and Frank Breitinger and Alice Fischer}, url = {https://commons.erau.edu/adfsl/2015/tuesday/6/}, year = {2015}, date = {2015-05-19}, booktitle = {Proceedings of the Conference on Digital Forensics, Security and Law}, pages = {57–68}, abstract = {Hash functions are widespread in computer sciences and have a wide range of applications such as ensuring integrity in cryptographic protocols, structuring database entries (hash tables) or identifying known files in forensic investigations. Besides their cryptographic requirements, a fundamental property of hash functions is efficient and easy computation which is especially important in digital forensics due to the large amount of data that need to be processed in cases. In this paper, we correlate the runtime efficiency of common hashing algorithms (MD5, SHA-family) and their implementation. Our empirical comparison focuses on C-OpenSSL, Python, Ruby, Java on Windows and Linux and C and WinCrypto API on Windows. The purpose of this paper is to recommend appropriate programming languages and libraries for coding tools that include intensive hashing functionality. In each programming language, we compute the MD5, SHA-1, SHA-256 and SHA-512 digest on datasets from 2 MB to 1 GB. For each language, algorithm and data, we perform multiple runs and compute the average elapsed time. In our experiment, we observed that OpenSSL and languages utilizing OpenSSL (Python and Ruby) perform better across all the hashing algorithms and data sizes on Windows and Linux. However, on Windows, performance of Java (Oracle JDK) and C WinCrypto is comparable to OpenSSL and better for SHA-512.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Baggili, Ibrahim; Breitinger, Frank Data Sources for Advancing Cyber Forensics: What the Social World Has to Offer (Proceedings Article) In: AAAI Spring Symposium Series, 2015. @inproceedings{BB15, title = {Data Sources for Advancing Cyber Forensics: What the Social World Has to Offer}, author = {Ibrahim Baggili and Frank Breitinger}, url = {http://aaai.org/ocs/index.php/SSS/SSS15/paper/view/10227}, year = {2015}, date = {2015-03-12}, booktitle = {AAAI Spring Symposium Series}, abstract = {Cyber forensics is fairly new as a scientific discipline and deals with the acquisition, authentication and analysis of digital evidence. One of the biggest challenges in this domain has thus far been real data sources that are available for experimentation. Only a few data sources exist at the time of writing of this paper. The authors in this paper deliberate how social media data sources may impact future directions in cyber forensics, and describe how these data sources may be used as new digital forensic artifacts in future investigations. The authors also deliberate how the scientific community may leverage publicly accessible social media data to advance the state of the art in Cyber Forensics.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
2014
	Breitinger, Frank; Liu, Huajian; Winter, Christian; Baier, Harald; Rybalchenko, Alexey; Steinebach, Martin Towards a Process Model for Hash Functions in Digital Forensics (Proceedings Article) In: Gladyshev, Pavel; Marrington, Andrew; Baggili, Ibrahim (Ed.): Digital Forensics and Cyber Crime, pp. 170-186, Springer International Publishing, 2014, ISBN: 978-3-319-14288-3. @inproceedings{BLW14, title = {Towards a Process Model for Hash Functions in Digital Forensics}, author = {Frank Breitinger and Huajian Liu and Christian Winter and Harald Baier and Alexey Rybalchenko and Martin Steinebach}, editor = {Pavel Gladyshev and Andrew Marrington and Ibrahim Baggili}, url = {http://dx.doi.org/10.1007/978-3-319-14289-0_12}, doi = {10.1007/978-3-319-14289-0_12}, isbn = {978-3-319-14288-3}, year = {2014}, date = {2014-12-23}, booktitle = {Digital Forensics and Cyber Crime}, volume = {132}, pages = {170-186}, publisher = {Springer International Publishing}, series = {Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering}, abstract = {Handling forensic investigations gets more and more difficult as the amount of data one has to analyze is increasing continuously. A common approach for automated file identification are hash functions. The proceeding is quite simple: a tool hashes all files of a seized device and compares them against a database. Depending on the database, this allows to discard non-relevant (whitelisting) or detect suspicious files (blacklisting). One can distinguish three kinds of algorithms: (cryptographic) hash functions, bytewise approximate matching and semantic approximate matching (a.k.a perceptual hashing) where the main difference is the operation level. The latter one operates on the semantic level while both other approaches consider the byte-level. Hence, investigators have three different approaches at hand to analyze a device. First, this paper gives a comprehensive overview of existing approaches for bytewise and semantic approximate matching (for semantic we focus on images functions). Second, we compare implementations and summarize the strengths and weaknesses of all approaches. Third, we show how to integrate these functions based on a sample use case into one existing process model, the computer forensics field triage process model.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Breitinger, Frank; Ziroff, Georg; Lange, Steffen; Baier, Harald Similarity Hashing Based on Levenshtein Distances (Proceedings Article) In: Peterson, Gilbert; Shenoi, Sujeet (Ed.): Advances in Digital Forensics X, pp. 133-147, Springer Berlin Heidelberg, 2014, ISBN: 978-3-662-44951-6.
@inproceedings{BZLB14,
  title = {Similarity Hashing Based on Levenshtein Distances},
  author = {Frank Breitinger and Georg Ziroff and Steffen Lange and Harald Baier},
  editor = {Gilbert Peterson and Sujeet Shenoi},
  url = {http://dx.doi.org/10.1007/978-3-662-44952-3_10},
  doi = {10.1007/978-3-662-44952-3_10},
  isbn = {978-3-662-44951-6},
  year = {2014},
  date = {2014-01-01},
  booktitle = {Advances in Digital Forensics X},
  volume = {433},
  pages = {133-147},
  publisher = {Springer Berlin Heidelberg},
  series = {IFIP Advances in Information and Communication Technology},
  abstract = {It is increasingly common in forensic investigations to use automated pre-processing techniques to reduce the massive volumes of data that are encountered. This is typically accomplished by comparing fingerprints (typically cryptographic hashes) of files against existing databases. In addition to finding exact matches of cryptographic hashes, it is necessary to find approximate matches corresponding to similar files, such as different versions of a given file. This paper presents a new stand-alone similarity hashing approach called saHash, which has a modular design and operates in linear time. saHash is almost as fast as SHA-1 and more efficient than other approaches for approximate matching. The similarity hashing algorithm uses four sub-hash functions, each producing its own hash value. The four sub-hashes are concatenated to produce the final hash value. This modularity enables sub-hash functions to be added or removed, e.g., if an exploit for a sub-hash function is discovered. Given the hash values of two byte sequences, saHash returns a lower bound on the number of Levenshtein operations between the two byte sequences as their similarity score. The robustness of saHash is verified by comparing it with other approximate matching approaches such as sdhash.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
	Breitinger, Frank; Winter, Christian; Yannikos, York; Fink, Tobias; Seefried, Michael Using Approximate Matching to Reduce the Volume of Digital Data (Proceedings Article) In: Peterson, Gilbert; Shenoi, Sujeet (Ed.): Advances in Digital Forensics X, pp. 149-163, Springer Berlin Heidelberg, 2014, ISBN: 978-3-662-44951-6.
@inproceedings{BWY14,
  title = {Using Approximate Matching to Reduce the Volume of Digital Data},
  author = {Frank Breitinger and Christian Winter and York Yannikos and Tobias Fink and Michael Seefried},
  editor = {Gilbert Peterson and Sujeet Shenoi},
  url = {http://dx.doi.org/10.1007/978-3-662-44952-3_11},
  doi = {10.1007/978-3-662-44952-3_11},
  isbn = {978-3-662-44951-6},
  year = {2014},
  date = {2014-01-01},
  booktitle = {Advances in Digital Forensics X},
  volume = {433},
  pages = {149-163},
  publisher = {Springer Berlin Heidelberg},
  series = {IFIP Advances in Information and Communication Technology},
  abstract = {Digital forensic investigators frequently have to search for relevant files in massive digital corpora – a task often compared to finding a needle in a haystack. To address this challenge, investigators typically apply cryptographic hash functions to identify known files. However, cryptographic hashing only allows the detection of files that exactly match the known file hash values or fingerprints. This paper demonstrates the benefits of using approximate matching to locate relevant files. The experiments described in this paper used three test images of Windows XP, Windows 7 and Ubuntu 12.04 systems to evaluate fingerprint-based comparisons. The results reveal that approximate matching can improve file identification – in one case, increasing the identification rate from 1.82% to 23.76%.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
2013
	Breitinger, Frank; Baier, Harald Similarity Preserving Hashing: Eligible Properties and a New Algorithm MRSH-v2 (Proceedings Article) In: Rogers, Marcus; Seigfried-Spellar, Kathryn C. (Ed.): Digital Forensics and Cyber Crime, pp. 167-182, Springer Berlin Heidelberg, 2013, ISBN: 978-3-642-39890-2.
@inproceedings{BB12d,
  title = {Similarity Preserving Hashing: Eligible Properties and a New Algorithm MRSH-v2},
  author = {Frank Breitinger and Harald Baier},
  editor = {Marcus Rogers and Kathryn C. Seigfried-Spellar},
  url = {http://dx.doi.org/10.1007/978-3-642-39891-9_11},
  doi = {10.1007/978-3-642-39891-9_11},
  isbn = {978-3-642-39890-2},
  year = {2013},
  date = {2013-11-01},
  booktitle = {Digital Forensics and Cyber Crime},
  volume = {114},
  pages = {167-182},
  publisher = {Springer Berlin Heidelberg},
  series = {Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering},
  abstract = {Hash functions are a widespread class of functions in computer science and used in several applications, e.g. in computer forensics to identify known files. One basic property of cryptographic hash functions is the avalanche effect that causes a significantly different output if an input is changed slightly. As some applications also need to identify similar files (e.g. spam/virus detection) this raised the need for similarity preserving hashing. In recent years, several approaches came up, all with different namings, properties, strengths and weaknesses which is due to a missing definition. Based on the properties and use cases of traditional hash functions this paper discusses a uniform naming and properties which is a first step towards a suitable definition of similarity preserving hashing. Additionally, we extend the algorithm MRSH for similarity preserving hashing to its successor MRSH-v2, which has three specialties. First, it fulfills all our proposed defining properties, second, it outperforms existing approaches especially with respect to run time performance and third it has two detections modes. The regular mode of MRSH-v2 is used to identify similar files whereas the f-mode is optimal for fragment detection, i.e. to identify similar parts of a file.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
	Rathgeb, Christian; Breitinger, Frank; Busch, Christoph Alignment-free cancelable iris biometric templates based on adaptive bloom filters (Proceedings Article) In: Biometrics (ICB), 2013 International Conference on, pp. 1-8, 2013.
@inproceedings{RBB13,
  title = {Alignment-free cancelable iris biometric templates based on adaptive bloom filters},
  author = {Christian Rathgeb and Frank Breitinger and Christoph Busch},
  url = {http://dx.doi.org/10.1109/ICB.2013.6612976},
  doi = {10.1109/ICB.2013.6612976},
  year = {2013},
  date = {2013-09-30},
  booktitle = {Biometrics (ICB), 2013 International Conference on},
  pages = {1-8},
  abstract = {Biometric characteristics are largely immutable, i.e. unprotected storage of biometric data provokes serious privacy threats, e.g. identity theft, limited renewability, or cross-matching. In accordance with the ISO/IEC 24745 standard, technologies of cancelable biometrics offer solutions to biometric information protection by obscuring biometric signal in a non-invertible manner, while biometric comparisons are still feasible in the transformed domain. In the presented work alignment-free cancelable iris biometrics based on adaptive Bloom filters are proposed. Bloom filter-based representations of binary biometric templates (iris-codes) enable an efficient alignment-invariant biometric comparison while a successive mapping of parts of a binary biometric template to a Bloom filter represents an irreversible transform. In experiments, which are carried out on the CASIA-v3 iris database, it is demonstrated that the proposed system maintains biometric performance for diverse iris recognition algorithms, protecting biometric templates at high security levels.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
	Breitinger, Frank; Astebøl, Knut; Baier, Harald; Busch, Christoph mvHash-B - A New Approach for Similarity Preserving Hashing (Proceedings Article) In: IT Security Incident Management and IT Forensics (IMF), 2013 Seventh International Conference on, pp. 33-44, 2013.
@inproceedings{BABB13,
  title = {mvHash-B - A New Approach for Similarity Preserving Hashing},
  author = {Frank Breitinger and Knut Astebøl and Harald Baier and Christoph Busch},
  url = {http://dx.doi.org/10.1109/IMF.2013.18},
  doi = {10.1109/IMF.2013.18},
  year = {2013},
  date = {2013-07-25},
  booktitle = {IT Security Incident Management and IT Forensics (IMF), 2013 Seventh International Conference on},
  pages = {33-44},
  abstract = {The handling of hundreds of thousands of files is a major challenge in today's IT forensic investigations. In order to cope with this information overload, investigators use fingerprints (hash values) to identify known files automatically using blacklists or whitelists. Besides detecting exact duplicates it is helpful to locate similar files by using similarity preserving hashing (SPH), too. We present a new algorithm for similarity preserving hashing. It is based on the idea of majority voting in conjunction with run length encoding to compress the input data and uses Bloom filters to represent the fingerprint. It is therefore called mvHash-B. Our assessment shows that mvHash-B is superior to other SPHs with respect to run time efficiency: It is almost as fast as SHA-1 and thus faster than any other SPH algorithm. Additionally the hash value length is approximately 0.5% of the input length and hence outperforms most existing algorithms. Finally, we show that the robustness of mvHash-B against active manipulation is sufficient for practical purposes.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
	Breitinger, Frank; Petrov, Kaloyan Reducing the Time Required for Hashing Operations (Proceedings Article) In: Peterson, Gilbert; Shenoi, Sujeet (Ed.): Advances in Digital Forensics IX, pp. 101-117, Springer Berlin Heidelberg, 2013, ISBN: 978-3-642-41147-2.
@inproceedings{BK13,
  title = {Reducing the Time Required for Hashing Operations},
  author = {Frank Breitinger and Kaloyan Petrov},
  editor = {Gilbert Peterson and Sujeet Shenoi},
  url = {http://dx.doi.org/10.1007/978-3-642-41148-9_7},
  doi = {10.1007/978-3-642-41148-9_7},
  isbn = {978-3-642-41147-2},
  year = {2013},
  date = {2013-01-01},
  booktitle = {Advances in Digital Forensics IX},
  volume = {410},
  pages = {101-117},
  publisher = {Springer Berlin Heidelberg},
  series = {IFIP Advances in Information and Communication Technology},
  abstract = {Due to the increasingly massive amounts of data that need to be analyzed in digital forensic investigations, it is necessary to automatically recognize suspect files and filter out non-relevant files. To achieve this goal, digital forensic practitioners employ hashing algorithms to classify files into known-good, known-bad and unknown files. However, a typical personal computer may store hundreds of thousands of files and the task becomes extremely time-consuming. This paper attempts to address the problem using a framework that speeds up processing by using multiple threads. Unlike a typical multithreading approach, where the hashing algorithm is performed by multiple threads, the proposed framework incorporates a dedicated prefetcher thread that reads files from a device. Experimental results demonstrate a runtime efficiency of nearly 40% over single threading.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
2012
	Breitinger, Frank; Baier, Harald Performance Issues About Context-Triggered Piecewise Hashing (Proceedings Article) In: Gladyshev, Pavel; Rogers, Marcus K. (Ed.): Digital Forensics and Cyber Crime, pp. 141-155, Springer Berlin Heidelberg, 2012, ISBN: 978-3-642-35514-1.
@inproceedings{BB12a,
  title = {Performance Issues About Context-Triggered Piecewise Hashing},
  author = {Frank Breitinger and Harald Baier},
  editor = {Pavel Gladyshev and Marcus K. Rogers},
  url = {http://dx.doi.org/10.1007/978-3-642-35515-8_12},
  doi = {10.1007/978-3-642-35515-8_12},
  isbn = {978-3-642-35514-1},
  year = {2012},
  date = {2012-12-01},
  booktitle = {Digital Forensics and Cyber Crime},
  volume = {88},
  pages = {141-155},
  publisher = {Springer Berlin Heidelberg},
  series = {Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering},
  abstract = {A hash function is a well-known method in computer science to map arbitrary large data to bit strings of a fixed short length. This property is used in computer forensics to identify known files on base of their hash value. As of today, in a pre-step process hash values of files are generated and stored in a database; typically a cryptographic hash function like MD5 or SHA-1 is used. Later the investigator computes hash values of files, which he finds on a storage medium, and performs look ups in his database. Due to security properties of cryptographic hash functions, they can not be used to identify similar files. Therefore Jesse Kornblum proposed a similarity preserving hash function to identify similar files. This paper discusses the efficiency of Kornblum's approach. We present some enhancements that increase the performance of his algorithm by 55% if applied to a real life scenario. Furthermore, we discuss some characteristics of a sample Windows XP system, which are relevant for the performance of Kornblum's approach.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
	Breitinger, Frank; Baier, Harald Properties of a similarity preserving hash function and their realization in sdhash (Proceedings Article) In: Information Security for South Africa (ISSA), pp. 1-8, 2012.
@inproceedings{BB12c,
  title = {Properties of a similarity preserving hash function and their realization in sdhash},
  author = {Frank Breitinger and Harald Baier},
  url = {http://dx.doi.org/10.1109/ISSA.2012.6320445},
  doi = {10.1109/ISSA.2012.6320445},
  year = {2012},
  date = {2012-10-04},
  booktitle = {Information Security for South Africa (ISSA)},
  pages = {1-8},
  abstract = {Finding similarities between byte sequences is a complex task and necessary in many areas of computer science, e.g., to identify malicious files or spam. Instead of comparing files against each other, one may apply a similarity preserving compression function (hash function) first and do the comparison for the hashes. Although we have different approaches, there is no clear definition / specification or needed properties of such algorithms available. This paper presents four basic properties for similarity preserving hash functions that are partly related to the properties of cryptographic hash functions. Compression and ease of computation are borrowed from traditional hash functions and define the hash value length and the performance. As every byte is expected to influence the hash value, we introduce coverage. Similarity score describes the need for a comparison function for hash values. We shortly discuss these properties with respect to three existing approaches and finally have a detailed view on the promising approach sdhash. However, we uncovered some bugs and other peculiarities of the implementation of sdhash. Finally we conclude that sdhash has the potential to be a robust similarity preserving digest algorithm, but there are some points that need to be improved.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
	Breitinger, Frank; Baier, Harald; Beckingham, Jesse Security and implementation analysis of the similarity digest sdhash (Proceedings Article) In: First International Baltic Conference on Network Security & Forensics (NeSeFo), 2012.
@inproceedings{BBB12,
  title = {Security and implementation analysis of the similarity digest sdhash},
  author = {Frank Breitinger and Harald Baier and Jesse Beckingham},
  year = {2012},
  date = {2012-08-01},
  booktitle = {First International Baltic Conference on Network Security \& Forensics (NeSeFo)},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
	Breitinger, Frank; Baier, Harald A fuzzy hashing approach based on random sequences and hamming distance (Proceedings Article) In: Proceedings of the Conference on Digital Forensics, Security and Law, pp. 89–100, 2012.
@inproceedings{BB12b,
  title = {A fuzzy hashing approach based on random sequences and hamming distance},
  author = {Frank Breitinger and Harald Baier},
  url = {https://commons.erau.edu/cgi/viewcontent.cgi?article=1193&context=adfsl},
  year = {2012},
  date = {2012-05-01},
  booktitle = {Proceedings of the Conference on Digital Forensics, Security and Law},
  pages = {89–100},
  abstract = {Hash functions are well-known methods in computer science to map arbitrary large input to bit strings of a fixed length that serve as unique input identifier/fingerprints. A key property of cryptographic hash functions is that even if only one bit of the input is changed the output behaves pseudo randomly and therefore similar files cannot be identified. However, in the area of computer forensics it is also necessary to find similar files (e.g. different versions of a file), wherefore we need a similarity preserving hash function also called fuzzy hash function. In this paper we present a new approach for fuzzy hashing called bbHash. It is based on the idea to `rebuild' an input as good as possible using a fixed set of randomly chosen byte sequences called building blocks of byte length l (e.g. l = 128). The proceeding is as follows: slide through the input byte-by-byte, read out the current input byte sequence of length l, and compute the Hamming distances of all building blocks against the current input byte sequence. Each building block with Hamming distance smaller than a certain threshold contributes the file's bbHash. We discuss (dis-)advantages of our bbHash to further fuzzy hash approaches. A key property of bbHash is that it is the first fuzzy hashing approach based on a comparison to external data structures.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
2011
	Baier, Harald; Breitinger, Frank Security Aspects of Piecewise Hashing in Computer Forensics (Proceedings Article) In: IT Security Incident Management and IT Forensics (IMF), 2011 Sixth International Conference on, pp. 21-36, 2011.
@inproceedings{BB11,
  title = {Security Aspects of Piecewise Hashing in Computer Forensics},
  author = {Harald Baier and Frank Breitinger},
  url = {http://dx.doi.org/10.1109/IMF.2011.16},
  doi = {10.1109/IMF.2011.16},
  year = {2011},
  date = {2011-06-17},
  booktitle = {IT Security Incident Management and IT Forensics (IMF), 2011 Sixth International Conference on},
  pages = {21-36},
  abstract = {Although hash functions are a well-known method in computer science to map arbitrary large data to bit strings of a fixed length, their use in computer forensics is currently very limited. As of today, in a pre-step process hash values of files are generated and stored in a database, typically a cryptographic hash function like MD5 or SHA-1 is used. Later the investigator computes hash values of files, which he finds on a storage medium, and performs look ups in his database. This approach has several drawbacks, which have been sketched in the community, and some alternative approaches have been proposed. The most popular one is due to Jesse Kornblum, who transferred ideas from spam detection to computer forensics in order to identify similar files. However, his proposal lacks a thorough security analysis. It is therefore one aim of the paper at hand to present some possible attack vectors of an active adversary to bypass Kornblum's approach. Furthermore, we present a pseudo random number generator being both more efficient and more random compared to Kornblum's pseudo random number generator.},
  keywords = {},
  pubstate = {published},
  tppubtype = {inproceedings}
}
2010
	Breitinger, Frank; Nickel, Claudia User Survey on Phone Security and Usage (Proceedings Article) In: Brömme, Arslan; Busch, Christoph (Ed.): BIOSIG, pp. 139-144, GI, 2010, ISBN: 978-3-88579-258-1. @inproceedings{BN10, title = {User Survey on Phone Security and Usage}, author = {Frank Breitinger and Claudia Nickel}, editor = {Arslan Brömme and Christoph Busch}, url = {http://dblp.uni-trier.de/db/conf/biosig/biosig2010.html#BreitingerN10}, isbn = {978-3-88579-258-1}, year = {2010}, date = {2010-06-01}, booktitle = {BIOSIG}, volume = {164}, pages = {139-144}, publisher = {GI}, series = {LNI}, abstract = {Mobile phones are widely used nowadays and during the last years developed from simple phones to small computers with an increasing number of features. These result in a wide variety of data stored on the devices which could be a high security risk in case of unauthorized access. A comprehensive user survey was conducted to get information about what data is really stored on the mobile devices, how it is currently protected and if biometric authentication methods could improve the current state. This paper states the results from about 550 users of mobile devices. The analysis revealed a very low security level of the devices. This is partly due to a low security awareness of their owners and partly due to the low acceptance of the offered authentication method based on PIN. Further results like the experiences with mobile thefts and the willingness to use biometric authentication methods as alternative to PIN authentication are also stated.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }

2025
	Wickramasekara, Akila; Densmore, Alanna; Breitinger, Frank; Studiawan, Hudan; Scanlon, Mark AutoDFBench: A Framework for AI Generated Digital Forensic Code and Tool Testing and Evaluation (Proceedings Article) In: Proceedings of the Digital Forensics Doctoral Symposium, Association for Computing Machinery, Brno, CZ, 2025, ISBN: 9798400710766. @inproceedings{10.1145/3712716.3712718, title = {AutoDFBench: A Framework for AI Generated Digital Forensic Code and Tool Testing and Evaluation}, author = {Akila Wickramasekara and Alanna Densmore and Frank Breitinger and Hudan Studiawan and Mark Scanlon}, url = {https://doi.org/10.1145/3712716.3712718}, doi = {10.1145/3712716.3712718}, isbn = {9798400710766}, year = {2025}, date = {2025-04-01}, urldate = {2025-01-01}, booktitle = {Proceedings of the Digital Forensics Doctoral Symposium}, publisher = {Association for Computing Machinery}, address = {Brno, CZ}, series = {DFDS '25}, abstract = {Generative AI (GenAI) and Large Language Models (LLMs) show great potential in various domains, including digital forensics. A notable use case of these technologies is automatic code generation, which can reasonably be expected to include digital forensic applications in the not-too-distant future. As with any digital forensic tool, these systems must undergo extensive testing and validation. However, manually evaluating outputs, including generated DF code, remains a challenge. AutoDFBench is an automated framework designed to address this by validating AI-generated code and tools against NIST’s Computer Forensics Tool Testing Program (CFTT) procedures and subsequently calculating an AutoDFBench benchmarking score. The framework operates in four phases: data preparation, API handling, code execution, and result recording with score calculation. It benchmarks generative AI systems, such as LLMs and automated code generation agents, for DF applications. This benchmark can support iterative development or serve as a comparison metric between GenAI DF systems. As a proof of concept, NIST’s forensic string search tests were used, involving more than 24,200 tests with five top-performing code generation LLMs. These tests validated the output of 121 cases, considering two levels of user expertise, two programming languages, and ten iterations per case with varying prompts. The results also highlight the significant limitations of the DF-specific solutions generated by generic LLMs.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Vanini, Céline; Gruber, Jan; Hargreaves, Christopher; Benenson, Zinaida; Freiling, Felix; Breitinger, Frank Understanding Strategies and Challenges of Timestamp Tampering for Improved Digital Forensic Event Reconstruction (Proceedings Article) In: Proceedings of the Digital Forensics Doctoral Symposium, Association for Computing Machinery, Brno, CZ, 2025, ISBN: 9798400710766. @inproceedings{10.1145/3712716.3712727, title = {Understanding Strategies and Challenges of Timestamp Tampering for Improved Digital Forensic Event Reconstruction}, author = {Céline Vanini and Jan Gruber and Christopher Hargreaves and Zinaida Benenson and Felix Freiling and Frank Breitinger}, url = {https://doi.org/10.1145/3712716.3712727}, doi = {10.1145/3712716.3712727}, isbn = {9798400710766}, year = {2025}, date = {2025-04-01}, urldate = {2025-01-01}, booktitle = {Proceedings of the Digital Forensics Doctoral Symposium}, publisher = {Association for Computing Machinery}, address = {Brno, CZ}, series = {DFDS '25}, abstract = {Timestamps play a pivotal role in digital forensic event reconstruction, but due to their non-essential nature, tampering or manipulation of timestamps is possible by users in multiple ways, even on running systems. This has a significant effect on the reliability of the results from applying a timeline analysis as part of an investigation. We investigate the problem of users tampering with timestamps on a running (“live”) system. While prior work has shown that digital evidence tampering is hard, we focus on the question of why this is so. By performing a qualitative user study with advanced university students, we derive factors that influence the reliability of successful tampering, such as the individual knowledge about temporal traces, and technical restrictions to change them. These insights help to assess the reliability of timestamps from individual artifacts that are used for event reconstruction and subsequently reduce the risk of misinterpretations.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Michelet, Gaëtan; Breitinger, Frank Automation for digital forensics: Towards a classification model for the community (Proceedings Article) In: Proceedings of the Digital Forensics Doctoral Symposium, Association for Computing Machinery, Brno, CZ, 2025, ISBN: 9798400710766. @inproceedings{10.1145/3712716.3712725, title = {Automation for digital forensics: Towards a classification model for the community}, author = {Gaëtan Michelet and Frank Breitinger}, url = {https://doi.org/10.1145/3712716.3712725}, doi = {10.1145/3712716.3712725}, isbn = {9798400710766}, year = {2025}, date = {2025-04-01}, booktitle = {Proceedings of the Digital Forensics Doctoral Symposium}, publisher = {Association for Computing Machinery}, address = {Brno, CZ}, series = {DFDS '25}, abstract = {The current state of automation in digital forensics remains insufficiently defined. While the complexity of automated tools and methods has evolved significantly (e.g., from basic parsers to the integration of advanced techniques), it remains challenging to pinpoint the field’s overall progress or compare methods. A first step towards a solution was the work ‘Automation for digital forensics: Towards a definition for the community’ which defines automation but cannot categorize various methods. This work aims to address this gap and presents a first classification model for automation for digital forensics. Therefore, we analyzed automation classification schemes from different disciplines (e.g., cars) and assessed various model possibilities as well as characteristics. We conclude that a 2-dimensional model with the axis ‘decision’ and ‘level of automation’ is most appropriate and provide an overview table with examples.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
2023
	Ottmann, Jenny; Cengiz, Üsame; Breitinger, Frank; Freiling, Felix As if Time Had Stopped – Checking Memory Dumps for Quasi-Instantaneous Consistency (Proceedings Article) In: Proceedings of the Digital Forensics Research Conference USA (DFRWS USA), 2023. @inproceedings{OUBF2023, title = {As if Time Had Stopped – Checking Memory Dumps for Quasi-Instantaneous Consistency}, author = {Jenny Ottmann and Üsame Cengiz and Frank Breitinger and Felix Freiling}, url = {https://dfrws.org/presentation/as-if-time-had-stopped-checking-memory-dumps-for-quasi-instantaneous-consistency/}, doi = {10.48550/arXiv.2307.12060}, year = {2023}, date = {2023-07-10}, booktitle = {Proceedings of the Digital Forensics Research Conference USA (DFRWS USA)}, abstract = {Memory dumps that are acquired while the system is running often contain inconsistencies like page smearing which hamper the analysis. One possibility to avoid inconsistencies is to pause the system during the acquisition and take an instantaneous memory dump. While this is possible for virtual machines, most systems cannot be frozen and thus the ideal dump can only be quasi-instantaneous, i.e., consistent despite the system running. In this article, we introduce a method allowing us to measure quasi-instantaneous consistency and show both, theoretically, and practically, that our method is valid but that in reality, dumps can be but usually are not quasi-instantaneously consistent. For the assessment, we run a pivot program enabling the evaluation of quasi-instantaneous consistency for its heap and allowing us to pinpoint where exactly inconsistencies occurred.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
2022
	Coates, Peter; Breitinger, Frank Identifying document similarity using a fast estimation of the Levenshtein Distance based on compression and signatures (Proceedings Article) In: Proceedings of the Digital Forensics Research Conference Europe (DFRWS EU), 2022. @inproceedings{CB2022, title = {Identifying document similarity using a fast estimation of the Levenshtein Distance based on compression and signatures}, author = {Peter Coates and Frank Breitinger}, url = {https://www.researchgate.net/publication/359961968_Identifying_document_similarity_using_a_fast_estimation_of_the_Levenshtein_Distance_based_on_compression_and_signatures}, doi = {10.48550/arXiv.2307.11496}, year = {2022}, date = {2022-03-31}, booktitle = {Proceedings of the Digital Forensics Research Conference Europe (DFRWS EU)}, abstract = {Identifying document similarity has many applications, e.g., source code analysis or plagiarism detection. However, identifying similarities is not trivial and can be time complex. For instance, the Levenshtein Distance is a common metric to define the similarity between two documents but has quadratic runtime which makes it impractical for large documents where large starts with a few hundred kilobytes. In this paper, we present a novel concept that allows estimating the Levenshtein Distance: the algorithm first compresses documents to signatures (similar to hash values) using a user-defined compression ratio. Signatures can then be compared against each other (some constraints apply) where the outcome is the estimated Levenshtein Distance. Our evaluation shows promising results in terms of runtime efficiency and accuracy. In addition, we introduce a significance score allowing examiners to set a threshold and identify related documents.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Ottmann, Jenny; Breitinger, Frank; Freiling, Felix Defining Atomicity (and Integrity) for Snapshots of Storage in Forensic Computing (Proceedings Article) In: Proceedings of the Digital Forensics Research Conference Europe (DFRWS EU), 2022. @inproceedings{OBF2022, title = {Defining Atomicity (and Integrity) for Snapshots of Storage in Forensic Computing}, author = {Jenny Ottmann and Frank Breitinger and Felix Freiling}, url = {https://www.researchgate.net/publication/359962048_Defining_Atomicity_and_Integrity_for_Snapshots_of_Storage_in_Forensic_Computing}, year = {2022}, date = {2022-03-31}, booktitle = {Proceedings of the Digital Forensics Research Conference Europe (DFRWS EU)}, abstract = {The acquisition of data from main memory or from hard disk storage is usually one of the first steps in a forensic investigation. We revisit the discussion on quality criteria for ``forensically sound'' acquisition of such storage and propose a new way to capture the intent to acquire an instantaneous snapshot from a single target system. The idea of our definition is to allow a certain flexibility into when individual portions of memory are acquired, but at the same time require being consistent with causality (i.e., cause/effect relations). Our concept is much stronger than the original notion of atomicity defined by Vömel and Freiling (2012) but still attainable using copy-on-write mechanisms. As a minor result, we also fix a conceptual problem within the original definition of integrity.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
2019
	Moia, Vitor Hugo Galhardo; Breitinger, Frank; Henriques, Marco Aurélio Amaral Understanding the effects of removing common blocks on Approximate Matching scores under different scenarios for digital forensic investigations (Proceedings Article) In: XIX Brazilian Symposium on information and computational systems security, Brazilian Computer Society (SBC) São Paulo-SP, Brazil 2019, (Best Paper Award). @inproceedings{MBH19, title = {Understanding the effects of removing common blocks on Approximate Matching scores under different scenarios for digital forensic investigations}, author = {Vitor Hugo Galhardo Moia and Frank Breitinger and Marco Aurélio Amaral Henriques}, url = {https://sol.sbc.org.br/index.php/sbseg/article/download/13966/13815}, year = {2019}, date = {2019-09-05}, booktitle = {XIX Brazilian Symposium on information and computational systems security}, organization = {Brazilian Computer Society (SBC) São Paulo-SP, Brazil}, abstract = {Finding similarity in digital forensics investigations can be assisted with the use of Approximate Matching (AM) functions. These algorithms create small and compact representations of objects (similar to hashes) which can be compared to identify similarity. However, often results are biased due to common blocks (data structures found in many different files regardless of content). In this paper, we evaluate the precision and recall metrics for AM functions when removing common blocks. In detail, we analyze how the similarity score changes and impacts different investigation scenarios. Results show that many irrelevant matches can be filtered out and that a new interpretation of the score allows a better similarity detection.}, note = {Best Paper Award}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Wu, Tina; Breitinger, Frank; Baggili, Ibrahim IoT Ignorance is Digital Forensics Research Bliss: A Survey to Understand IoT Forensics Definitions, Challenges and Future Research Directions (Proceedings Article) In: Proceedings of the 14th International Conference on Availability, Reliability and Security, pp. 46:1–46:15, ACM, Canterbury, CA, United Kingdom, 2019, ISBN: 978-1-4503-7164-3. @inproceedings{WBB19, title = {IoT Ignorance is Digital Forensics Research Bliss: A Survey to Understand IoT Forensics Definitions, Challenges and Future Research Directions}, author = {Tina Wu and Frank Breitinger and Ibrahim Baggili}, url = {http://doi.acm.org/10.1145/3339252.3340504}, doi = {10.1145/3339252.3340504}, isbn = {978-1-4503-7164-3}, year = {2019}, date = {2019-08-25}, booktitle = {Proceedings of the 14th International Conference on Availability, Reliability and Security}, pages = {46:1–46:15}, publisher = {ACM}, address = {Canterbury, CA, United Kingdom}, series = {ARES '19}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Przyborski, Kristen; Breitinger, Frank; Beck, Lauren; Harichandran, Ronald S. `Cyber World' as a Theme for a University-wide First-year Common Course (Proceedings Article) In: 2019 ASEE Annual Conference & Exposition, ASEE Conferences, Tampa, Florida, 2019, (https://peer.asee.org/31923). @inproceedings{Przyborski2019, title = {`Cyber World' as a Theme for a University-wide First-year Common Course}, author = {Kristen Przyborski and Frank Breitinger and Lauren Beck and Ronald S. Harichandran}, doi = {10.18260/1-2--31923}, year = {2019}, date = {2019-06-01}, booktitle = {2019 ASEE Annual Conference & Exposition}, publisher = {ASEE Conferences}, address = {Tampa, Florida}, note = {\url{https://peer.asee.org/31923}}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
2018
	Haigh, Trevor; Breitinger, Frank; Baggili, Ibrahim If I Had a Million Cryptos: Cryptowallet Application Analysis and a Trojan Proof-of-Concept (Proceedings Article) In: Breitinger, Frank; Baggili, Ibrahim (Ed.): Digital Forensics and Cyber Crime, pp. 45–65, Springer International Publishing, Cham, 2018, ISBN: 978-3-030-05487-8, (Best Paper Award). @inproceedings{HBB19, title = {If I Had a Million Cryptos: Cryptowallet Application Analysis and a Trojan Proof-of-Concept}, author = {Trevor Haigh and Frank Breitinger and Ibrahim Baggili}, editor = {Frank Breitinger and Ibrahim Baggili}, doi = {10.1007/978-3-030-05487-8_3}, isbn = {978-3-030-05487-8}, year = {2018}, date = {2018-12-30}, booktitle = {Digital Forensics and Cyber Crime}, pages = {45–65}, publisher = {Springer International Publishing}, address = {Cham}, abstract = {Cryptocurrencies have gained wide adoption by enthusiasts and investors. In this work, we examine seven different Android cryptowallet applications for forensic artifacts, but we also assess their security against tampering and reverse engineering. Some of the biggest benefits of cryptocurrency is its security and relative anonymity. For this reason it is vital that wallet applications share the same properties. Our work, however, indicates that this is not the case. Five of the seven applications we tested do not implement basic security measures against reverse engineering. Three of the applications stored sensitive information, like wallet private keys, insecurely and one was able to be decrypted with some effort. One of the applications did not require root access to retrieve the data. We were also able to implement a proof-of-concept trojan which exemplifies how a malicious actor may exploit the lack of security in these applications and exfiltrate user data and cryptocurrency.}, note = {Best Paper Award}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Schmicker, Robert; Breitinger, Frank; Baggili, Ibrahim AndroParse - An Android Feature Extraction Framework and Dataset (Proceedings Article) In: Breitinger, Frank; Baggili, Ibrahim (Ed.): Digital Forensics and Cyber Crime, pp. 66–88, Springer International Publishing, Cham, 2018, ISBN: 978-3-030-05487-8.
	@inproceedings{SBB19,
	  title = {AndroParse - An Android Feature Extraction Framework and Dataset},
	  author = {Robert Schmicker and Frank Breitinger and Ibrahim Baggili},
	  editor = {Frank Breitinger and Ibrahim Baggili},
	  doi = {10.1007/978-3-030-05487-8_4},
	  isbn = {978-3-030-05487-8},
	  year = {2018},
	  date = {2018-12-30},
	  booktitle = {Digital Forensics and Cyber Crime},
	  pages = {66--88},
	  publisher = {Springer International Publishing},
	  address = {Cham},
	  abstract = {Android malware has become a major challenge. As a consequence, practitioners and researchers spend a significant time analyzing Android applications (APK). A common procedure (especially for data scientists) is to extract features such as permissions, APIs or strings which can then be analyzed. Current state of the art tools have three major issues: (1) a single tool cannot extract all the significant features used by scientists and practitioners (2) Current tools are not designed to be extensible and (3) Existing parsers can be timely as they are not runtime efficient or scalable. Therefore, this work presents AndroParse which is an open-source Android parser written in Golang that currently extracts the four most common features: Permissions, APIs, Strings and Intents. AndroParse outputs JSON files as they can easily be used by most major programming languages. Constructing the parser allowed us to create an extensive feature dataset which can be accessed by our independent REST API. Our dataset currently has 67,703 benign and 46,683 malicious APK samples.},
	  keywords = {},
	  pubstate = {published},
	  tppubtype = {inproceedings}
	}
	Luciano, Laoise; Baggili, Ibrahim; Topor, Mateusz; Casey, Peter; Breitinger, Frank Digital Forensics in the Next Five Years (Proceedings Article) In: Proceedings of the 13th International Conference on Availability, Reliability and Security, pp. 46:1–46:14, ACM, Hamburg, Germany, 2018, ISBN: 978-1-4503-6448-5.
	@inproceedings{LBT18,
	  title = {Digital Forensics in the Next Five Years},
	  author = {Laoise Luciano and Ibrahim Baggili and Mateusz Topor and Peter Casey and Frank Breitinger},
	  url = {http://doi.acm.org/10.1145/3230833.3232813},
	  doi = {10.1145/3230833.3232813},
	  isbn = {978-1-4503-6448-5},
	  year = {2018},
	  date = {2018-08-30},
	  booktitle = {Proceedings of the 13th International Conference on Availability, Reliability and Security},
	  pages = {46:1--46:14},
	  publisher = {ACM},
	  address = {Hamburg, Germany},
	  series = {ARES 2018},
	  abstract = {Cyber forensics has encountered major obstacles over the last decade and is at a crossroads. This paper presents data that was obtained during the National Workshop on Redefining Cyber Forensics (NWRCF) on May 23-24, 2017 supported by the National Science Foundation and organized by the University of New Haven. Qualitative and quantitative data were analyzed from twenty-four cyber forensics expert panel members. This work identified important themes that need to be addressed by the community, focusing on (1) where the domain currently is; (2) where it needs to go and; (3) steps needed to improve it. Furthermore, based on the results, we articulate (1) the biggest anticipated challenges the domain will face in the next five years; (2) the most important cyber forensics research opportunities in the next five years and; (3) the most important job-ready skills that need to be addressed by higher education curricula over the next five years. Lastly, we present the key issues and recommendations deliberated by the expert panel. Overall results indicated that a more active and coherent group needs to be formed in the cyber forensics community, with opportunities for continuous reassessment and improvement processes in place.},
	  keywords = {},
	  pubstate = {published},
	  tppubtype = {inproceedings}
	}
	Liebler, Lorenz; Breitinger, Frank mrsh-mem: Approximate Matching on Raw Memory Dumps (Proceedings Article) In: 2018 11th International Conference on IT Security Incident Management IT Forensics (IMF), pp. 47–64, 2018.
	@inproceedings{LB18,
	  title = {mrsh-mem: Approximate Matching on Raw Memory Dumps},
	  author = {Lorenz Liebler and Frank Breitinger},
	  doi = {10.1109/IMF.2018.00011},
	  year = {2018},
	  date = {2018-05-09},
	  booktitle = {2018 11th International Conference on IT Security Incident Management IT Forensics (IMF)},
	  pages = {47--64},
	  abstract = {This paper presents the fusion of two subdomains of digital forensics: (1) raw memory analysis and (2) approximate matching. Specifically, this paper describes a prototype implementation named MRSH-MEM that allows to compare hard drive images as well as memory dumps and therefore can answer the question if a particular program (installed on a hard drive) is currently running / loaded in memory. To answer this question, we only require both dumps or access to a public repository which provides the binaries to be tested. For our prototype, we modified an existing approximate matching algorithm named MRSH-NET and combined it with approxis, an approximate disassembler. Recent literature claims that approximate matching techniques are slow and hardly applicable to the field of memory forensics. Especially legitimate changes to executables in memory caused by the loader itself prevent the application of current bytewise approximate matching techniques. Our approach lowers the impact of modified code in memory and shows a good computational performance. During our experiments, we show how an investigator can leverage meaningful insights by combining data gained from a hard disk image and raw memory dumps with a practicability runtime performance. Lastly, our current implementation will be integrable into the Volatility memory forensics framework and we introduce new possibilities for providing data driven cross validation functions. Our current proof of concept implementation supports Linux based raw memory dumps.},
	  keywords = {},
	  pubstate = {published},
	  tppubtype = {inproceedings}
	}
	Lillis, David; Breitinger, Frank; Scanlon, Mark Expediting MRSH-v2 Approximate Matching with Hierarchical Bloom Filter Trees (Proceedings Article) In: Matoušek, Petr; Schmiedecker, Martin (Ed.): Digital Forensics and Cyber Crime, pp. 144–157, Springer International Publishing, Cham, 2018, ISBN: 978-3-319-73697-6, (Best Paper Award).
	@inproceedings{LBS18,
	  title = {Expediting MRSH-v2 Approximate Matching with Hierarchical Bloom Filter Trees},
	  author = {David Lillis and Frank Breitinger and Mark Scanlon},
	  editor = {Petr Matoušek and Martin Schmiedecker},
	  doi = {10.1007/978-3-319-73697-6_11},
	  isbn = {978-3-319-73697-6},
	  year = {2018},
	  date = {2018-01-06},
	  booktitle = {Digital Forensics and Cyber Crime},
	  pages = {144--157},
	  publisher = {Springer International Publishing},
	  address = {Cham},
	  abstract = {Perhaps the most common task encountered by digital forensic investigators consists of searching through a seized device for pertinent data. Frequently, an investigator will be in possession of a collection of ``known-illegal'' files (e.g. a collection of child pornographic images) and will seek to find whether copies of these are stored on the seized drive. Traditional hash matching techniques can efficiently find files that precisely match. However, these will fail in the case of merged files, embedded files, partial files, or if a file has been changed in any way.},
	  note = {Best Paper Award},
	  keywords = {},
	  pubstate = {published},
	  tppubtype = {inproceedings}
	}
	Knieriem, Brandon; Zhang, Xiaolu; Levine, Philip; Breitinger, Frank; Baggili, Ibrahim An Overview of the Usage of Default Passwords (Proceedings Article) In: Matoušek, Petr; Schmiedecker, Martin (Ed.): Digital Forensics and Cyber Crime, pp. 195–203, Springer International Publishing, Cham, 2018, ISBN: 978-3-319-73697-6.
	@inproceedings{KZL18,
	  title = {An Overview of the Usage of Default Passwords},
	  author = {Brandon Knieriem and Xiaolu Zhang and Philip Levine and Frank Breitinger and Ibrahim Baggili},
	  editor = {Petr Matoušek and Martin Schmiedecker},
	  doi = {10.1007/978-3-319-73697-6_15},
	  isbn = {978-3-319-73697-6},
	  year = {2018},
	  date = {2018-01-06},
	  booktitle = {Digital Forensics and Cyber Crime},
	  pages = {195--203},
	  publisher = {Springer International Publishing},
	  address = {Cham},
	  abstract = {The recent Mirai botnet attack demonstrated the danger of using default passwords and showed it is still a major problem. In this study we investigated several common applications and their password policies. Specifically, we analyzed if these applications: (1) have default passwords or (2) allow the user to set a weak password (i.e., they do not properly enforce a password policy). Our study shows that default passwords are still a significant problem: 61% of applications inspected initially used a default or blank password. When changing the password, 58% allowed a blank password, 35% allowed a weak password of 1 character.},
	  keywords = {},
	  pubstate = {published},
	  tppubtype = {inproceedings}
	}
2017
	Meffert, Christopher; Clark, Devon; Baggili, Ibrahim; Breitinger, Frank Forensic State Acquisition from Internet of Things (FSAIoT): A General Framework and Practical Approach for IoT Forensics Through IoT Device State Acquisition (Proceedings Article) In: Proceedings of the 12th International Conference on Availability, Reliability and Security, pp. 56:1–56:11, ACM, Reggio Calabria, Italy, 2017, ISBN: 978-1-4503-5257-4.
	@inproceedings{MCBB17,
	  title = {Forensic State Acquisition from Internet of Things (FSAIoT): A General Framework and Practical Approach for IoT Forensics Through IoT Device State Acquisition},
	  author = {Christopher Meffert and Devon Clark and Ibrahim Baggili and Frank Breitinger},
	  url = {http://doi.acm.org/10.1145/3098954.3104053},
	  doi = {10.1145/3098954.3104053},
	  isbn = {978-1-4503-5257-4},
	  year = {2017},
	  date = {2017-09-01},
	  booktitle = {Proceedings of the 12th International Conference on Availability, Reliability and Security},
	  pages = {56:1--56:11},
	  publisher = {ACM},
	  address = {Reggio Calabria, Italy},
	  series = {ARES '17},
	  abstract = {IoT device forensics is a difficult problem given that manufactured IoT devices are not standardized, many store little to no historical data, and are always connected; making them extremely volatile. The goal of this paper was to address these challenges by presenting a primary account for a general framework and practical approach we term Forensic State Acquisition from Internet of Things (FSAIoT). We argue that by leveraging the acquisition of the state of IoT devices (e.g. if an IoT lock is open or locked), it becomes possible to paint a clear picture of events that have occurred. To this end, FSAIoT consists of a centralized Forensic State Acquisition Controller (FSAC) employed in three state collection modes: controller to IoT device, controller to cloud, and controller to controller. We present a proof of concept implementation using openHAB – a device agnostic open source IoT device controller – and self-created scripts, to resemble a FSAC implementation. Our proof of concept employed an Insteon IP Camera as a controller to device test, an Insteon Hub as a controller to controller test, and a nest thermostat for a controller to cloud test. Our findings show that it is possible to practically pull forensically relevant state data from IoT devices. Future work and open research problems are shared.},
	  keywords = {},
	  pubstate = {published},
	  tppubtype = {inproceedings}
	}
2015
	Gupta, Vikas; Breitinger, Frank How Cuckoo Filter Can Improve Existing Approximate Matching Techniques (Proceedings Article) In: James, Joshua I.; Breitinger, Frank (Ed.): Digital Forensics and Cyber Crime, pp. 39–52, Springer International Publishing, 2015, ISBN: 978-3-319-25511-8, (Best Paper Award).
	@inproceedings{GB15,
	  title = {How Cuckoo Filter Can Improve Existing Approximate Matching Techniques},
	  author = {Vikas Gupta and Frank Breitinger},
	  editor = {Joshua I. James and Frank Breitinger},
	  url = {http://dx.doi.org/10.1007/978-3-319-25512-5_4},
	  doi = {10.1007/978-3-319-25512-5_4},
	  isbn = {978-3-319-25511-8},
	  year = {2015},
	  date = {2015-12-25},
	  booktitle = {Digital Forensics and Cyber Crime},
	  volume = {157},
	  pages = {39--52},
	  publisher = {Springer International Publishing},
	  series = {Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering},
	  abstract = {In recent years, approximate matching algorithms have become an important component in digital forensic research and have been adopted in some other working areas as well. Currently there are several approaches but especially sdhash and mrsh-v2 attract the attention of the community because of their good overall performance (runtime, compression and detection rates). Although both approaches have a quite different proceeding, their final output (the similarity digest) is very similar as both utilize Bloom filters. This data structure was presented in 1970 and thus has been around for a while. Recently, a new data structure was proposed and claimed to be faster and have a smaller memory footprint than Bloom filter – Cuckoo filter. In this paper we analyze the feasibility of Cuckoo filter for approximate matching algorithms and present a prototype implementation called mrsh-cf which is based on a special version of mrsh-v2 called mrsh-net. We demonstrate that by using Cuckoo filter there is a runtime improvement of approximately 37% and also a significantly better false positive rate. The memory footprint of mrsh-cf is 8 times smaller than mrsh-net, while the compression rate is twice than Bloom filter based fingerprint.},
	  note = {Best Paper Award},
	  keywords = {},
	  pubstate = {published},
	  tppubtype = {inproceedings}
	}
	Baggili, Ibrahim; Oduru, Jeff; Anthony, Kyle; Breitinger, Frank; McGee, Glenn Watch What You Wear: Preliminary Forensic Analysis of Smart Watches (Proceedings Article) In: Availability, Reliability and Security (ARES), 2015 10th International Conference on, pp. 303–311, 2015.
	@inproceedings{BOA15,
	  title = {Watch What You Wear: Preliminary Forensic Analysis of Smart Watches},
	  author = {Ibrahim Baggili and Jeff Oduru and Kyle Anthony and Frank Breitinger and Glenn McGee},
	  doi = {10.1109/ARES.2015.39},
	  year = {2015},
	  date = {2015-08-27},
	  booktitle = {Availability, Reliability and Security (ARES), 2015 10th International Conference on},
	  pages = {303--311},
	  abstract = {This work presents preliminary forensic analysis of two popular smart watches, the Samsung Gear 2 Neo and LG G. These wearable computing devices have the form factor of watches and sync with smart phones to display notifications, track footsteps and record voice messages. We posit that as smart watches are adopted by more users, the potential for them becoming a haven for digital evidence will increase thus providing utility for this preliminary work. In our work, we examined the forensic artifacts that are left on a Samsung Galaxy S4 Active phone that was used to sync with the Samsung Gear 2 Neo watch and the LG G watch. We further outline a methodology for physically acquiring data from the watches after gaining root access to them. Our results show that we can recover a swath of digital evidence directly from the watches when compared to the data on the phone that is synced with the watches. Furthermore, to root the LG G watch, the watch has to be reset to its factory settings which is alarming because the process may delete data of forensic relevance. Although this method is forensically intrusive, it may be used for acquiring data from already rooted LG watches. It is our observation that the data at the core of the functionality of at least the two tested smart watches, messages, health and fitness data, e-mails, contacts, events and notifications are accessible directly from the acquired images of the watches, which affirms our claim that the forensic value of evidence from smart watches is worthy of further study and should be investigated both at a high level and with greater specificity and granularity.},
	  keywords = {},
	  pubstate = {published},
	  tppubtype = {inproceedings}
	}
	Rathgeb, Christian; Breitinger, Frank; Baier, Harald; Busch, Christoph Towards Bloom filter-based indexing of iris biometric data (Proceedings Article) In: Biometrics (ICB), 2015 International Conference on, pp. 422–429, IEEE, 2015, (Siew-Sngiem Best Poster Award).
	@inproceedings{7139105,
	  title = {Towards Bloom filter-based indexing of iris biometric data},
	  author = {Christian Rathgeb and Frank Breitinger and Harald Baier and Christoph Busch},
	  doi = {10.1109/ICB.2015.7139105},
	  year = {2015},
	  date = {2015-05-22},
	  booktitle = {Biometrics (ICB), 2015 International Conference on},
	  pages = {422--429},
	  organization = {IEEE},
	  abstract = {Conventional biometric identification systems require exhaustive 1 : N comparisons in order to identify a biometric probe, i.e. comparison time frequently dominates the overall computational workload. Biometric database indexing represents a challenging task since biometric data does not exhibit any natural sorting order. In this paper we present a preliminary study on the feasibility of applying Bloom filters for the purpose of iris biometric database indexing. It is shown that, by constructing a binary tree data structure of Bloom filters extracted from binary iris biometric templates (iris-codes), the search space can be reduced to O(log N). In experiments, which are carried out on a medium-sized database of N = 256 subjects, biometric performance (accuracy) is maintained for different conventional identification systems. Further, perspectives on how to employ the proposed scheme on large-scale databases are given.},
	  note = {Siew-Sngiem Best Poster Award},
	  keywords = {},
	  pubstate = {published},
	  tppubtype = {inproceedings}
	}
	Satyendra, Gurjar; Baggili, Ibrahim; Breitinger, Frank; Fischer, Alice An empirical comparison of widely adopted hash functions in digital forensics: does the programming language and operating system make a difference? (Proceedings Article) In: Proceedings of the Conference on Digital Forensics, Security and Law, pp. 57–68, 2015.
	@inproceedings{SBBF15,
	  title = {An empirical comparison of widely adopted hash functions in digital forensics: does the programming language and operating system make a difference?},
	  author = {Gurjar Satyendra and Ibrahim Baggili and Frank Breitinger and Alice Fischer},
	  url = {https://commons.erau.edu/adfsl/2015/tuesday/6/},
	  year = {2015},
	  date = {2015-05-19},
	  booktitle = {Proceedings of the Conference on Digital Forensics, Security and Law},
	  pages = {57--68},
	  abstract = {Hash functions are widespread in computer sciences and have a wide range of applications such as ensuring integrity in cryptographic protocols, structuring database entries (hash tables) or identifying known files in forensic investigations. Besides their cryptographic requirements, a fundamental property of hash functions is efficient and easy computation which is especially important in digital forensics due to the large amount of data that need to be processed in cases. In this paper, we correlate the runtime efficiency of common hashing algorithms (MD5, SHA-family) and their implementation. Our empirical comparison focuses on C-OpenSSL, Python, Ruby, Java on Windows and Linux and C and WinCrypto API on Windows. The purpose of this paper is to recommend appropriate programming languages and libraries for coding tools that include intensive hashing functionality. In each programming language, we compute the MD5, SHA-1, SHA-256 and SHA-512 digest on datasets from 2 MB to 1 GB. For each language, algorithm and data, we perform multiple runs and compute the average elapsed time. In our experiment, we observed that OpenSSL and languages utilizing OpenSSL (Python and Ruby) perform better across all the hashing algorithms and data sizes on Windows and Linux. However, on Windows, performance of Java (Oracle JDK) and C WinCrypto is comparable to OpenSSL and better for SHA-512.},
	  keywords = {},
	  pubstate = {published},
	  tppubtype = {inproceedings}
	}
	Baggili, Ibrahim; Breitinger, Frank Data Sources for Advancing Cyber Forensics: What the Social World Has to Offer (Proceedings Article) In: AAAI Spring Symposium Series, 2015. @inproceedings{BB15, title = {Data Sources for Advancing Cyber Forensics: What the Social World Has to Offer}, author = {Ibrahim Baggili and Frank Breitinger}, url = {http://aaai.org/ocs/index.php/SSS/SSS15/paper/view/10227}, year = {2015}, date = {2015-03-12}, booktitle = {AAAI Spring Symposium Series}, abstract = {Cyber forensics is fairly new as a scientific discipline and deals with the acquisition, authentication and analysis of digital evidence. One of the biggest challenges in this domain has thus far been real data sources that are available for experimentation. Only a few data sources exist at the time writing of this paper. The authors in this paper deliberate how social media data sources may impact future directions in cyber forensics, and describe how these data sources may be used as new digital forensic artifacts in future investigations. The authors also deliberate how the scientific community may leverage publically accessible social media data to advance the state of the art in Cyber Forensics.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
2014
	Breitinger, Frank; Liu, Huajian; Winter, Christian; Baier, Harald; Rybalchenko, Alexey; Steinebach, Martin Towards a Process Model for Hash Functions in Digital Forensics (Proceedings Article) In: Gladyshev, Pavel; Marrington, Andrew; Baggili, Ibrahim (Ed.): Digital Forensics and Cyber Crime, pp. 170-186, Springer International Publishing, 2014, ISBN: 978-3-319-14288-3. @inproceedings{BLW14, title = {Towards a Process Model for Hash Functions in Digital Forensics}, author = {Frank Breitinger and Huajian Liu and Christian Winter and Harald Baier and Alexey Rybalchenko and Martin Steinebach}, editor = {Pavel Gladyshev and Andrew Marrington and Ibrahim Baggili}, url = {http://dx.doi.org/10.1007/978-3-319-14289-0_12}, doi = {10.1007/978-3-319-14289-0_12}, isbn = {978-3-319-14288-3}, year = {2014}, date = {2014-12-23}, booktitle = {Digital Forensics and Cyber Crime}, volume = {132}, pages = {170-186}, publisher = {Springer International Publishing}, series = {Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering}, abstract = {Handling forensic investigations gets more and more difficult as the amount of data one has to analyze is increasing continuously. A common approach for automated file identification are hash functions. The proceeding is quite simple: a tool hashes all files of a seized device and compares them against a database. Depending on the database, this allows to discard non-relevant (whitelisting) or detect suspicious files (blacklisting). One can distinguish three kinds of algorithms: (cryptographic) hash functions, bytewise approximate matching and semantic approximate matching (a.k.a perceptual hashing) where the main difference is the operation level. The latter one operates on the semantic level while both other approaches consider the byte-level. Hence, investigators have three different approaches at hand to analyze a device. First, this paper gives a comprehensive overview of existing approaches for bytewise and semantic approximate matching (for semantic we focus on images functions). Second, we compare implementations and summarize the strengths and weaknesses of all approaches. Third, we show how to integrate these functions based on a sample use case into one existing process model, the computer forensics field triage process model.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Breitinger, Frank; Ziroff, Georg; Lange, Steffen; Baier, Harald Similarity Hashing Based on Levenshtein Distances (Proceedings Article) In: Peterson, Gilbert; Shenoi, Sujeet (Ed.): Advances in Digital Forensics X, pp. 133-147, Springer Berlin Heidelberg, 2014, ISBN: 978-3-662-44951-6. @inproceedings{BZLB14, title = {Similarity Hashing Based on Levenshtein Distances}, author = {Frank Breitinger and Georg Ziroff and Steffen Lange and Harald Baier}, editor = {Gilbert Peterson and Sujeet Shenoi}, url = {http://dx.doi.org/10.1007/978-3-662-44952-3_10}, doi = {10.1007/978-3-662-44952-3_10}, isbn = {978-3-662-44951-6}, year = {2014}, date = {2014-01-01}, booktitle = {Advances in Digital Forensics X}, volume = {433}, pages = {133-147}, publisher = {Springer Berlin Heidelberg}, series = {IFIP Advances in Information and Communication Technology}, abstract = {It is increasingly common in forensic investigations to use automated pre-processing techniques to reduce the massive volumes of data that are encountered. This is typically accomplished by comparing fingerprints (typically cryptographic hashes) of files against existing databases. In addition to finding exact matches of cryptographic hashes, it is necessary to find approximate matches corresponding to similar files, such as different versions of a given file. This paper presents a new stand-alone similarity hashing approach called saHash, which has a modular design and operates in linear time. saHash is almost as fast as SHA-1 and more efficient than other approaches for approximate matching. The similarity hashing algorithm uses four sub-hash functions, each producing its own hash value. The four sub-hashes are concatenated to produce the final hash value. This modularity enables sub-hash functions to be added or removed, e.g., if an exploit for a sub-hash function is discovered. Given the hash values of two byte sequences, saHash returns a lower bound on the number of Levenshtein operations between the two byte sequences as their similarity score. The robustness of saHash is verified by comparing it with other approximate matching approaches such as sdhash.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Breitinger, Frank; Winter, Christian; Yannikos, York; Fink, Tobias; Seefried, Michael Using Approximate Matching to Reduce the Volume of Digital Data (Proceedings Article) In: Peterson, Gilbert; Shenoi, Sujeet (Ed.): Advances in Digital Forensics X, pp. 149-163, Springer Berlin Heidelberg, 2014, ISBN: 978-3-662-44951-6. @inproceedings{BWY14, title = {Using Approximate Matching to Reduce the Volume of Digital Data}, author = {Frank Breitinger and Christian Winter and York Yannikos and Tobias Fink and Michael Seefried}, editor = {Gilbert Peterson and Sujeet Shenoi}, url = {http://dx.doi.org/10.1007/978-3-662-44952-3_11}, doi = {10.1007/978-3-662-44952-3_11}, isbn = {978-3-662-44951-6}, year = {2014}, date = {2014-01-01}, booktitle = {Advances in Digital Forensics X}, volume = {433}, pages = {149-163}, publisher = {Springer Berlin Heidelberg}, series = {IFIP Advances in Information and Communication Technology}, abstract = {Digital forensic investigators frequently have to search for relevant files in massive digital corpora – a task often compared to finding a needle in a haystack. To address this challenge, investigators typically apply cryptographic hash functions to identify known files. However, cryptographic hashing only allows the detection of files that exactly match the known file hash values or fingerprints. This paper demonstrates the benefits of using approximate matching to locate relevant files. The experiments described in this paper used three test images of Windows XP, Windows 7 and Ubuntu 12.04 systems to evaluate fingerprint-based comparisons. The results reveal that approximate matching can improve file identification – in one case, increasing the identification rate from 1.82% to 23.76%.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
2013
	Breitinger, Frank; Baier, Harald Similarity Preserving Hashing: Eligible Properties and a New Algorithm MRSH-v2 (Proceedings Article) In: Rogers, Marcus; Seigfried-Spellar, Kathryn C. (Ed.): Digital Forensics and Cyber Crime, pp. 167-182, Springer Berlin Heidelberg, 2013, ISBN: 978-3-642-39890-2. @inproceedings{BB12d, title = {Similarity Preserving Hashing: Eligible Properties and a New Algorithm MRSH-v2}, author = {Frank Breitinger and Harald Baier}, editor = {Marcus Rogers and Kathryn C. Seigfried-Spellar}, url = {http://dx.doi.org/10.1007/978-3-642-39891-9_11}, doi = {10.1007/978-3-642-39891-9_11}, isbn = {978-3-642-39890-2}, year = {2013}, date = {2013-11-01}, booktitle = {Digital Forensics and Cyber Crime}, volume = {114}, pages = {167-182}, publisher = {Springer Berlin Heidelberg}, series = {Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering}, abstract = {Hash functions are a widespread class of functions in computer science and used in several applications, e.g. in computer forensics to identify known files. One basic property of cryptographic hash functions is the avalanche effect that causes a significantly different output if an input is changed slightly. As some applications also need to identify similar files (e.g. spam/virus detection) this raised the need for similarity preserving hashing. In recent years, several approaches came up, all with different namings, properties, strengths and weaknesses which is due to a missing definition. Based on the properties and use cases of traditional hash functions this paper discusses a uniform naming and properties which is a first step towards a suitable definition of similarity preserving hashing. Additionally, we extend the algorithm MRSH for similarity preserving hashing to its successor MRSH-v2, which has three specialties. First, it fulfills all our proposed defining properties, second, it outperforms existing approaches especially with respect to run time performance and third it has two detections modes. The regular mode of MRSH-v2 is used to identify similar files whereas the f-mode is optimal for fragment detection, i.e. to identify similar parts of a file.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Rathgeb, Christian; Breitinger, Frank; Busch, Christoph Alignment-free cancelable iris biometric templates based on adaptive bloom filters (Proceedings Article) In: Biometrics (ICB), 2013 International Conference on, pp. 1-8, 2013. @inproceedings{RBB13, title = {Alignment-free cancelable iris biometric templates based on adaptive bloom filters}, author = {Christian Rathgeb and Frank Breitinger and Christoph Busch}, url = {http://dx.doi.org/10.1109/ICB.2013.6612976}, doi = {10.1109/ICB.2013.6612976}, year = {2013}, date = {2013-09-30}, booktitle = {Biometrics (ICB), 2013 International Conference on}, pages = {1-8}, abstract = {Biometric characteristics are largely immutable, i.e. unprotected storage of biometric data provokes serious privacy threats, e.g. identity theft, limited renewability, or cross-matching. In accordance with the ISO/IEC 24745 standard, technologies of cancelable biometrics offer solutions to biometric information protection by obscuring biometric signal in a non-invertible manner, while biometric comparisons are still feasible in the transformed domain. In the presented work alignment-free cancelable iris biometrics based on adaptive Bloom filters are proposed. Bloom filter-based representations of binary biometric templates (iris-codes) enable an efficient alignment-invariant biometric comparison while a successive mapping of parts of a binary biometric template to a Bloom filter represents an irreversible transform. In experiments, which are carried out on the CASIA-v3 iris database, it is demonstrated that the proposed system maintains biometric performance for diverse iris recognition algorithms, protecting biometric templates at high security levels.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Breitinger, Frank; Astebøl, Knut; Baier, Harald; Busch, Christoph mvHash-B - A New Approach for Similarity Preserving Hashing (Proceedings Article) In: IT Security Incident Management and IT Forensics (IMF), 2013 Seventh International Conference on, pp. 33-44, 2013. @inproceedings{BABB13, title = {mvHash-B - A New Approach for Similarity Preserving Hashing}, author = {Frank Breitinger and Knut Astebøl and Harald Baier and Christoph Busch}, url = {http://dx.doi.org/10.1109/IMF.2013.18}, doi = {10.1109/IMF.2013.18}, year = {2013}, date = {2013-07-25}, booktitle = {IT Security Incident Management and IT Forensics (IMF), 2013 Seventh International Conference on}, pages = {33-44}, abstract = {The handling of hundreds of thousands of files is a major challenge in today's IT forensic investigations. In order to cope with this information overload, investigators use fingerprints (hash values) to identify known files automatically using blacklists or whitelists. Besides detecting exact duplicates it is helpful to locate similar files by using similarity preserving hashing (SPH), too. We present a new algorithm for similarity preserving hashing. It is based on the idea of majority voting in conjunction with run length encoding to compress the input data and uses Bloom filters to represent the fingerprint. It is therefore called mvHash-B. Our assessment shows that mvHash-B is superior to other SPHs with respect to run time efficiency: It is almost as fast as SHA-1 and thus faster than any other SPH algorithm. Additionally the hash value length is approximately 0.5% of the input length and hence outperforms most existing algorithms. Finally, we show that the robustness of mvHash-B against active manipulation is sufficient for practical purposes.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Breitinger, Frank; Petrov, Kaloyan Reducing the Time Required for Hashing Operations (Proceedings Article) In: Peterson, Gilbert; Shenoi, Sujeet (Ed.): Advances in Digital Forensics IX, pp. 101-117, Springer Berlin Heidelberg, 2013, ISBN: 978-3-642-41147-2. @inproceedings{BK13, title = {Reducing the Time Required for Hashing Operations}, author = {Frank Breitinger and Kaloyan Petrov}, editor = {Gilbert Peterson and Sujeet Shenoi}, url = {http://dx.doi.org/10.1007/978-3-642-41148-9_7}, doi = {10.1007/978-3-642-41148-9_7}, isbn = {978-3-642-41147-2}, year = {2013}, date = {2013-01-01}, booktitle = {Advances in Digital Forensics IX}, volume = {410}, pages = {101-117}, publisher = {Springer Berlin Heidelberg}, series = {IFIP Advances in Information and Communication Technology}, abstract = {Due to the increasingly massive amounts of data that need to be analyzed in digital forensic investigations, it is necessary to automatically recognize suspect files and filter out non-relevant files. To achieve this goal, digital forensic practitioners employ hashing algorithms to classify files into known-good, known-bad and unknown files. However, a typical personal computer may store hundreds of thousands of files and the task becomes extremely time-consuming. This paper attempts to address the problem using a framework that speeds up processing by using multiple threads. Unlike a typical multithreading approach, where the hashing algorithm is performed by multiple threads, the proposed framework incorporates a dedicated prefetcher thread that reads files from a device. Experimental results demonstrate a runtime efficiency of nearly 40% over single threading.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
2012
	Breitinger, Frank; Baier, Harald Performance Issues About Context-Triggered Piecewise Hashing (Proceedings Article) In: Gladyshev, Pavel; Rogers, Marcus K. (Ed.): Digital Forensics and Cyber Crime, pp. 141-155, Springer Berlin Heidelberg, 2012, ISBN: 978-3-642-35514-1. @inproceedings{BB12a, title = {Performance Issues About Context-Triggered Piecewise Hashing}, author = {Frank Breitinger and Harald Baier}, editor = {Pavel Gladyshev and Marcus K. Rogers}, url = {http://dx.doi.org/10.1007/978-3-642-35515-8_12}, doi = {10.1007/978-3-642-35515-8_12}, isbn = {978-3-642-35514-1}, year = {2012}, date = {2012-12-01}, booktitle = {Digital Forensics and Cyber Crime}, volume = {88}, pages = {141-155}, publisher = {Springer Berlin Heidelberg}, series = {Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering}, abstract = {A hash function is a well-known method in computer science to map arbitrary large data to bit strings of a fixed short length. This property is used in computer forensics to identify known files on base of their hash value. As of today, in a pre-step process hash values of files are generated and stored in a database; typically a cryptographic hash function like MD5 or SHA-1 is used. Later the investigator computes hash values of files, which he finds on a storage medium, and performs look ups in his database. Due to security properties of cryptographic hash functions, they can not be used to identify similar files. Therefore Jesse Kornblum proposed a similarity preserving hash function to identify similar files. This paper discusses the efficiency of Kornblum's approach. We present some enhancements that increase the performance of his algorithm by 55% if applied to a real life scenario. Furthermore, we discuss some characteristics of a sample Windows XP system, which are relevant for the performance of Kornblum's approach.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Breitinger, Frank; Baier, Harald Properties of a similarity preserving hash function and their realization in sdhash (Proceedings Article) In: Information Security for South Africa (ISSA), pp. 1-8, 2012. @inproceedings{BB12c, title = {Properties of a similarity preserving hash function and their realization in sdhash}, author = {Frank Breitinger and Harald Baier}, url = {http://dx.doi.org/10.1109/ISSA.2012.6320445}, doi = {10.1109/ISSA.2012.6320445}, year = {2012}, date = {2012-10-04}, booktitle = {Information Security for South Africa (ISSA)}, pages = {1-8}, abstract = {Finding similarities between byte sequences is a complex task and necessary in many areas of computer science, e.g., to identify malicious files or spam. Instead of comparing files against each other, one may apply a similarity preserving compression function (hash function) first and do the comparison for the hashes. Although we have different approaches, there is no clear definition / specification or needed properties of such algorithms available. This paper presents four basic properties for similarity preserving hash functions that are partly related to the properties of cryptographic hash functions. Compression and ease of computation are borrowed from traditional hash functions and define the hash value length and the performance. As every byte is expected to influence the hash value, we introduce coverage. Similarity score describes the need for a comparison function for hash values. We shortly discuss these properties with respect to three existing approaches and finally have a detailed view on the promising approach sdhash. However, we uncovered some bugs and other peculiarities of the implementation of sdhash. Finally we conclude that sdhash has the potential to be a robust similarity preserving digest algorithm, but there are some points that need to be improved.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Breitinger, Frank; Baier, Harald; Beckingham, Jesse Security and implementation analysis of the similarity digest sdhash (Proceedings Article) In: First International Baltic Conference on Network Security & Forensics (NeSeFo), 2012. @inproceedings{BBB12, title = {Security and implementation analysis of the similarity digest sdhash}, author = {Frank Breitinger and Harald Baier and Jesse Beckingham}, year = {2012}, date = {2012-08-01}, booktitle = {First International Baltic Conference on Network Security \& Forensics (NeSeFo)}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
	Breitinger, Frank; Baier, Harald A fuzzy hashing approach based on random sequences and hamming distance (Proceedings Article) In: Proceedings of the Conference on Digital Forensics, Security and Law, pp. 89–100, 2012. @inproceedings{BB12b, title = {A fuzzy hashing approach based on random sequences and hamming distance}, author = {Frank Breitinger and Harald Baier}, url = {https://commons.erau.edu/cgi/viewcontent.cgi?article=1193&context=adfsl}, year = {2012}, date = {2012-05-01}, booktitle = {Proceedings of the Conference on Digital Forensics, Security and Law}, pages = {89–100}, abstract = {Hash functions are well-known methods in computer science to map arbitrary large input to bit strings of a fixed length that serve as unique input identifier/fingerprints. A key property of cryptographic hash functions is that even if only one bit of the input is changed the output behaves pseudo randomly and therefore similar files cannot be identified. However, in the area of computer forensics it is also necessary to find similar files (e.g. different versions of a file), wherefore we need a similarity preserving hash function also called fuzzy hash function. In this paper we present a new approach for fuzzy hashing called bbHash. It is based on the idea to `rebuild' an input as good as possible using a fixed set of randomly chosen byte sequences called building blocks of byte length l (e.g. l = 128). The proceeding is as follows: slide through the input byte-by-byte, read out the current input byte sequence of length l, and compute the Hamming distances of all building blocks against the current input byte sequence. Each building block with Hamming distance smaller than a certain threshold contributes the file's bbHash. We discuss (dis-)advantages of our bbHash to further fuzzy hash approaches. A key property of bbHash is that it is the first fuzzy hashing approach based on a comparison to external data structures.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
2011
	Baier, Harald; Breitinger, Frank Security Aspects of Piecewise Hashing in Computer Forensics (Proceedings Article) In: IT Security Incident Management and IT Forensics (IMF), 2011 Sixth International Conference on, pp. 21-36, 2011. @inproceedings{BB11, title = {Security Aspects of Piecewise Hashing in Computer Forensics}, author = {Harald Baier and Frank Breitinger}, url = {http://dx.doi.org/10.1109/IMF.2011.16}, doi = {10.1109/IMF.2011.16}, year = {2011}, date = {2011-06-17}, booktitle = {IT Security Incident Management and IT Forensics (IMF), 2011 Sixth International Conference on}, pages = {21-36}, abstract = {Although hash functions are a well-known method in computer science to map arbitrary large data to bit strings of a fixed length, their use in computer forensics is currently very limited. As of today, in a pre-step process hash values of files are generated and stored in a database, typically a cryptographic hash function like MD5 or SHA-1 is used. Later the investigator computes hash values of files, which he finds on a storage medium, and performs look ups in his database. This approach has several drawbacks, which have been sketched in the community, and some alternative approaches have been proposed. The most popular one is due to Jesse Kornblum, who transferred ideas from spam detection to computer forensics in order to identify similar files. However, his proposal lacks a thorough security analysis. It is therefore one aim of the paper at hand to present some possible attack vectors of an active adversary to bypass Kornblum's approach. Furthermore, we present a pseudo random number generator being both more efficient and more random compared to Kornblum's pseudo random number generator.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }
2010
	Breitinger, Frank; Nickel, Claudia User Survey on Phone Security and Usage (Proceedings Article) In: Brömme, Arslan; Busch, Christoph (Ed.): BIOSIG, pp. 139-144, GI, 2010, ISBN: 978-3-88579-258-1. @inproceedings{BN10, title = {User Survey on Phone Security and Usage}, author = {Frank Breitinger and Claudia Nickel}, editor = {Arslan Brömme and Christoph Busch}, url = {http://dblp.uni-trier.de/db/conf/biosig/biosig2010.html#BreitingerN10}, isbn = {978-3-88579-258-1}, year = {2010}, date = {2010-06-01}, booktitle = {BIOSIG}, volume = {164}, pages = {139-144}, publisher = {GI}, series = {LNI}, abstract = {Mobile phones are widely used nowadays and during the last years developed from simple phones to small computers with an increasing number of features. These result in a wide variety of data stored on the devices which could be a high security risk in case of unauthorized access. A comprehensive user survey was conducted to get information about what data is really stored on the mobile devices, how it is currently protected and if biometric authentication methods could improve the current state. This paper states the results from about 550 users of mobile devices. The analysis revealed a very low security level of the devices. This is partly due to a low security awareness of their owners and partly due to the low acceptance of the offered authentication method based on PIN. Further results like the experiences with mobile thefts and the willingness to use biometric authentication methods as alternative to PIN authentication are also stated.}, keywords = {}, pubstate = {published}, tppubtype = {inproceedings} }

Frank Breitinger

Chair for Cybersecurity, University of Augsburg (Germany)
