A probabilistic data structure designed to test whether an element is a member of a set. It provides fast, efficient lookups but may yield false positives.
Notes
Bryan Olin TBC
The key distinction is that a Bloom filter does not say ‘no,’ it says ‘maybe.’
Bloom filters are used in various applications such as databases, web browsers, and email clients. They help reduce storage requirements by allowing for probabilistic errors.
TakeAways
- 📌 Essential component of data structures; fast lookups but with error probability.
- 💡 Bloom filters are designed to be space-efficient.
- 🔍 False positive rate can be reduced by adding more hash functions or increasing the size of the filter.
- Once an element is added to the Bloom filter, it cannot be removed.
Process
- Initialization: Create an array of bits, all initially set to 0.
- Hashing and Setting Bits: Apply multiple hash functions to the input data and set corresponding bits in the array.
- Query: For a new element, apply the same hash functions; if all corresponding bits are 1, it’s a potential match.
Step-by-Step Algorithm
- Initialization
- Create a bit array (think of it like a big bucket with lots of slots) that will hold n bits.
- All the bits in this array are initially set to 0.
- Hash Functions
- Choose k independent hash functions. Each item that you want to put into your Bloom filter will be run through each of these hash functions.
- Inserting an Element
- For example, if we’re adding the word “apple”, we’ll apply all k hash functions to this word.
- Let’s say one function gives us index 5, another gives us index 13, and so on for all k functions.
- Setting Bits in Array
- Go through each of the indexes given by the hash functions (e.g., 5, 13).
- For each index, you’ll set the corresponding bit to a 1 in your bit array.
- So after inserting “apple”, there will be k bits set to 1 at positions 5, 13, etc.
- Querying
- To check if an element might be in the filter (like checking if another word, say “banana”, is in):
- Apply all k hash functions to “banana”.
- You’ll get k indexes.
- Check each of these indexes in the bit array. If any one of them has a 0, then you can be sure that “banana” is not in the filter (because if it were, those bits would have been set to 1).
- However, if every single index you look at in the bit array has a 1, then there’s a possibility that “banana” could be in. But remember, this is just a guess; the filter might not actually contain “banana”.
- To check if an element might be in the filter (like checking if another word, say “banana”, is in):
Thoughts
- 💡 Bloom filters provide quick answers but at the cost of possible false positives, hence useful for situations where a small error rate is acceptable. 🚀