A hash function is a function that takes input data (often strings or integers) and returns a fixed-size output, which is often used as an index into an array.

Notes

Daniel Lemire TBC

A good hash function should be fast, deterministic (the same input always gives the same output), and should spread its outputs evenly across the output range.

A hash function is critical for many algorithms and data structures, such as hash tables, Bloom filters, and database indexes. Its purpose is to transform input into fixed-size values, typically integers of a fixed width.

TakeAways

  • 📌 Hash functions map inputs of arbitrary size to fixed-size outputs.
  • 💡 They are essential for algorithms requiring quick data access and manipulation.
  • 🔍 Examples of cryptographic hash functions include MD5, SHA-1, and the SHA-2 family.

Simple Hash Function: Division Method

Given an input string M and a prime number P, this method computes the hash value as follows:

  1. Convert the string M into a numerical equivalent (e.g., ASCII values summed together).
  2. Multiply the numerical equivalent by a constant integer factor, say A.
  3. Take the modulo (mod) of the result with the prime number P.

So mathematically:

hash(M) = (A * S) mod P

where S is the numerical equivalent of M from step 1.

Example

Let’s use an example where M is the string “HELLO”, A is 33, and P is 101.

  1. Convert each character to its ASCII value and sum them up:

    • H: 72
    • E: 69
    • L: 76
    • L: 76
    • O: 79

    Sum of ASCII values = 72 + 69 + 76 + 76 + 79 = 372.

  2. Multiply the sum by A (33):

    • 372 * 33 = 12276.

  3. Take modulo (mod) with P (101):

    • 12276 mod 101 = 55 (since 101 * 121 = 12221 and 12276 - 12221 = 55).

Therefore, the hash value for “HELLO” using this method with these specific parameters is 55.
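
The three steps above can be sketched as a small Python function. This is a minimal illustration rather than a production hash: the name division_hash and the defaults for a and p are assumptions taken from the example parameters, and step 1 is implemented as a plain sum of character codes. Running it is a quick way to check the arithmetic by hand.

```python
def division_hash(message: str, a: int = 33, p: int = 101) -> int:
    """Hash a string as (a * sum of character codes) mod p."""
    # Step 1: convert the string to a number by summing its character codes.
    total = sum(ord(ch) for ch in message)
    # Step 2: multiply the sum by the constant factor A.
    scaled = a * total
    # Step 3: reduce the product modulo the prime P.
    return scaled % p


if __name__ == "__main__":
    # Prints the hash of the example string with A = 33 and P = 101.
    print(division_hash("HELLO"))
```

The result always lies in the range 0 to P - 1, which is what makes it usable as an array index.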

Why Use Prime Numbers?

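Using a prime number as the modulus P helps reduce collisions, where two different strings produce the same hash. Real inputs often follow patterns, such as values that all share a common factor. If the modulus shares that factor, the hashes land in only a fraction of the possible outputs. A prime P has no divisors other than 1 and itself, so it shares a factor only with multiples of P, and regular patterns in the input are spread more evenly across the outputs.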
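
A small experiment can make the effect of the modulus visible. The sketch below is only an illustration: the helper buckets_used and the choice of keys (multiples of 4) are assumptions made for the demonstration. It counts how many distinct outputs a set of patterned keys reaches under a composite modulus (100) and a nearby prime (101).

```python
def buckets_used(keys, modulus: int) -> int:
    """Count how many distinct values the keys take modulo `modulus`."""
    return len({key % modulus for key in keys})


# 100 patterned keys: every one is a multiple of 4.
keys = range(0, 400, 4)

# 100 shares the factor 4 with the keys, so only 1 in 4 outputs is reachable.
print(buckets_used(keys, 100))  # 25
# 101 is prime and shares no factor with 4, so every key gets its own output.
print(buckets_used(keys, 101))  # 100
```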

Limitations

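While simple and fast, this method collides easily. Because step 1 only sums the character values, any two strings containing the same characters in a different order (such as “LISTEN” and “SILENT”) hash to the same value, and a small prime P leaves only a few distinct outputs. It is also unsuitable for cryptographic purposes. Applications that need collision resistance use cryptographic hash functions such as SHA-256; MD5 and SHA-1 are no longer considered collision-resistant and should be avoided where security matters.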
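
One collision case is easy to see concretely: a hash built on a sum of character codes ignores character order, so anagrams always collide. The sketch below compares that behaviour with SHA-256 from Python's standard hashlib module. The helper names ascii_sum_hash and sha256_hex and the anagram pair are assumptions chosen for illustration.

```python
import hashlib


def ascii_sum_hash(message: str, a: int = 33, p: int = 101) -> int:
    # Same scheme as in the text: (A * sum of character codes) mod P.
    return (a * sum(ord(ch) for ch in message)) % p


def sha256_hex(message: str) -> str:
    # Standard-library SHA-256 over the UTF-8 bytes of the string.
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


first, second = "LISTEN", "SILENT"  # illustrative anagram pair

# The character sums are equal, so the simple hash cannot tell them apart.
print(ascii_sum_hash(first) == ascii_sum_hash(second))  # True
# SHA-256 depends on character order, so the digests differ.
print(sha256_hex(first) == sha256_hex(second))  # False
```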

Process

  1. Input: Take any input data.
  2. Processing: Transform it into a fixed-size value using mathematical operations.
  3. Output: Return the output index for further use.
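
The input, processing, and output steps can be sketched as a tiny bucket lookup of the kind a hash table performs. This is a simplified illustration under stated assumptions: the function name bucket_index, the use of SHA-256 as the underlying hash, the bucket count of 8, and the sample words are all choices made for the example.

```python
import hashlib


def bucket_index(key: str, num_buckets: int) -> int:
    """Map a string key to an index in the range 0..num_buckets - 1."""
    # Processing: hash the input bytes to a fixed-size (32-byte) digest.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Output: interpret the first 8 bytes as an integer and reduce it
    # to a valid index for the bucket array.
    return int.from_bytes(digest[:8], "big") % num_buckets


# Input: arbitrary keys, each stored in the bucket its hash selects.
buckets = [[] for _ in range(8)]
for word in ["apple", "banana", "cherry"]:
    buckets[bucket_index(word, len(buckets))].append(word)

# Looking a key up repeats the same computation and checks one bucket.
print("banana" in buckets[bucket_index("banana", len(buckets))])  # True
```

Because the function is deterministic, the lookup visits the same bucket the insertion used, which is what gives hash tables their fast access.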

Thoughts

  • 🔒 Fast: Efficiently maps long strings to shorter outputs.
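  • 🛡 Secure: For cryptographic hash functions, it is difficult to work back from the output to the input, which is why they are used in cryptography. Simple hash functions like the one above do not have this property.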

References

  1. Hash function - Wikiwand