float16 precision conversion to Base64

With the advent of vector databases and large-model embeddings with dimensions such as 768 or 2048, building large-scale indexes for approximate nearest neighbor (ANN) search and storing these vectors have become expensive operations. There are many ways to reduce a vector's memory footprint, such as quantization to int8. Two widely used methods are binarization and storing the vectors in half precision (float16). The following are simple code snippets that I collected from various sources for converting these formats to and from base64, so that the reduced vectors can be sent over the wire, for example through HTTP services, without any further loss.

Binarization

Binarization is a simple method that works well for high-dimensional vectors. There are many ways to define the threshold, such as the mean or median value per dimension. Below is an example of storing a binary vector as base64 and reading it back, with the bits packed in blocks of 8 bits each.

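import base64

import numpy as np


def base64_to_binary_vec(s):
    # Decode base64 to raw bytes, then expand each byte into its 8 bits (MSB first).
    binary = base64.b64decode(s)
    bits = [format(byte, "08b") for byte in binary]
    return "".join(bits)


def convert_binary_tob64(s_vec):
    # Pack the 0/1 values into bytes (8 bits per byte, MSB first) before encoding;
    # base64.b64encode needs bytes and cannot take a list of ints directly.
    packed = np.packbits(np.asarray(s_vec, dtype=np.uint8))
    return base64.b64encode(packed.tobytes()).decode("utf-8")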

def verify_binary_encoding():
    # binary vector - example 1
    sample_cons_str = "D/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A/wD/AP8A=="
    print(base64_to_binary_vec(sample_cons_str))

    # binary vector - example 2
    test_str = 'vckIkrUOV/sgvGYNBfCLEimBkRMSSGxA2TESPj7ixDZNofUdJVChxmwDCSKV4TG8EYwQUhOWtRGzMjJ6LbLaVe2nCBJn3wN1LIFwA2ikTpP5DrRCBDFdVYxBkuAKARelzQRNE4QTRLm8WKbMLE1AYLgHpIy1bTtB6tGPRvU6adxDSVjDRlA9XNMlsg0NMB5tRKzLiHoUbwz8B+oNzcC/lA8I3CNyY8JD6kT1eN2Vq+Xt4eTm6AZL3/Cs9lYeG4tjjuzK0ioVMyAaStmsp2MchziKUoYShVQ2qH2HgLoRD9kJjUL7AoBzMivoZTi4jaUfVn6HooiDvAfZt8CpHqxQ0A=='
    print(base64_to_binary_vec(test_str))

    # binary vector - example 3 - to reconstruct the vector
    # sample_cons_str decodes to the bytes 0x0F 0xF0 repeated, so build that bit pattern
    s_vec = []
    for i in range(0, 2048 // (8 * 2)):
        s_vec += [0,0,0,0,1,1,1,1,1,1,1,1,0,0,0,0]

    b64_str = convert_binary_tob64(s_vec)
    assert b64_str == sample_cons_str

    s_vec_recreate = base64_to_binary_vec(b64_str)
    # the decoder returns a string of '0'/'1' characters, one per bit
    s_vec_expected = ''.join(['1' if val else '0' for val in s_vec])
    assert s_vec_recreate == s_vec_expected
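
The thresholding step mentioned at the start of this section is not part of the snippets above. Here is a minimal sketch of it, assuming a median threshold computed per dimension over a small batch of random toy embeddings. The names `binarize` and `to_b64` are hypothetical and only for illustration.

```python
import base64

import numpy as np


def binarize(embeddings):
    # Threshold each dimension at its median over the batch (an assumed choice;
    # the mean would work the same way). Returns a 0/1 uint8 array.
    thresholds = np.median(embeddings, axis=0)
    return (embeddings > thresholds).astype(np.uint8)


def to_b64(bits):
    # Pack 8 bits per byte (most significant bit first) and base64-encode.
    return base64.b64encode(np.packbits(bits).tobytes()).decode("utf-8")


rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 2048)).astype(np.float32)  # 4 toy embeddings
bit_rows = binarize(batch)
encoded = [to_b64(row) for row in bit_rows]

# Decoding reverses the two steps: base64 -> bytes -> bits.
decoded = np.unpackbits(np.frombuffer(base64.b64decode(encoded[0]), dtype=np.uint8))
```

Each 2048-bit vector packs into 256 bytes, so the base64 string is 344 characters long including padding.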

Float 16 to Base64 conversion

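Below is an example of storing a float16 vector as base64 and converting it back to the same float16 vector without any loss of data. Note that the cast from float32 to float16 is itself lossy; only the base64 encoding and decoding step is lossless.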

There are multiple methods for float16 to base64 conversion.

Method 1 - using Numpy buffer

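import base64

import numpy as np

def convert_f16_to_b64_m1(arr):
    # Cast to float16 and base64-encode the raw 2-byte values.
    a = np.array(arr, np.float16)
    return base64.b64encode(a.tobytes())

def convert_b64_to_f16(emb):
    # Decode base64 and read the bytes as little-endian float16 values,
    # matching the "<" layout used by the struct and <f2 encoders.
    binary = base64.b64decode(emb)
    return np.frombuffer(binary, dtype=np.dtype('<f2'))

def verify_f16_encoding_m1():
    b64_emb = convert_f16_to_b64_m1([1.2345])
    assert np.isclose([1.2345], convert_b64_to_f16(b64_emb), atol=1e-2).all()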
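
To make the "no loss" claim concrete: the cast from float32 to float16 rounds the values, but the base64 round trip of the float16 bytes is bit-exact. Here is a small sketch of both effects. It uses an explicit little-endian `<f2` dtype, which is an assumption on my side so that the bytes do not depend on the host byte order.

```python
import base64

import numpy as np

original = np.array([1.2345, -0.001, 3.14159], dtype=np.float32)
half = original.astype("<f2")  # lossy step: float32 -> float16

encoded = base64.b64encode(half.tobytes()).decode("ascii")
restored = np.frombuffer(base64.b64decode(encoded), dtype="<f2")

# The base64 round trip returns exactly the same float16 values...
roundtrip_exact = np.array_equal(half, restored)
# ...while the float16 values differ slightly from the float32 input.
cast_error = np.abs(original - restored.astype(np.float32)).max()
```

This is why the verification functions in this post compare against the original values with a tolerance (`atol`) rather than for exact equality.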

Method 2 - using Struct pack

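import struct

def convert_f16_to_b64_m2(arr):
    # "e" is the struct format code for a 2-byte half-precision float, "<" means little-endian.
    vector_array = np.array(arr, dtype=np.float16).tolist()
    # Size the format from the input instead of hard-coding 96 elements.
    packer = struct.Struct(f"<{len(vector_array)}e")
    vector_bytes = packer.pack(*vector_array)
    return base64.b64encode(vector_bytes)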

def verify_f16_encoding_m2():
    arr = np.random.normal(0, 0.01, 96).astype('float16')
    b64_emb = convert_f16_to_b64_m2(list(arr))
    assert(np.isclose(arr, convert_b64_to_f16(b64_emb), atol=1e-2).all())

Method 3 - using dtype indicator

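Based on the NumPy data type documentation (arrays.dtypes.html), the dtype string <f2 denotes a little-endian 2-byte float. Building the array with this dtype and calling tobytes() avoids the per-element Python conversion that struct needs, so it is supposed to be faster than struct, although I have not benchmarked it here.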

def convert_f16_to_b64_m3(arr):
    # using f2 is faster
    a = np.array(arr, dtype=np.dtype('<f2'))
    return base64.b64encode(a.tobytes())

def verify_f16_encoding_m3():
    arr = np.random.normal(0, 0.01, 96).astype('float16')
    b64_emb = convert_f16_to_b64_m3(list(arr))
    assert(np.isclose(arr, convert_b64_to_f16(b64_emb), atol=1e-5).all())
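
As a quick sanity check, the NumPy and struct approaches should yield the same base64 string for the same input when the little-endian layout is requested explicitly. Here is a sketch, with the vector length taken from the input rather than fixed. The function names `via_tobytes` and `via_struct` are illustrative and not from the snippets above.

```python
import base64
import struct

import numpy as np


def via_tobytes(arr):
    # NumPy route: little-endian float16 buffer, then base64.
    return base64.b64encode(np.asarray(arr, dtype="<f2").tobytes())


def via_struct(arr):
    # struct route: pack each value with the half-precision "e" format code.
    values = np.asarray(arr, dtype=np.float16).tolist()
    return base64.b64encode(struct.pack(f"<{len(values)}e", *values))


arr = np.random.default_rng(1).normal(0, 0.01, 96).astype(np.float16)
b64_a = via_tobytes(arr)
b64_b = via_struct(arr)

# 96 values * 2 bytes = 192 bytes, which base64 turns into 256 characters.
payload_chars = len(b64_a)
```

Since both routes produce identical bytes, the choice between them is mostly about speed and dependencies, not about the wire format.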

Conclusion

The same can be achieved using Java/Scala as well.

Citation

To refer to this post, please cite it as:

Float16 precision conversion to Base64 for lossless transmission | Senthilkumar Gopal. 
https://sengopal.github.io/posts/float16-precision-conversion-to-base64.html