- Mon 20 February 2023
- Machine Learning
- #backend, #services, #python, #ml-code
With the advent of vector databases and embeddings produced by large models, with dimensions such as 768 and 2048, building large-scale indexes for approximate nearest neighbor (ANN) search and storing these vectors have become expensive operations. There are many methods for reducing a vector's memory footprint, such as quantization, for example to int8. Two widely used methods are binarization and storing the vectors in half precision (float16). The following are simple code snippets, collected from various sources, for converting these formats to and from base64 to ensure lossless transmission over the wire, for example through HTTP services.
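As a rough illustration of the footprint difference for a single 2048-dimensional vector, here is a small sketch (not from the original snippets; the byte counts depend only on the dimension):

import numpy as np

dim = 2048
vec = np.random.rand(dim)
print(vec.astype(np.float32).nbytes)                      # 8192 bytes in float32
print(vec.astype(np.float16).nbytes)                      # 4096 bytes in float16
print(np.packbits(np.ones(dim, dtype=np.uint8)).nbytes)   # 256 bytes after binarization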
Binarization
Binarization is a simple method that works well for high-dimensional vectors. There are many ways to define the threshold, such as using the mean or median value per dimension. Further below is an example of storing a binary vector as base64 and converting it back, with the bits packed into blocks of 8 bits (one byte) each.
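First, as a quick illustration of the thresholding step itself, a minimal sketch (assuming NumPy; the embeddings matrix and the per-dimension mean threshold are illustrative choices, not part of the conversion helpers below):

import numpy as np

def binarize(embeddings):
    # embeddings: (n, d) float matrix; threshold each dimension at its mean
    # (np.median(embeddings, axis=0) would give a median-based threshold instead)
    thresholds = embeddings.mean(axis=0)
    return (embeddings > thresholds).astype(np.uint8)  # (n, d) matrix of 0/1 values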
import base64
import numpy as np

def base64_to_binary_vec(s):
    binary = base64.b64decode(s)
    bits = [bin(byte)[2:].zfill(8) for byte in binary]
    s_bits = ''.join(bits)
    # print(len(s_bits))
    return s_bits
def convert_binary_tob64(s_vec):
    # pack 8 bits per byte; the bits are complemented (1 -> 0, 0 -> 1) so that the
    # result matches the sample base64 strings used in verify_binary_encoding() below
    packed = np.packbits(np.logical_not(np.asarray(s_vec, dtype=np.uint8)))
    return base64.b64encode(packed).decode("utf-8")
def verify_binary_encoding():
    # binary vector - example 1
    sample_cons_str = "D/AP8A/w" * 42 + "D/AP8A=="  # 344-char base64 of 256 bytes alternating 0x0F / 0xF0
    print(base64_to_binary_vec(sample_cons_str))
    # binary vector - example 2
    test_str = 'vckIkrUOV/sgvGYNBfCLEimBkRMSSGxA2TESPj7ixDZNofUdJVChxmwDCSKV4TG8EYwQUhOWtRGzMjJ6LbLaVe2nCBJn3wN1LIFwA2ikTpP5DrRCBDFdVYxBkuAKARelzQRNE4QTRLm8WKbMLE1AYLgHpIy1bTtB6tGPRvU6adxDSVjDRlA9XNMlsg0NMB5tRKzLiHoUbwz8B+oNzcC/lA8I3CNyY8JD6kT1eN2Vq+Xt4eTm6AZL3/Cs9lYeG4tjjuzK0ioVMyAaStmsp2MchziKUoYShVQ2qH2HgLoRD9kJjUL7AoBzMivoZTi4jaUfVn6HooiDvAfZt8CpHqxQ0A=='
    print(base64_to_binary_vec(test_str))
    # binary vector - example 3 - to reconstruct the vector
    s_vec = []
    for i in range(0, 2048 // (8 * 2)):
        s_vec += [1,1,1,1,0,0,0,0,0,0,0,0,1,1,1,1]

    b64_str = convert_binary_tob64(s_vec)
    # print(b64_str)
    assert (b64_str == sample_cons_str)
    s_vec_recreate = base64_to_binary_vec(b64_str)
    # print(len(s_vec_recreate))
    # print(s_vec_recreate)
    s_vec_expected = ''.join(['0' if val else '1' for val in s_vec])
    # print(s_vec_expected)
    assert(s_vec_recreate == s_vec_expected)
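For reference, a quick end-to-end run of the helpers above (a sketch with a hypothetical 2048-bit vector; 2048 bits pack into 256 bytes, which base64 expands to a 344-character string):

bits = [1, 0] * 1024                        # hypothetical 2048-bit vector
b64_str = convert_binary_tob64(bits)
print(len(b64_str))                         # 344 base64 characters on the wire
bit_string = base64_to_binary_vec(b64_str)
print(len(bit_string))                      # 2048 bits recovered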
Float16 to Base64 conversion
Below is an example of storing a float16 vector as base64 and restoring the original float16 vector without any loss of data. There are multiple methods for float16 to base64 conversion.
Method 1 - using Numpy buffer
def convert_f16_to_b64_m1(arr):
    a = np.array(arr, np.float16)
    return base64.b64encode(a.tobytes())

def convert_b64_to_f16(emb):
    binary = base64.b64decode(emb)
    print(binary)
    q = np.frombuffer(binary, dtype=np.float16)
    print(q.shape)
    return q

def verify_f16_encoding_m1():
    b64_emb = convert_f16_to_b64_m1([1.2345])
    assert (np.isclose([1.2345], convert_b64_to_f16(b64_emb), atol=1e-2))
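Since the conversion simply reinterprets the underlying float16 bytes, the round trip is bit-exact, so an exact comparison (rather than np.isclose) also holds. A small additional check, assuming the two functions above:

def verify_f16_roundtrip_exact():
    arr = np.random.normal(0, 0.01, 96).astype(np.float16)
    b64_emb = convert_f16_to_b64_m1(arr)
    # lossless: the decoded vector is identical, not merely close
    assert np.array_equal(arr, convert_b64_to_f16(b64_emb))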
Method 2 - using Struct pack
import struct

def convert_f16_to_b64_m2(arr):
    packer = struct.Struct("<96e")
    vector_array = np.array(arr, dtype=np.float16).tolist()
    vector_bytes = packer.pack(*vector_array)
    return base64.b64encode(vector_bytes)

def verify_f16_encoding_m2():
    arr = np.random.normal(0, 0.01, 96).astype('float16')
    b64_emb = convert_f16_to_b64_m2(list(arr))
    assert(np.isclose(arr, convert_b64_to_f16(b64_emb), atol=1e-2).all())
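Note that the "<96e" format hardcodes a 96-dimensional vector. A variant that derives the format string from the input length (a sketch, not from the original post):

def convert_f16_to_b64_struct_any(arr):
    # build the struct format from the vector length instead of hardcoding 96
    vector_array = np.array(arr, dtype=np.float16).tolist()
    vector_bytes = struct.pack(f"<{len(vector_array)}e", *vector_array)
    return base64.b64encode(vector_bytes)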
Method 3 - using dtype indicator
Based on the method described in the NumPy dtype documentation (arrays.dtypes.html), using the '<f2' dtype indicator (little-endian float16) is supposed to be faster than struct; a rough timing sketch follows the verification below.
def convert_f16_to_b64_m3(arr):
    # using f2 is faster
    a = np.array(arr, dtype=np.dtype('<f2'))
    return base64.b64encode(a.tobytes())

def verify_f16_encoding_m3():
    arr = np.random.normal(0, 0.01, 96).astype('float16')
    b64_emb = convert_f16_to_b64_m3(list(arr))
    assert(np.isclose(arr, convert_b64_to_f16(b64_emb), atol=1e-5).all())
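The speed claim is easy to check locally; here is a rough timing sketch (the dimension, run count, and any results are illustrative and will vary by machine):

import timeit

def compare_f16_encoders(dim=2048, runs=1000):
    arr = np.random.normal(0, 0.01, dim).astype(np.float16).tolist()
    packer = struct.Struct(f"<{dim}e")
    # time the struct-based packing vs the '<f2' dtype buffer approach
    t_struct = timeit.timeit(lambda: base64.b64encode(packer.pack(*arr)), number=runs)
    t_dtype = timeit.timeit(lambda: base64.b64encode(np.array(arr, dtype='<f2').tobytes()), number=runs)
    print(f"struct: {t_struct:.4f}s, dtype '<f2': {t_dtype:.4f}s")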
Conclusion
The same conversions can be implemented in Java/Scala as well.
Citation
To refer to this post, please cite it as:
Float16 precision conversion to Base64 for lossless transmission | Senthilkumar Gopal.
https://sengopal.github.io/posts/float16-precision-conversion-to-base64.html