Semantic Similarity in Data Mesh Environments: A Column Classification Approach
The rapid growth of high-dimensional and semi-structured data in recent years calls for efficient classification techniques in decentralized environments. In this thesis, we investigate the usability of text embeddings for column classification by analyzing semantic distance metrics to detect semantic similarities between database columns. OpenAI's text-embedding-3-small model h
