Similar Image Recommender

Personal Project


TensorFlow · Python · Keras

Overview

A content-based image retrieval system that uses deep learning feature extraction to find and rank visually similar images from a reference dataset. The engine explores two feature-extraction approaches: a pre-trained VGG16 model and a custom Convolutional Autoencoder. Given a query image, it returns the most visually similar images, ranked by Euclidean distance in the embedding space.


Role

Sole Researcher & Engineer


Problem

Traditional image search relies on metadata tags and filenames — not the visual content itself. The goal was to build a system that understands what an image looks like and can surface similar images based on visual features alone, without any manual labelling.

Solution

Implemented two feature extraction methods. The first uses a pre-trained VGG16 CNN as a feature extractor, removing the classification head to expose the 'fc1' embedding layer. The second trains a custom Convolutional Autoencoder to compress images into a 16-dimensional latent space. Images are encoded into feature vectors and similarity is computed using Euclidean distance. A nearest-neighbour search over the embedding index retrieves and ranks the most similar images.
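A minimal sketch of the VGG16 approach, assuming standard Keras APIs; this mirrors the description above rather than the project's exact code:

```python
import numpy as np
from tensorflow.keras.applications.vgg16 import VGG16, preprocess_input
from tensorflow.keras.models import Model
from tensorflow.keras.preprocessing import image

# Load VGG16 with ImageNet weights and cut it at the 'fc1' layer,
# discarding the final classification head.
base = VGG16(weights="imagenet")
extractor = Model(inputs=base.input, outputs=base.get_layer("fc1").output)

def embed(path):
    """Return the 4096-dim 'fc1' embedding for one image file."""
    img = image.load_img(path, target_size=(224, 224))   # VGG16 input size
    x = image.img_to_array(img)[np.newaxis, ...]         # add batch dimension
    x = preprocess_input(x)   # the same normalisation VGG16 was trained with
    return extractor.predict(x, verbose=0)[0]
```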


Architecture

A TensorFlow/Keras feature-extraction pipeline with two encoders (a headless pre-trained VGG16 model and a custom Convolutional Autoencoder), a vector index of pre-computed embeddings, and Euclidean-distance ranking for query-time retrieval.
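For the second encoder, here is a hypothetical sketch of a Convolutional Autoencoder with a 16-dim bottleneck; the 64×64 input and layer widths are illustrative assumptions, not the project's exact architecture:

```python
from tensorflow.keras import layers, Model

# Encoder: two strided convolutions, then a dense 16-dim latent code.
inp = layers.Input(shape=(64, 64, 3))                 # assumed input size
x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inp)
x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
latent = layers.Dense(16, name="latent")(layers.Flatten()(x))

# Decoder: mirror the encoder back to pixel space.
x = layers.Dense(16 * 16 * 64, activation="relu")(latent)
x = layers.Reshape((16, 16, 64))(x)
x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
out = layers.Conv2DTranspose(3, 3, strides=2, padding="same", activation="sigmoid")(x)

autoencoder = Model(inp, out)   # trained on reconstruction loss
encoder = Model(inp, latent)    # reused at retrieval time
autoencoder.compile(optimizer="adam", loss="mse")
```

At query time the trained `encoder` plays the same role as the VGG16 extractor, just producing 16-dim vectors instead of 4096-dim ones.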

Key Design Decisions

  • Pre-trained VGG16 CNN backbone with the classification head removed to expose 4096-dim feature vectors from the 'fc1' layer.
  • Custom Convolutional Autoencoder trained from scratch to compress images into a dense 16-dim latent space representation.
  • Batch pre-computation of embeddings for the reference image dataset, stored as a NumPy index/pickle file (sketched after this list).
  • Euclidean distance computed between the query embedding and all reference vectors for ranked retrieval.
  • TensorFlow image preprocessing pipeline matching the training-time normalisation.
  • Top-K retrieval with distance scores returned alongside matched images.
  • Evaluation compares Euclidean distance over visual features with text similarity between product categories.
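A minimal sketch of the pre-computation and top-K retrieval steps referenced above; the dataset path and file names are assumptions, and `embed` is the VGG16 extractor sketched earlier:

```python
import glob
import numpy as np

# Batch pre-computation: embed every reference image once and persist
# the resulting vector index.
paths = sorted(glob.glob("reference_images/*.jpg"))   # hypothetical dataset
index = np.stack([embed(p) for p in paths])           # shape: (n_images, 4096)
np.save("embeddings.npy", index)

def top_k(query_path, k=5):
    """Rank reference images by Euclidean distance to the query embedding."""
    q = embed(query_path)
    dists = np.linalg.norm(index - q, axis=1)   # distance to every reference
    order = np.argsort(dists)[:k]               # k nearest neighbours
    return [(paths[i], float(dists[i])) for i in order]
```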

Challenges

  • Choosing the right layer depth for feature extraction — too shallow loses semantic meaning, too deep overfits to ImageNet classes.
  • Designing and training an effective Convolutional Autoencoder architecture to capture meaningful visual representations in a highly compressed latent space.
  • Normalising embeddings consistently between pre-computation and query time to avoid distance distortion (see the sketch after this list).
  • Scaling the similarity search efficiently as the reference dataset grows.
  • Evaluating retrieval quality without labelled ground-truth pairs, mitigated by checking category similarity.
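On the normalisation point above, a minimal sketch of the fix: apply the identical L2 normalisation when building the index and when embedding a query, otherwise magnitude differences distort the Euclidean distances (`embeddings.npy` and `embed` are carried over from the earlier sketches):

```python
import numpy as np

def l2_normalize(v, axis=-1, eps=1e-10):
    """Scale vectors to unit length so distance reflects direction only."""
    return v / (np.linalg.norm(v, axis=axis, keepdims=True) + eps)

index = l2_normalize(np.load("embeddings.npy"))   # index-build time
query = l2_normalize(embed("query.jpg"))          # query time: same transform
```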

Impact

  • Demonstrated content-based image retrieval without any manual labelling or metadata.
  • Compared performance between transfer learning (VGG16) and custom representation learning (Autoencoder).
  • Achieved visually coherent similarity rankings across diverse image categories.
  • Open-sourced as an educational reference for embedding-based retrieval systems.