Harsh Khatri | ML Engineer | Computer Vision

About Me

I am a Machine Learning Engineer with a focus on Computer Vision and Deep Learning. I completed my Master's in Computer Science from Boston University with a perfect 4.0 GPA, where I conducted research on Open-set 3D Object Detection under Prof. Bryan Plummer.

My expertise lies in developing transformer-based algorithms, 3D object detection systems, and facial authentication technologies. With a background in both academic research and industry applications, I bring a unique perspective to solving complex machine learning challenges.

Prior to my Master's, I worked as a Software Developer at Reflexis Systems (acquired by Zebra Technologies), where I led development teams and built enterprise-scale applications. I hold a Bachelor's degree in Metallurgical Engineering and Materials Science from the prestigious Indian Institute of Technology Bombay (IITB).

I'm passionate about advancing the field of AI and am constantly exploring new techniques and methodologies to improve machine learning systems. When not coding or researching, I enjoy volunteering and teaching underprivileged students.

Work Experience

Machine Learning Engineer

Logile - The Logic of Retail

July 2024 - Present

Developed a transformer-based forecasting algorithm that outperforms the legacy model, improving forecast accuracy by 8% while reducing the codebase by 83%.
Leveraged NVIDIA RAPIDS to design and deploy a high-performance scaling solution, resulting in a 20x efficiency boost and a 60% reduction in the processing time of the legacy forecasting algorithm.
Enhanced the performance of the legacy algorithm by vectorizing data processing and feature engineering steps.
Optimized and scaled the legacy forecasting pipeline by architecting and deploying a solution on AWS EMR.

Transformers | PyTorch | NVIDIA RAPIDS | AWS EMR

Graduate Research Assistant, IVC Lab

Advised by Prof. Bryan Plummer

Sept 2023 - May 2024

Conducted research for my master's thesis on Open-set 3D Object Detection using the PointBERT transformer.
Developed a novel coarse-to-fine-grained recognition method to distinguish easily confused categories in the dataset.
Designed a custom loss function incorporating an offline hard-negative mining technique for the text and image modalities, yielding a 2% improvement over the state-of-the-art method.

3D Object Detection | PointBERT | PyTorch | Computer Vision

Computer Vision Engineer

Wicket: Facial Authentication

Dec 2023 - Jan 2024

Devised a ResNet-based CNN architecture to classify facial features, achieving 99.5% accuracy on a synthetic dataset and 82.3% accuracy on a real dataset.
Developed an architecture employing Stable Diffusion to generate a synthetic dataset for facial feature detection.
Used negative prompts (prompt engineering) to improve the quality of the synthetic data by 33%.
Implemented a filtering mechanism to discern hallucinations and false negatives by extracting features from pretrained CLIP, DINO, and ConvNeXt models and training an ensemble model to predict the human-perception decision boundary.

CNN | ResNet | Stable Diffusion | CLIP | DINO | ConvNeXt

Software Developer

Reflexis Systems (Acquired by Zebra Technologies)

Jul 2019 - Aug 2022

Led a team of 5 developers; identified, prioritized, and delegated skill-appropriate tasks using Scrum.
Developed, maintained, and tested 400,000+ lines of code for the Enterprise Performance Product, which supports multidimensional analysis across different verticals, using Java, ReactJS, and Python.
Implemented an efficient test automation pipeline using Katalon (Selenium-based), Cypress, Jenkins, JUnit, and GCP, saving over 1,000 hours of developer time previously spent on manual testing.
Initiated a project to restructure the interactive data visualization dashboard, achieving a 50% performance improvement through modular code, asynchronous API calls, and hash maps.
Recognized with an "Exceeds Expectations" rating for exceptional leadership demonstrated in spearheading multiple projects, resulting in a 27% increase in team productivity.

Java | ReactJS | Python | Selenium | Jenkins | GCP

Projects

Sim-to-Real in 3D Object Detection

Advised by Prof. Eshed Ohn-Bar

Sep 2023 - Dec 2023

Achieved a 9% performance improvement over baseline detection models by implementing various methods to close the Sim2Real gap in 3D Object Detection.
Evaluated 3D object detection methods, including PointNet, CurveNet, and GDANet, on the real-world OmniObject3D point cloud dataset, demonstrating the Sim2Real gap.
Proposed a novel random sampling approach for synthetic point cloud data, resulting in a 4% performance improvement when evaluated on the real dataset.
Enhanced the CurveNet point cloud encoder by integrating language embeddings derived from CLIP, resulting in a 3% improvement in accuracy.
Used an LLM to generate descriptive captions for classes and augment caption embeddings, resulting in a 2% improvement in model performance.

3D Object Detection | PointNet | CurveNet | CLIP | LLM

Autonomous Driving (OpenAI Gym)

Advised by Prof. Eshed Ohn-Bar

Sep 2023 - Dec 2023

Designed an autonomous driving model using three approaches: Imitation Learning, Modular Pipeline, and Reinforcement Learning.
Imitation Learning: Trained a custom CNN for the Gym car racing environment, achieving a 200+ reward and setting the baseline. Optimized the CNN by implementing the DAGGER algorithm, achieving a 20% increase in performance.
Modular Pipeline: Engineered a modular self-driving pipeline with Lane Detection, Waypoint Prediction, and Lateral and Longitudinal Control modules (Stanley Controller and PID Controller), yielding a 50% improvement in performance over Imitation Learning.
Reinforcement Learning: Implemented the PPO (Proximal Policy Optimization) algorithm, achieving a 37% performance improvement over the modular approach.
Achieved the best score in a class of 23 students for autonomous driving with the Modular Pipeline.

OpenAI Gym | CNN | DAGGER | PPO | Reinforcement Learning

Phrase Grounding

Advised by Prof. Bryan Plummer

Jan 2023 - May 2023

Compiled and analyzed existing research on phrase grounding, the problem of associating natural-language phrases or descriptions with the corresponding objects in an image.
Identified and experimented with model components such as image and text encoders, multi-modal interaction, loss functions, sampling strategies, evaluation metrics, and regularization methods.
Conducted an extensive ablation study by developing and evaluating sampling techniques such as Hard Negative, Weighted Negative, and Semi-Hard Negative mining.

Computer Vision | NLP | Multi-modal Learning | PyTorch

Hand Gesture Recognition

Advised by Prof. Bryan Plummer

Sep 2022 - Dec 2022

Improved performance by 7% over state-of-the-art gesture recognition by integrating ResNeXt with a hand landmark detector using an LSTM-based integration architecture.
Conducted an ablation study by training and evaluating vision model architectures such as ResNeXt, MobileNet, and ShuffleNet for the gesture recognition task on the Jester dataset (Qualcomm).
Integrated ResNeXt-101 with the Google MediaPipe Hand Landmarker, which detects 21 unique landmarks of the hand.

ResNeXt | LSTM | MediaPipe | Computer Vision

Publications

Multi-omics data analysis to assess severity in COVID-19 patients

Co-author with research team under Prof. Sanjeeva Srivastava

Conducted exploratory analysis of mass-spectrometry data and identified 10 significant proteins and 6 metabolites responsible for the severity of COVID-19. Generated critical insights by implementing classification models (SVM) and statistical tools (t-test) in Python.

A Multi-omics Longitudinal Study of COVID-19 Patients

Co-author with research team under Prof. Sanjeeva Srivastava

Contributed to a longitudinal study tracking COVID-19 patients over time through multiple biological data types. Implemented data analysis pipelines to identify biomarkers and patterns related to disease progression and recovery.

Get In Touch

Location

Boston, Massachusetts

Email

harshkhatri242@gmail.com

LinkedIn

linkedin.com/in/harsh242

Machine Learning | Computer Vision | Generative AI
