Himanshu Kumar

Bangalore, India.

I am interested in computer vision, machine learning, and their applications. I completed a master's in Artificial Intelligence at the Indian Institute of Science, Bangalore.
I have more than 3.5 years of industry experience in machine learning, developing ML/DL models for computer vision and NLP. During my M.Tech, I worked on depth estimation using an event camera.
I am familiar with ML frameworks and tools such as Pandas, Keras, TensorFlow, scikit-learn, and PyTorch, as well as PySpark and SQL.


Publications

Simple Transformer with Single Leaky Neuron for Event Vision

WACVW 2025   [PDF]  

Implemented a novel architecture that integrates the spatial representational capabilities of pre-trained vision backbones with the temporal processing strengths of Spiking Neural Networks, and achieves high accuracy on the DVS Gesture dataset.

Segmentation challenge - MaCVi, WACV 2025

MaCVi, WACV 2025   [PDF]

Improved upon the KNet model, which did not achieve significant results in last year's challenge rankings, and achieved an F1 score of 80.8. Secured 3rd place in the USV-based Obstacle Segmentation Challenge at the WACV workshop.


Projects

Deep Learning
  • Natural Language Inference
  • The SNLI corpus (version 1.0) is a collection of 570k human-written English sentence pairs, manually labeled for balanced classification with the labels entailment, contradiction, and neutral. It supports the task of natural language inference (NLI), also known as recognizing textual entailment (RTE). Implemented LSTM-based and logistic regression-based models on SNLI. Also fine-tuned a small pre-trained BERT model on the SNLI dataset.

  • Depth Estimation using Neuromorphic Camera
  • Working on monocular depth estimation for SLAM by processing successive non-overlapping windows of events/frames over an interval. Deep learning models will be trained on data obtained from conventional and event-based vision cameras.

  • Sentiment Classification using Tree-Structured LSTM
  • Using LSTMs arranged in a tree structure, performed binary and 5-class sentiment classification on the Stanford Sentiment Treebank dataset. Used GloVe embeddings for word representation.

  • Text to Image Synthesis using GAN
  • Implemented an image synthesis model, using an RNN and a Deep Convolutional GAN, that translates sentence text into image pixels. Trained the model with GloVe word embeddings to generate images of birds and flowers.

Data Structures
  • Persistent Data Structures
  • Implemented persistent data structures as part of a coursework assignment for the Data Structures and Algorithms course. Both partially and fully persistent data structures were implemented in C. Applied a persistent stack to solve a maze.


Courses

  • E0 251
    Data Structures and Algorithms
  • E0 230
    Computational Methods of Optimisation
  • E0 299
    Computational Linear Algebra
  • E1 222
    Stochastic Models and Applications

  • E1 213
    Pattern Recognition and Neural Networks
  • E0 250
    Deep Learning
  • E1 277
    Reinforcement Learning
  • E9 261
    Speech Information Processing

  • E9 253
    Neural Networks and Learning Systems
  • E9 208
    Digital Video: Perception and Algorithms
  • E9 309
    Advanced Deep Learning


Achievements & Extra-Curricular

  • AIR 107, GATE 2019 (Computer Science): secured the 99.89th percentile among around 1 lakh students.
  • Reliance Ode2Code Hackathon: secured 1st place in Genius Unleashed, an NLP coding challenge from Reliance.