WORK EXPERIENCE
​
Data Science Intern
Geisinger Health System
June 2024 - Present
Houston, Texas
​
-
Designed and built an NLP pipeline to classify intracranial hemorrhage (ICH) from 30k+ radiology reports, leveraging BERT and advancing research into state-of-the-art Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG).
-
Developed a predictive model on AWS SageMaker to analyze electronic health records (EHR) and identify unscreened women at high risk of breast cancer, driving proactive health interventions.
-
Implemented automation procedures to streamline extraction and integration of CMS Hospital Care Compare files and internal data from various vendors, saving ~2 weeks of one full-time employee's manual effort per quarterly cycle.
Data Scientist
RBL Bank
July 2021 - August 2023
Mumbai, India
-
Automation: Led a team of 4 to create ETL pipelines on Azure for data migration verification, reducing daily manual work by ~2 hours.
-
Customer Segmentation and Engagement: Implemented a clustering model to segment credit card customers based on spending behavior. Through hyperparameter tuning, feature engineering, and evaluation, achieved a 12% increase in customer engagement.
-
Fraud Detection Model Optimization: Revamped a credit card fraud classification model, improving its AUC-ROC score from 0.82 to 0.89. Contributed to the deployment of the optimized model, enhancing fraud detection capabilities.
-
Interactive Data Visualizations: Took the initiative to convert various SQL- and Excel-based reports into automated, interactive, real-time Tableau dashboards, reducing preprocessing and query time and saving ~50 hours/month.
-
Data Modeling: Designed data models and schemas for relational databases, optimizing query performance and storage efficiency.
Machine Learning Intern
Kaashiv Infotech
Apr 2021 - June 2021
Pune, India
-
Improved a sales prediction model with a diverse ensemble of time series models, leveraging optimal features and newly engineered inputs
-
Selected the best inputs from base model features and engineered features such as the median of all models and the hour of day
-
Used a Support Vector Regressor for aggregation and enhanced predictions by using separate models for peak and off-peak hours
-
Tested the model on past data and achieved an average 1% improvement across various accuracy metrics
Subject Matter Expert Intern
Chegg
Nov 2020 - March 2021
Pune, India
• Tutored high school and undergraduate students as an independent contractor on the Chegg platform, achieving a 95% satisfaction rate.
• Taught 60+ students and conducted 80+ lessons through the platform.
• Taught students SQL, Database Management, and Python/C++ Programming, and guided them through projects and assignments.