Designed and developed scalable AWS infrastructure to store and process large datasets using PySpark, S3, EMR, and Redshift
Implemented ETL pipelines using AWS services and Python to extract data from multiple sources, apply transformations, and load the results into the data warehouse
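A minimal, AWS-free sketch of the extract/transform/load pattern behind these pipelines; the field names and sample rows are hypothetical, and in the real pipeline the extract step read from S3 while the load step wrote to Redshift:

```python
# Hedged sketch: hypothetical columns ('id', 'name', 'quantity', 'price');
# the real pipeline read from S3 and loaded into Redshift instead of a list.

def extract(rows):
    """Extract: drop malformed records (here: rows missing 'id')."""
    return [r for r in rows if r.get("id") is not None]

def transform(rows):
    """Transform: normalize text fields and derive a 'total' column."""
    out = []
    for r in rows:
        out.append({
            "id": r["id"],
            "name": r.get("name", "").strip().lower(),
            "total": r.get("quantity", 0) * r.get("price", 0.0),
        })
    return out

def load(rows, warehouse):
    """Load: append transformed rows to an in-memory 'warehouse' table."""
    warehouse.extend(rows)
    return len(rows)

if __name__ == "__main__":
    source = [
        {"id": 1, "name": " Widget ", "quantity": 2, "price": 3.5},
        {"id": None, "name": "bad row"},
    ]
    warehouse = []
    loaded = load(transform(extract(source)), warehouse)
    print(loaded)                  # 1
    print(warehouse[0]["total"])   # 7.0
```

Keeping each stage a separate function makes the transform logic unit-testable without any AWS connectivity.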
Provisioned and managed EMR clusters on EC2 instances for data ingestion, tuning instance types and cluster sizing for optimal performance and resource utilization
Implemented Spark jobs on AWS EMR to process large datasets, improving data ingestion efficiency and reducing processing time
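A sketch of submitting a Spark job to EMR as a cluster step. The step name, script path, and cluster id are hypothetical; the boto3 call is shown commented out so the step definition itself stays runnable without AWS credentials:

```python
# Hedged sketch: job name, S3 script path, and cluster id are hypothetical.

def build_spark_step(name, script_s3_path, extra_args=()):
    """Build the step-definition dict that EMR's AddJobFlowSteps expects."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar runs spark-submit on the cluster's master node
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster",
                     script_s3_path, *extra_args],
        },
    }

step = build_spark_step("daily-ingest", "s3://my-bucket/jobs/ingest.py",
                        ["--date", "2024-01-01"])

# Submitting the step would look like (requires AWS credentials):
# import boto3
# emr = boto3.client("emr")
# emr.add_job_flow_steps(JobFlowId="j-XXXXXXXXXXXXX", Steps=[step])
```

Building the step dict separately from the API call keeps the job configuration easy to inspect and test.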
Developed custom scripts to automate data ingestion, data quality checks, and data reconciliation processes
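A minimal sketch of the kinds of automated checks such scripts perform: null checks on required columns, duplicate-key detection, and row-count reconciliation between source and target. Column names and sample data are hypothetical:

```python
# Hedged sketch: 'id' and 'amount' are hypothetical column names.

def check_no_nulls(rows, required_cols):
    """Return the sorted names of required columns that contain nulls."""
    return sorted({c for r in rows for c in required_cols if r.get(c) is None})

def check_unique_key(rows, key):
    """Return True if the key column has no duplicate values."""
    keys = [r[key] for r in rows]
    return len(keys) == len(set(keys))

def reconcile_counts(source_count, target_count):
    """Reconciliation: every extracted row must have been loaded."""
    return source_count == target_count

rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}]
print(check_no_nulls(rows, ["id", "amount"]))  # ['amount']
print(check_unique_key(rows, "id"))            # True
```

In practice these checks would run after each load and fail the pipeline (or raise an alert) when any of them returns a problem.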
Utilized AWS Glue for scheduling and running ETL jobs, reducing manual intervention and increasing reliability
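A sketch of scheduling a Glue ETL job on a cron trigger. The trigger name, job name, and schedule are hypothetical; the boto3 call is commented out so the trigger definition stays runnable without AWS access:

```python
# Hedged sketch: trigger name, job name, and cron schedule are hypothetical.

def build_schedule_trigger(trigger_name, job_name, cron):
    """Build the arguments for Glue's CreateTrigger API (SCHEDULED type)."""
    return {
        "Name": trigger_name,
        "Type": "SCHEDULED",
        # Glue uses AWS cron syntax: minute hour day-of-month month day-of-week year
        "Schedule": f"cron({cron})",
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }

trigger = build_schedule_trigger("nightly-etl-trigger", "nightly-etl-job",
                                 "0 2 * * ? *")  # 02:00 UTC daily

# Creating the trigger would look like (requires AWS credentials):
# import boto3
# glue = boto3.client("glue")
# glue.create_trigger(**trigger)
```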
Collaborated with cross-functional teams to gather requirements, design solutions, and implement end-to-end data pipelines
Monitored performance and optimized AWS resource usage to reduce costs and increase efficiency
Automated deployment of Lambda functions using AWS CloudFormation
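A minimal CloudFormation template sketch for this deployment pattern; the function name, handler, bucket, and key are all hypothetical placeholders, not values from the original project:

```yaml
# Hedged sketch: resource names, bucket, and key are hypothetical.
AWSTemplateFormatVersion: "2010-09-09"
Resources:
  IngestFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: data-ingest-handler      # hypothetical name
      Runtime: python3.12
      Handler: handler.lambda_handler
      Role: !GetAtt IngestFunctionRole.Arn
      Code:
        S3Bucket: my-deploy-bucket           # hypothetical bucket
        S3Key: lambdas/ingest.zip
  IngestFunctionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal: {Service: lambda.amazonaws.com}
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
```

Deployment can then be automated with a single CLI call, e.g. `aws cloudformation deploy --template-file template.yaml --stack-name ingest-stack --capabilities CAPABILITY_IAM`, so function updates ship with the rest of the infrastructure.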
Designed, developed, and deployed custom software applications in Python and Java