About

Tarek Aloui


Hey there, welcome to DerekWithT! My real name is Tarek Aloui, and I'm in my senior year studying Computer Science at Harvard. I have a strong passion for machine learning and data science. When I'm not engrossed in algorithms for my classes or research, you can find me crafting mobile and full-stack web applications. On this platform, I aim to demystify tech by sharing valuable insights from my academic journey and step-by-step guides for various projects. My blog and notes section serve as a live archive of my continuous exploration into new technologies. Whether it's a side project or an experimental machine learning model, I'm committed to both learning and sharing knowledge. Feel free to explore my journey, and if you're keen to see an overview of my portfolio or get a glimpse into my past projects, simply scroll down.

Explore My Projects 🚀

Welcome to my portfolio. In this space, you'll find a curated selection of my work, spanning from machine learning applications to web development and financial analysis. Each project represents a unique challenge tackled, skills honed, and solutions engineered.
For an in-depth understanding of each project—including the technology stack, methodologies, and key learnings—I invite you to visit the Projects tab. There, you'll find comprehensive write-ups, code repositories, and live demos. In addition to showcasing my projects, I plan to share my journey by updating the Notes and Blogs pages, where I'll delve into the key concepts and learnings that come my way.
Feel free to Check Out My Projects for a deeper dive into my professional and academic endeavors.
 
💡
Check out my latest full-stack ML project!
 

Projects Overview

LoanOriginator: Full-Stack ML Solution for Bank Statement Analysis and Loan Decision Automation
  • Developed LoanOriginator, a full-stack ML project automating transaction data extraction from bank statements and utilizing AI for loan decision-making.
  • Constructed the frontend using Next.js, TypeScript, and TailwindCSS, incorporating server actions and data visualizations through Tremor and Material-UI (MUI). The design supports both light and dark modes for improved accessibility.
  • Engineered the backend with FastAPI and Python, integrating LangChain for orchestrating the GPT models and Scikit-Learn for the KNN classifier (a minimal sketch of the decision step follows this list). Google Cloud Firestore was employed for database management, and Google Cloud Storage for secure document handling.
  • Used Large Language Models, including GPT-4 for transaction parsing and decision rationale, and GPT-3.5 Turbo for metadata extraction from statements.
  • Containerized the frontend and backend using Docker and utilized Google Cloud Run for a serverless architecture, allowing for autoscaling and continuous deployment.
  • Current backend efforts are directed towards refining AI model accuracy in transaction categorization and reducing parsing errors. Planned UI/UX enhancements include improved user authentication systems and separate user interfaces for customers and agents.
  • Skills & Technologies Used: API Development, Backend Development, Cloud Storage, Containerization, Continuous Deployment, Data Analysis, Docker, FastAPI, Frontend Development, Full-Stack Development, Google Cloud Firestore, Google Cloud Run, Google Cloud Storage, GPT-3.5 Turbo, GPT-4, K-Nearest Neighbors (KNN) Classifier, Large Language Models (LLMs), Machine Learning, Material-UI (MUI), Next.js, Pandas, Python, Scikit-Learn, Serverless Deployment, TailwindCSS, Transaction Data Processing, TypeScript, User Interface Design.
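As a rough illustration of the decision step mentioned above, here is a minimal scikit-learn sketch of a KNN classifier over aggregated statement features. The feature names (avg_monthly_inflow, overdraft_count, etc.) and the CSV path are hypothetical placeholders, not the project's actual schema.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical aggregated features per applicant; the real project derives
# its features from transactions that GPT-4 parses out of bank statements.
df = pd.read_csv("statement_features.csv")  # placeholder path
X = df[["avg_monthly_inflow", "avg_monthly_outflow", "overdraft_count"]]
y = df["approved"]  # 1 = approve, 0 = decline

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# KNN is distance-based, so features are standardized first.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
print(f"Test accuracy: {model.score(X_test, y_test):.2f}")
```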
Eureka Papers: Your Guide to Navigating the AI Research Landscape
  • Developed a web application to track, recommend, and summarize key AI papers, aiming to keep researchers and enthusiasts updated.
  • Implemented the frontend using Next.js, TailwindCSS, and Apollo GraphQL, and deployed it on Vercel.
  • Built a scalable backend with Django and GraphQL, hosted on Azure and integrated with a PostgreSQL database for robust data management (a schema sketch follows this list).
  • Utilized Agile methodologies for project management through Notion, enhancing team efficiency in a small-scale, student-led setting.
  • Ongoing work includes integration of a recommendation model, automated database updates, and user personalization features.
  • Skills & Technologies Used: API Development, Agile Methodologies, Azure, Cloud, Django, Generative AI, Git, GraphQL, Machine Learning, Natural Language Processing, Next.js, PostgreSQL, Project Management, Prompt Engineering, Recommendation Systems, SQL, Server-Side Rendering, TailwindCSS, TypeScript, Vercel, Version Control, Web Scraping.
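For a feel of the GraphQL layer, here is a minimal Python sketch using the graphene library (which graphene-django builds on). The PaperType fields and the hard-coded sample paper are illustrative assumptions, not the production schema.

```python
import graphene

# Hypothetical paper type; the real schema is backed by Django models
# stored in PostgreSQL.
class PaperType(graphene.ObjectType):
    title = graphene.String()
    abstract = graphene.String()
    arxiv_id = graphene.String()

class Query(graphene.ObjectType):
    papers = graphene.List(PaperType, keyword=graphene.String())

    def resolve_papers(root, info, keyword=None):
        # Placeholder data; a Django resolver would filter a Paper model here.
        results = [PaperType(title="Attention Is All You Need",
                             abstract="...", arxiv_id="1706.03762")]
        if keyword:
            results = [p for p in results if keyword.lower() in p.title.lower()]
        return results

schema = graphene.Schema(query=Query)
result = schema.execute('{ papers(keyword: "attention") { title arxivId } }')
print(result.data)
```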
Exploring Machine Learning Methods for Classifying Financial Market Regimes
  • Conducted a literature review to assess state-of-the-art techniques in financial market regime detection, focusing on Gaussian Mixture Models and Wasserstein k-means.
  • Executed Exploratory Data Analysis (EDA) on S&P500 and various market indicators, applying data preprocessing and feature engineering techniques to optimize data for modeling.
  • Implemented machine learning algorithms, including k-means clustering and Gaussian Mixture Models, and evaluated them using inertia and silhouette metrics (a minimal GMM example follows this list).
  • Enhanced return prediction models by integrating regime classification, validating the practical applicability of machine learning models for investment decision-making.
  • Presented a rigorous evaluation of models' effectiveness, including their performance trade-offs, setting the groundwork for future research in optimizing regime classification.
  • Skills & Technologies Used: Data Preprocessing, Data Visualization, EDA, Feature Engineering, Gaussian Mixture Models, Hyperparameter Tuning, K-Means Clustering, Keras, Linear Regression, Logistic Regression, Matplotlib, NumPy, Pandas, Principal Component Analysis (PCA), Python, ROC-AUC, Random Forest Classifier, Ridge Regression, Scikit-learn, SciPy, TensorFlow.
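The regime-detection idea can be sketched in a few lines of scikit-learn. The two synthetic return series below are stand-ins; the project worked with real S&P 500 data and engineered features.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

# Synthetic daily returns standing in for the real S&P 500 series.
rng = np.random.default_rng(0)
calm = rng.normal(0.0005, 0.005, 500)      # low-volatility regime
turbulent = rng.normal(-0.001, 0.02, 200)  # high-volatility regime
returns = np.concatenate([calm, turbulent]).reshape(-1, 1)

# Fit a two-component GMM and assign each day to a regime.
gmm = GaussianMixture(n_components=2, random_state=0).fit(returns)
labels = gmm.predict(returns)

print("Regime means:", gmm.means_.ravel())
print("Silhouette score:", silhouette_score(returns, labels))
```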
Predicting Mortality Outcomes Using Advanced Machine Learning Techniques: An Ensemble and Survival Analysis Approach
  • Employed stacking ensemble analysis to integrate multiple machine learning algorithms, such as Logistic Regression and Random Forest, for enhanced predictive accuracy (see the sketch after this list).
  • Utilized Survival Analysis techniques, such as Cox Proportional Hazards and Random Survival Forests, to model the time-to-event nature of mortality outcomes.
  • Assessed model performance rigorously using negative mean squared error and the concordance index (c-index) as key metrics.
  • Conducted comprehensive Data Preprocessing and Feature Engineering to optimize datasets for advanced modeling.
  • Laid groundwork for future improvements by identifying opportunities for algorithmic fine-tuning and variable inclusion.
  • Skills & Technologies Used: Cross-Validation, Data Preprocessing, Data Visualization, Ensemble Methods, Exploratory Data Analysis (EDA), Feature Engineering, Hyperparameter Tuning, Jupyter Notebook, Matplotlib, Model Evaluation, NumPy, Pandas.
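A minimal sketch of the stacking idea, using scikit-learn's StackingClassifier on synthetic data; the real study used preprocessed clinical variables, and the survival-analysis side (Cox models, random survival forests) is not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the mortality dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Stack a random forest and a logistic regression, with a logistic
# regression meta-learner combining their predictions.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
)
print("CV accuracy:", cross_val_score(stack, X, y, cv=5).mean())
```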
Read4me: Using Optical Character Recognition and Machine Learning in Assistive Reading Devices
  • Developed a wearable assistive reading device paired with an Android app to convert printed text into audio, targeting users with vision impairments, dyslexia, and illiteracy.
  • Utilized Optical Character Recognition via the Google Cloud Vision API and Natural Language Processing for real-time text correction and translation (a minimal OCR sketch follows this list).
  • Integrated hardware components like Raspberry Pi and Arduino for image capture and user interaction, employing Firebase for real-time data synchronization between hardware and app.
  • Implemented voice command capabilities using Dialogflow API and enabled text-to-speech in the Android app for a seamless user experience.
  • Received first place at the National Science Fair in Tunisia and a Fourth Grand Award at the Intel International Science and Engineering Fair (ISEF), marking the first such recognition for a Tunisian participant.
  • Skills & Tools Used: Optical Character Recognition, Natural Language Processing, Android Development, Java, Python, Firebase Database, Google Cloud Vision API, Google Translation API, Dialogflow API, Raspberry Pi, Arduino, HTTP Requests, RESTful APIs, Speech Recognition, Text-to-Speech.
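The OCR step can be sketched with the google-cloud-vision client library. The image path below is a placeholder and credentials are assumed to be configured via GOOGLE_APPLICATION_CREDENTIALS; on the device itself, the image would come from the Raspberry Pi camera rather than a local file.

```python
from google.cloud import vision  # pip install google-cloud-vision

def extract_text(image_path: str) -> str:
    """Run Google Cloud Vision text detection on a captured page image."""
    client = vision.ImageAnnotatorClient()
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.text_detection(image=image)
    if response.error.message:
        raise RuntimeError(response.error.message)
    # The first annotation holds the full detected text block.
    annotations = response.text_annotations
    return annotations[0].description if annotations else ""

print(extract_text("captured_page.jpg"))  # placeholder path
```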
Urban Sound Classification in New York City Using Convolutional Neural Networks and Random Forest
  • Developed a machine learning pipeline to classify urban sounds in NYC into 10 predefined categories, utilizing Mel spectrograms for effective feature extraction (see the feature-extraction sketch after this list).
  • Implemented Random Forest Classifier and Convolutional Neural Networks, optimizing performance through hyperparameter tuning via GridSearchCV and RandomizedSearchCV.
  • Applied k-fold cross-validation to mitigate overfitting and assess model robustness, achieving consistent performance across various validation sets.
  • Evaluated model performance using metrics like mean balanced per-class accuracy and AUC scores, while acknowledging the limitations of these metrics for holistic evaluation.
  • Addressed ethical concerns regarding the potential capture of personal conversations in urban sound data, emphasizing the need for responsible model deployment.
  • Skills & Tools Used: Accuracy Metrics, Cross-Validation, Data Augmentation, Data Visualization, Exploratory Data Analysis, GridSearchCV, Logistic Regression, Matplotlib, NumPy, PyTorch, Python, ROC-AUC, Random Forest, RandomizedSearchCV, Scikit-learn, TQDM.
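A minimal sketch of the Mel-spectrogram feature extraction feeding a random forest, using librosa and scikit-learn; the file list and label below are placeholders for the real 10-class dataset.

```python
import librosa
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def mel_features(wav_path: str) -> np.ndarray:
    """Summarize a clip as the mean log-Mel spectrogram over time."""
    y, sr = librosa.load(wav_path, sr=22050)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
    log_mel = librosa.power_to_db(mel)
    return log_mel.mean(axis=1)  # one 64-dim vector per clip

# Placeholder file list and label; the real pipeline iterates over the
# full labeled dataset of urban sound clips.
train_paths, train_labels = ["dog_bark.wav"], [0]
X = np.stack([mel_features(p) for p in train_paths])

clf = RandomForestClassifier(n_estimators=300, random_state=0)
clf.fit(X, train_labels)
```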
Chronological Dating of Historical Texts Using Recurrent Neural Networks
  • Objective: Designed and implemented a hybrid RNN to estimate the historical era of various texts, improving dating accuracy over word-level-only models.
  • Data Handling: Utilized Wolfram Language's ServiceConnect for seamless data collection from Openlibrary, assembling a diverse dataset across multiple text genres.
  • Architectural Design: Created a dual-layer RNN combining a word-level GloVe model with character-level processing to capture both semantic and finer-grained language features (a loose PyTorch re-sketch follows this list).
  • Training and Evaluation: Leveraged GPU-based training with periodic checkpoints, achieving an average dating error margin of 25 years on well-known literary titles.
  • Challenges: Identified areas for improvement in data distribution and hyperparameter tuning, affecting performance on older documents.
  • Skills and Technologies Used: Data Collection, Data Preprocessing, GloVe Model, Recurrent Neural Networks, Wolfram Language, Natural Language Processing.
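Since the original was built in the Wolfram Language, here is a loose PyTorch re-sketch of the dual-branch idea: randomly initialized embeddings stand in for GloVe, and all dimensions are illustrative rather than the project's actual settings.

```python
import torch
import torch.nn as nn

class HybridDater(nn.Module):
    """Word-level and character-level GRUs whose final states are fused
    to regress a publication year. A PyTorch re-sketch; the original
    project was implemented in the Wolfram Language."""
    def __init__(self, vocab_size=20000, n_chars=100,
                 word_dim=100, char_dim=32, hidden=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)  # GloVe stand-in
        self.char_emb = nn.Embedding(n_chars, char_dim)
        self.word_rnn = nn.GRU(word_dim, hidden, batch_first=True)
        self.char_rnn = nn.GRU(char_dim, hidden, batch_first=True)
        self.head = nn.Linear(2 * hidden, 1)  # predicted year

    def forward(self, word_ids, char_ids):
        _, hw = self.word_rnn(self.word_emb(word_ids))
        _, hc = self.char_rnn(self.char_emb(char_ids))
        fused = torch.cat([hw[-1], hc[-1]], dim=-1)
        return self.head(fused).squeeze(-1)

model = HybridDater()
words = torch.randint(0, 20000, (2, 50))   # batch of 2 texts, 50 tokens each
chars = torch.randint(0, 100, (2, 300))    # same texts as 300 characters
print(model(words, chars).shape)  # torch.Size([2])
```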

Get in Touch 👋

I'm always open to discussing new projects, creative ideas, or opportunities to add value to your team. Feel free to reach out to me through any of the platforms below:
I look forward to connecting with you.