Power To The Profile

A CI challenge project built around making data profiling in AI/ML workflows easy. The full repo is located Here.

Data Profiling: Pandas Profiling

Pandas Profiling is a tool for building comprehensive data profiles for EDA (exploratory data analysis). With this package, a single call on a DataFrame generates a report covering column types, missing values, distributions, and correlations.
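As a rough illustration of the workflow, the sketch below builds a small DataFrame and wraps report generation in a helper. The sample values, the `make_sample_frame` and `write_profile` names, and the output path are hypothetical and not taken from the repo. It also assumes the package is installed under its original `pandas_profiling` name; newer releases are published as `ydata-profiling`.

```python
import pandas as pd


def make_sample_frame() -> pd.DataFrame:
    """Build a tiny illustrative dataset (hypothetical values, not from the repo)."""
    return pd.DataFrame(
        {
            "age": [34, 51, 29, 42, None],
            "income": [52000.0, 87000.0, 41000.0, None, 63000.0],
            "segment": ["a", "b", "a", "c", "b"],
        }
    )


def write_profile(df: pd.DataFrame, path: str = "profile.html") -> str:
    """Render an HTML profile report for df and return the output path."""
    # Imported lazily so the rest of this sketch runs without the package.
    # Assumption: installed as pandas_profiling; with the renamed package
    # the import would be `from ydata_profiling import ProfileReport`.
    from pandas_profiling import ProfileReport

    # minimal=True turns off the most expensive computations.
    report = ProfileReport(df, title="Data Profile", minimal=True)
    report.to_file(path)
    return path


df = make_sample_frame()
# Missing-value counts per column, one of the statistics the report surfaces.
missing_per_column = df.isna().sum()
```

Calling `write_profile(df)` would then write the HTML report next to the notebook.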

Data Profiling: AI/ML Initial Insights

You can leverage machine learning to help analyze your dataset. You can run Principal Component Analysis to see how redundant its properties are, use multidimensional scaling to visualize its structure, run outlier detection to identify anomalous examples, or use clustering to get a sense of whether the data has any inherent categories.
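The four techniques named above can be sketched with scikit-learn as follows. This is a minimal example on synthetic data, not the notebook's code: the `ml_profile` helper, the choice of Isolation Forest for outlier detection and k-means for clustering, and the cluster count are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import IsolationForest
from sklearn.manifold import MDS
from sklearn.preprocessing import StandardScaler


def ml_profile(X: np.ndarray, n_clusters: int = 3, seed: int = 0) -> dict:
    """Run PCA, MDS, outlier detection, and clustering on a numeric matrix."""
    X_scaled = StandardScaler().fit_transform(X)

    # PCA: small trailing variance ratios indicate redundant features.
    pca = PCA().fit(X_scaled)

    # MDS: 2-D embedding that tries to preserve pairwise distances.
    embedding = MDS(n_components=2, random_state=seed).fit_transform(X_scaled)

    # Isolation Forest: -1 marks a suspected outlier, 1 an inlier.
    outlier_flags = IsolationForest(random_state=seed).fit_predict(X_scaled)

    # k-means: one cluster label per row.
    cluster_labels = KMeans(
        n_clusters=n_clusters, n_init=10, random_state=seed
    ).fit_predict(X_scaled)

    return {
        "explained_variance_ratio": pca.explained_variance_ratio_,
        "embedding": embedding,
        "outlier_flags": outlier_flags,
        "cluster_labels": cluster_labels,
    }


# Synthetic data: the fourth column is almost a copy of the first,
# so the last principal component should carry almost no variance.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
X[:, 3] = X[:, 0] * 2.0 + rng.normal(scale=0.01, size=60)
result = ml_profile(X)
```

The `embedding` array can be passed straight to a scatter plot, colored by `cluster_labels` or `outlier_flags`.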

Data Profiling: Pandas Charting

This section demonstrates visualization through charting. Charting tabular data directly within pandas allows for better comprehension of results and data sets.
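For a sense of what charting from pandas looks like, here is a small sketch using `DataFrame.plot`, which draws through matplotlib. The column names and values are made up for the example.

```python
import matplotlib

# Headless backend so the sketch runs without a display.
# Omit this line inside a notebook, where charts render inline.
matplotlib.use("Agg")

import pandas as pd

# Hypothetical tabular data for illustration only.
df = pd.DataFrame(
    {
        "region": ["north", "south", "east", "west"],
        "sales": [120, 95, 140, 80],
        "returns": [8, 5, 12, 4],
    }
)

# One grouped bar chart: a bar per region for each numeric column.
ax = df.plot(
    kind="bar",
    x="region",
    y=["sales", "returns"],
    title="Sales and returns by region",
)
ax.set_ylabel("units")
```

`ax.figure.savefig("chart.png")` would save the chart to disk; other values of `kind`, such as `"line"`, `"hist"`, or `"scatter"`, follow the same pattern.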

Databricks: Large Data Visualization and Breakdown

Here we work with the source data, California 2020 Census - California, to show how large data sets can be both profiled and visualized with ease on Databricks using Spark, pandas, or Koalas.

Databricks: MLflow Tracking Models

MLflow is an open source platform for managing the end-to-end machine learning lifecycle. Tracking: allows you to track experiments to record and compare parameters and results. Models: allows you to manage and deploy models from a variety of ML libraries.
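A minimal sketch of how Tracking and Models fit together is shown below. It trains a ridge regression on synthetic data and, in a separate helper, logs the parameters, metrics, and fitted model to MLflow. The `fit_and_score` and `log_run` names, the synthetic data, and the choice of model are assumptions for illustration and are not taken from the notebook.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def fit_and_score(alpha: float, seed: int = 0) -> dict:
    """Train a ridge model on synthetic data and collect what we want to log."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(200, 5))
    true_coef = np.array([1.5, -2.0, 0.0, 0.5, 3.0])
    y = X @ true_coef + rng.normal(scale=0.1, size=200)

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=seed)
    model = Ridge(alpha=alpha).fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))

    return {
        "model": model,
        "params": {"alpha": alpha},
        "metrics": {"mse": float(mse)},
    }


def log_run(result: dict) -> str:
    """Record one experiment run in MLflow and return its run ID."""
    # Imported lazily so the training part runs without MLflow installed.
    import mlflow
    import mlflow.sklearn

    with mlflow.start_run() as run:
        mlflow.log_params(result["params"])    # Tracking: parameters
        mlflow.log_metrics(result["metrics"])  # Tracking: results
        mlflow.sklearn.log_model(result["model"], "model")  # Models
        return run.info.run_id


result = fit_and_score(alpha=1.0)
```

Calling `log_run(result)` for several values of `alpha` produces runs that can be compared side by side in the MLflow UI.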

Databricks: Automated Profiling Spark/Pandas

Databricks is a Unified Analytics Platform on top of Apache Spark that accelerates innovation by unifying data science, engineering and business. With fully managed Spark clusters in the cloud, you can easily provision clusters with just a few clicks.

Databricks: Basic Experiment Tracking Automation

Here we use MLflow in Databricks to track a model trained on diabetes patient records obtained from two sources: an automatic electronic recording device and paper records.

MLflow: Serving on Cloud Notebooks

This example breaks down how one can use tunneling tools such as pyngrok, a Python wrapper for ngrok, to set up and run MLflow servers even on cloud notebook platforms like Google Colab.
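The tunneling step might look roughly like the sketch below. It assumes an MLflow server or UI is already listening on the given local port, that pyngrok 5 or later is installed (where `ngrok.connect` returns a tunnel object with a `public_url` attribute), and that an ngrok auth token has been configured. The `open_tunnel` helper is a hypothetical name.

```python
def open_tunnel(port: int = 5000) -> str:
    """Expose a local port through ngrok and return the public URL."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")

    # Imported lazily so the port check works without pyngrok installed.
    from pyngrok import ngrok

    # Assumption: pyngrok >= 5, where connect() returns an NgrokTunnel.
    tunnel = ngrok.connect(str(port))
    return tunnel.public_url
```

In a cloud notebook, `print(open_tunnel(5000))` would then show a URL through which the MLflow UI can be opened from a browser.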

MLflow: Run an MLflow Server in a Local Notebook

This example shows how to run an MLflow server directly inside a local Jupyter Notebook using the MLflow server command-line tools.
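One way to do this from a notebook cell is to launch the `mlflow server` command as a background process, so the cell returns while the server keeps running. The sketch below is an assumption about how that could be wired up, not the notebook's exact code: the helper names, the port, and the SQLite backend store are illustrative choices.

```python
import subprocess


def mlflow_server_command(
    host: str = "127.0.0.1",
    port: int = 5000,
    backend_store_uri: str = "sqlite:///mlflow.db",
) -> list:
    """Build the argument list for the `mlflow server` CLI."""
    return [
        "mlflow",
        "server",
        "--host",
        host,
        "--port",
        str(port),
        "--backend-store-uri",
        backend_store_uri,
    ]


def start_server(**kwargs) -> subprocess.Popen:
    """Start the server in the background and return the process handle."""
    # Popen does not block, so the notebook stays usable while it runs.
    return subprocess.Popen(mlflow_server_command(**kwargs))


command = mlflow_server_command()
```

After `server = start_server()`, pointing the client at it with `mlflow.set_tracking_uri("http://127.0.0.1:5000")` sends subsequent runs to the local server, and `server.terminate()` shuts it down.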

MLflow: Basics of Using with Databricks

If you’re just getting started with Databricks, consider using MLflow on Databricks Community Edition, which provides a simple managed MLflow experience for lightweight experimentation. Note that remote execution of MLflow projects is not supported on Databricks Community Edition.