Lead Generation ML Pipeline


In this project we use Azure ML to serve a machine learning API that analyzes scraped lead generation data and returns ranked information on top leads and other key indicators. We use a Jupyter notebook to interact with the API, to analyze the results further, and to create visualizations.

Initially, we set up and test the model using a dataset fetched from Kaggle. We then move on to scraping potential lead data from websites to form our own dataset, and finally we perform the analysis on this dataset.

We focus on B2B leads and use a somewhat unconventional source for the lead generation data: job postings. To stay GDPR compliant we don't scrape any personal data; instead, we focus only on information about companies that fit our target requirements, based on what we gather from the job postings.

                       

Tools: Jupyter, Python, Azure ML

Guide for Creating Project

PHASE[1]
Initial Setup and Data Collection

The first step in building our Lead Generation ML Pipeline is to set up the necessary environment and gather an initial dataset to test our workflow. This phase lays the foundation for the entire project, ensuring we have the right tools, infrastructure, and data to build a robust machine learning model.

We begin by setting up Azure ML, which will serve as the core platform for training and deploying our model. Additionally, we install essential tools such as Jupyter Notebook, the Azure ML SDK, and the Kaggle API to facilitate data exploration and model experimentation.

Once the environment is configured, we define our project goals and ranking criteria – this helps us identify what constitutes a high-quality lead based on relevant business indicators. To test our pipeline early on, we fetch a sample dataset from Kaggle, allowing us to experiment with different preprocessing techniques before moving on to real-world lead data.

With our dataset in place, we load and explore the data, performing an initial assessment of its structure, quality, and potential challenges. This step helps us understand how the data is distributed and prepares us for the data cleaning and preprocessing phase that follows.

By the end of this phase, we have a well-configured ML environment and a structured dataset ready for deeper processing and analysis.

 


Set up the Azure ML environment

Step 1: Create an Azure ML Workspace

Before diving into data processing and model training, we need to set up Azure Machine Learning (Azure ML), a cloud-based platform for building, training, and deploying machine learning models at scale. Setting up the workspace first gives the pipeline a single home for data, experiments, and models, and makes it straightforward to later deploy the trained model as an API for real-world lead generation.

  1. Sign in to the Azure Portal.
  2. Navigate to Azure Machine Learning and click Create a new workspace.
  3. Provide the required details:
    • Subscription: Choose your Azure subscription.
    • Resource Group: Create a new resource group or use an existing one.
    • Workspace Name: Assign a unique name to your workspace.
    • Region: Select the nearest available region for optimized performance.
  4. Click Review + Create and wait for the workspace to be deployed.

Once the deployment is complete, you can access the Azure ML Studio, a web interface for managing ML models, datasets, and experiments.
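If you prefer scripting to clicking through the portal, the same workspace can also be created from Python with the v1 SDK that we install in Step 2. The sketch below rests on assumptions: the subscription ID and resource group are placeholders, `westeurope` is only an example region, and `create_workspace` is our own hypothetical helper name. The SDK import sits inside the function so the file loads even before the SDK is installed.

```python
def create_workspace(subscription_id: str, resource_group: str,
                     name: str = "lead-generation-ml",
                     location: str = "westeurope"):
    """Create (or reuse) an Azure ML workspace with the v1 SDK.

    All argument values are placeholders; replace them with your own.
    """
    # Imported lazily so this file can be loaded where azureml-core is absent
    from azureml.core import Workspace

    return Workspace.create(
        name=name,
        subscription_id=subscription_id,
        resource_group=resource_group,
        create_resource_group=True,  # create the resource group if it does not exist
        location=location,           # example region; pick the one nearest to you
        exist_ok=True,               # return the existing workspace instead of failing
    )
```

Calling `create_workspace("your-subscription-id", "your-resource-group")` reuses an existing Azure CLI login or prompts you to sign in; thanks to `exist_ok=True`, re-running it returns the existing workspace rather than raising an error.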

Step 2: Set Up the Local Development Environment

To work on the project locally, we create a virtual environment that isolates the project's dependencies, and install the packages needed to connect to Azure ML.

a) Create venv and install packages

  • Create and activate virtual environment
				
python -m venv lead-gen-env
source lead-gen-env/bin/activate  # On macOS/Linux
lead-gen-env\Scripts\activate     # On Windows

				
			
  • Install required packages
				
pip install --upgrade pip
pip install "azureml-sdk[notebooks,automl]"  # Azure ML SDK with notebook and AutoML extras (quotes stop zsh from expanding the brackets)
pip install jupyter ipykernel pandas numpy scikit-learn matplotlib seaborn kaggle

				
			
  • Verify the installation
				
import azureml.core
print("Azure ML SDK version:", azureml.core.VERSION)

				
			

If a version number appears, your installation was successful!

b) Authenticate and Connect to Azure ML

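The SDK identifies your workspace through a small configuration file. Authentication itself happens when you first connect: the SDK reuses an existing Azure CLI login or prompts you to sign in.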

Retrieve Subscription Details

  1. In Azure Portal, search for Subscriptions.
  2. Copy your Subscription ID.
  3. Note your Resource Group and Workspace Name.

Then create a config.json file in your project directory:

				
{
    "subscription_id": "your-subscription-id",
    "resource_group": "your-resource-group",
    "workspace_name": "lead-generation-ml"
}
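A typo or a leftover placeholder in config.json only shows up as an Azure error once we try to connect. A small check like the following catches both problems early; `load_workspace_config` is a hypothetical helper of our own, and treating values that start with `your-` as placeholders is an assumption based on the template above.

```python
import json
from pathlib import Path

# Keys the Workspace constructor needs, matching the config.json template
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_config(path: str = "config.json") -> dict:
    """Read config.json and fail early if a value is missing or still a placeholder."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise KeyError(f"config.json is missing values for: {', '.join(missing)}")
    placeholders = [key for key in REQUIRED_KEYS if str(config[key]).startswith("your-")]
    if placeholders:
        raise ValueError(f"Replace the placeholder values for: {', '.join(placeholders)}")
    return config
```

The returned dictionary can be passed straight to the `Workspace` constructor.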

				
			

c) Authenticate using Python

Run the following Python script to connect to Azure ML:

				
from azureml.core import Workspace
import json

# Load the workspace details from config.json
with open('config.json') as f:
    config = json.load(f)

# Connect to the Azure ML workspace. Without an explicit auth object the SDK
# reuses an existing Azure CLI login or prompts you to sign in.
ws = Workspace(
    subscription_id=config['subscription_id'],
    resource_group=config['resource_group'],
    workspace_name=config['workspace_name']
)

# Save the workspace details where Workspace.from_config() can find them later
ws.write_config()
print(f"Workspace '{ws.name}' ready for use!")

				
			

This saves the workspace details (not your credentials) locally, so in future sessions you can reconnect with ws = Workspace.from_config() instead of repeating these steps.

Step 3: Set Up Compute

Below we show two ways to set up compute: in the Azure ML Studio web interface, or from your local machine using Python and the SDK we installed earlier.

a) Set up Compute in Azure ML

Azure ML provides cloud-based compute resources to train and deploy models. To create a compute instance:

  1. In Azure ML Studio, navigate to Compute > Compute Instances.
  2. Click Create, choose a virtual machine type, and specify its size based on your needs.
  3. Once the compute instance is created, it can be used to run Jupyter notebooks and execute training jobs.

b) Set up Compute from Python (local SDK)

For our lead generation pipeline, we'll need three kinds of compute resources (the snippet below provisions the training cluster):

  1. Development Compute: Set up a compute instance for Jupyter notebook development
  2. Training Compute: Configure a compute cluster for model training
  3. Inference Compute: Prepare resources for model deployment and serving
				
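from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

cluster_name = "training-cluster"

try:
    # Reuse the cluster if it already exists in the workspace
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print(f"Found existing cluster '{cluster_name}'")
except ComputeTargetException:
    # Autoscaling cluster: scales down to 0 nodes after 30 idle minutes to save cost
    compute_config = AmlCompute.provisioning_configuration(
        vm_size='STANDARD_DS3_v2',
        min_nodes=0,
        max_nodes=4,
        idle_seconds_before_scaledown=1800
    )
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

# Block until provisioning has finished
compute_target.wait_for_completion(show_output=True)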
				
			

c) Verify Setup

Before proceeding, it’s important to verify that all components are working correctly.

				
# Test workspace connection
print(ws.name, "workspace loaded")

# List available compute targets
for compute_name in ws.compute_targets:
    compute = ws.compute_targets[compute_name]
    print(compute_name, ":", compute.type, ":", compute.provisioning_state)
				
			

Step 4: Install Dependencies in Azure ML Compute Instance

If you prefer coding directly inside Azure ML Studio, install dependencies in a Jupyter Notebook or terminal.

a) Jupyter Notebook

  1. Go to Azure ML Studio.

  2. In the left sidebar, click on Notebooks.

  3. Click on Create → Notebook and choose Python 3 as the kernel.

  4. In the first cell, run the following command. The % prefix runs pip as an IPython magic, so the packages are installed into the environment of the kernel the notebook is using:

%pip install --upgrade azureml-sdk pandas numpy scikit-learn matplotlib

				
			

b) Azure ML Terminal

  1. In Azure ML Studio, go to the Compute section (left sidebar).

  2. Click on the Compute Instances tab.

  3. Find your running instance and click the three-dot menu (•••) next to it.

  4. Select Open terminal.

  5. In the terminal window, run:

				
pip install --upgrade azureml-sdk pandas numpy scikit-learn matplotlib

				
			

c) Verify Installation

After installation, verify that Azure ML SDK is correctly set up. Open a Python environment (Jupyter Notebook or terminal) and run:

				
import azureml.core
print("Azure ML SDK version:", azureml.core.VERSION)

				
			

If the version number appears, your environment is set up successfully!

With these components in place, your Azure ML environment is ready for developing the lead generation pipeline. The workspace tracks your experiment runs, gives easy access to compute resources, and provides a central place to manage your machine learning assets.


 

Install required tools (Jupyter, Azure ML SDK, Kaggle API, etc.)

Under development / More coming soon


 

Define project goals and lead ranking criteria

Under development / More coming soon
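How leads should be ranked is still open. As an illustration only, one simple and transparent approach is a weighted score over company-level indicators that can be derived from job postings. The criteria names (`open_positions`, `company_size`, `tech_match`), the weights and the caps below are hypothetical placeholders, not the project's agreed criteria.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical criteria and weights; replace with the project's agreed criteria
WEIGHTS = {"open_positions": 0.5, "company_size": 0.2, "tech_match": 0.3}
# Values at or above these caps earn the full score for that criterion
CAPS = {"open_positions": 20, "company_size": 1000, "tech_match": 1.0}


@dataclass
class Lead:
    company: str
    open_positions: int   # number of job postings found for the company
    company_size: int     # employee count
    tech_match: float     # share of our target technologies mentioned, 0..1


def score(lead: Lead) -> float:
    """Weighted sum of the criteria, each normalised to the range 0..1."""
    total = 0.0
    for criterion, weight in WEIGHTS.items():
        value = getattr(lead, criterion)
        total += weight * min(value / CAPS[criterion], 1.0)
    return round(total, 3)


def rank(leads: list[Lead]) -> list[tuple[str, float]]:
    """Return (company, score) pairs, best lead first."""
    return sorted(((lead.company, score(lead)) for lead in leads),
                  key=lambda pair: pair[1], reverse=True)
```

For example, `rank([Lead("Acme", 10, 500, 0.8), Lead("Globex", 2, 5000, 0.2)])` puts Acme first (0.59 against 0.31). A hand-written score like this also gives a baseline to compare the ML model against later.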


 

Fetch an initial Kaggle dataset to test the pipeline

Under development / More coming soon
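A minimal sketch of fetching a dataset with the official `kaggle` Python package, assuming an API token has been downloaded from your Kaggle account settings to `~/.kaggle/kaggle.json`. The helper name `fetch_kaggle_dataset` is our own, and any slug you pass (such as `owner/dataset-name`) is a placeholder for whichever dataset is chosen.

```python
from __future__ import annotations

from pathlib import Path


def fetch_kaggle_dataset(slug: str, dest: str = "data") -> list[Path]:
    """Download and unzip a Kaggle dataset, returning the CSV files it contains.

    Needs a Kaggle API token in ~/.kaggle/kaggle.json (or the KAGGLE_USERNAME
    and KAGGLE_KEY environment variables).
    """
    # Imported lazily: importing the kaggle package already tries to authenticate
    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()
    Path(dest).mkdir(parents=True, exist_ok=True)
    api.dataset_download_files(slug, path=dest, unzip=True)
    return sorted(Path(dest).glob("*.csv"))
```

The same download is available from the command line with `kaggle datasets download -d owner/dataset-name -p data --unzip`.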


Load and explore the dataset

Under development / More coming soon
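A first look at the data can be as simple as a per-column summary. This sketch uses pandas (installed earlier); `explore` is a hypothetical helper name.

```python
import pandas as pd


def explore(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column overview: dtype, missing values and number of distinct values."""
    overview = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(1),
        "unique": df.nunique(),  # NaN is not counted as a distinct value
    })
    print(f"{len(df)} rows x {df.shape[1]} columns, "
          f"{df.duplicated().sum()} duplicate rows")
    return overview
```

Running `explore(pd.read_csv("data/leads.csv"))` (the path is a placeholder), together with `df.describe()` for numeric ranges, is usually enough to spot the columns that need cleaning in Phase 2.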


 

 

Sequence Analysis Notebook

This notebook contains detailed analysis of sequence patterns in our lead generation pipeline.


Open In Colab

Click the badge above to view and run the notebook in Google Colab

Under development / More coming soon

Sequence Analysis with Python

The following assignments introduce applications of hashing with Python's dict() primitive. Along the way, a rudimentary introduction to biological sequences is given.

This framework is then extended with probabilities, leading to routines that generate random sequences under some constraints, including the general concept of Markov chains. All these components illustrate the usage of dict(), but at the same time introduce other computational routines for dealing efficiently with probabilities.

The collections.defaultdict class can be useful.

Below are some "suggested" imports. Feel free to use and modify these, or not. It is generally good practice to keep most or all imports in one place, typically near the start of the notebook.

PHASE[2]
Data Cleaning & Preprocessing

Under development / More coming soon


 

Handle missing values, duplicates, and outliers

Under development / More coming soon
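One common baseline for this step, sketched under assumptions: exact duplicate rows are dropped, numeric gaps are filled with the column median, numeric outliers are clipped with the 1.5 × IQR rule (Tukey's fences), and missing categorical values become an explicit `"unknown"` category. Whether each choice suits the lead data still needs checking column by column; `clean` is a hypothetical helper name.

```python
import pandas as pd


def clean(df: pd.DataFrame, iqr_factor: float = 1.5) -> pd.DataFrame:
    """Drop duplicate rows, fill missing values and clip numeric outliers (IQR rule)."""
    out = df.drop_duplicates().copy()
    numeric = out.select_dtypes("number").columns
    for col in numeric:
        # The median is robust to the outliers that are clipped next
        out[col] = out[col].fillna(out[col].median())
        q1, q3 = out[col].quantile([0.25, 0.75])
        spread = iqr_factor * (q3 - q1)
        out[col] = out[col].clip(lower=q1 - spread, upper=q3 + spread)
    # Mark missing categorical values explicitly instead of dropping rows
    categorical = out.columns.difference(numeric)
    out[categorical] = out[categorical].fillna("unknown")
    return out.reset_index(drop=True)
```

Clipping keeps every lead in the dataset while limiting the influence of extreme values; dropping outlier rows instead is equally valid if they turn out to be scraping errors.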


 

Feature engineering: selecting relevant columns for ranking leads

Under development / More coming soon


 

Normalize, scale, or encode categorical data

Under development / More coming soon
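A sketch using scikit-learn's `ColumnTransformer`: numeric columns are standardized and categorical columns one-hot encoded. The column lists are passed in by the caller, so no particular schema is assumed; the tiny example frame and its column names are illustrative only.

```python
from __future__ import annotations

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(numeric_cols: list[str], categorical_cols: list[str]) -> ColumnTransformer:
    """Scale numeric columns to zero mean / unit variance and one-hot encode categorical ones."""
    return ColumnTransformer(
        transformers=[
            ("num", StandardScaler(), numeric_cols),
            # Categories unseen during fit are encoded as all zeros at prediction time
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ],
        sparse_threshold=0.0,  # always return a dense array
    )


# Tiny illustrative frame; real column names come from the cleaned dataset
example = pd.DataFrame({"employees": [10, 20, 30], "industry": ["it", "retail", "it"]})
X = build_preprocessor(["employees"], ["industry"]).fit_transform(example)
print(X.shape)  # (3, 3): one scaled column plus two one-hot columns
```

Fit the preprocessor on the training split only and reuse it to transform the test split and new leads, so no information from the test data leaks into the scaling. It can also be chained with the model in a `sklearn.pipeline.Pipeline`.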


 

Save the cleaned dataset for model training

Under development / More coming soon


 

PHASE[3]
Model Selection & Training

Under development / More coming soon


 

Choose a suitable ML model

Under development / More coming soon


 

Split dataset into training & testing sets

Under development / More coming soon
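A minimal sketch with scikit-learn's `train_test_split`, assuming the cleaned data has a binary target column; the name `converted` (1 if a lead became a customer) is hypothetical. Stratifying keeps the share of good leads the same in both sets, which matters when good leads are rare.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_leads(df: pd.DataFrame, target: str = "converted",
                test_size: float = 0.2, seed: int = 42):
    """Return X_train, X_test, y_train, y_test with the class balance preserved."""
    X = df.drop(columns=[target])
    y = df[target]
    return train_test_split(X, y, test_size=test_size, stratify=y, random_state=seed)
```

Fixing `random_state` makes the split reproducible, so model comparisons later in this phase are made on the same test set.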


 

Train the model and evaluate its performance

Under development / More coming soon
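As a starting point, a random forest classifier can be trained and evaluated as sketched below. The model choice is our assumption, not a decision the project has made; ROC AUC is reported because ranking leads depends on how well the predicted probabilities order them. The block runs end to end on synthetic data from `make_classification` in place of the real dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split


def train_and_evaluate(X_train, y_train, X_test, y_test, seed: int = 42):
    """Fit a random forest and report precision/recall plus ROC AUC on the test set."""
    model = RandomForestClassifier(n_estimators=200, random_state=seed)
    model.fit(X_train, y_train)
    # Predicted probability of the positive class ("good lead"), used for ranking
    proba = model.predict_proba(X_test)[:, 1]
    print(classification_report(y_test, model.predict(X_test)))
    return model, roc_auc_score(y_test, proba)


# Synthetic stand-in for the cleaned lead dataset (300 rows, 8 features)
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
model, auc = train_and_evaluate(X_train, y_train, X_test, y_test)
print(f"ROC AUC: {auc:.3f}")
```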


 

Fine-tune hyperparameters for better accuracy

Under development / More coming soon


 

Save the trained model for deployment

Under development / More coming soon
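Saving a fitted scikit-learn model is commonly done with `joblib`, which is installed as a scikit-learn dependency. A minimal sketch; the helper names and the `outputs/lead_model.pkl` path are our own choices.

```python
from pathlib import Path

import joblib


def save_model(model, path: str = "outputs/lead_model.pkl") -> Path:
    """Serialize a fitted model with joblib, creating the folder if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(model, target)
    return target


def load_model(path: str = "outputs/lead_model.pkl"):
    """Load a model saved with save_model()."""
    return joblib.load(path)
```

From there the file can be registered in the workspace with `Model.register(workspace=ws, model_path="outputs/lead_model.pkl", model_name="lead-model")` from `azureml.core.model` (the model name is a placeholder), which makes it available for deployment.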


 

PHASE[4]
Deploying the ML Model on Azure

Under development / More coming soon


 

Convert the trained model into an API

Under development / More coming soon
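With the v1 SDK, a model is served through an entry (scoring) script that defines `init()`, called once when the service starts, and `run()`, called for every request. A minimal sketch, assuming the model was saved as `lead_model.pkl` with joblib and that requests send JSON of the form `{"data": [[...], ...]}`; both the file name and the payload shape are our own choices.

```python
import json
import os

import joblib
import numpy as np

model = None


def init():
    """Called once when the deployment starts: load the model into memory."""
    global model
    # AZUREML_MODEL_DIR points at the registered model's folder inside the container;
    # "lead_model.pkl" is the (assumed) file name used when saving the model
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR", "."), "lead_model.pkl")
    model = joblib.load(model_path)


def run(raw_data: str) -> dict:
    """Called per request. Expects {"data": [[feature, ...], ...]}, returns lead scores."""
    try:
        rows = np.array(json.loads(raw_data)["data"])
        scores = model.predict_proba(rows)[:, 1]
        return {"scores": scores.round(4).tolist()}
    except Exception as exc:  # report the error to the caller instead of crashing
        return {"error": str(exc)}
```

For a quick local test before deploying, point `AZUREML_MODEL_DIR` at the folder holding the pickle, call `init()`, and then pass a JSON string to `run()`.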


 

Deploy the API on Azure Machine Learning Studio

Under development / More coming soon


 

Test the API using Postman / Python requests / Jupyter

Under development / More coming soon
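Once deployed, the endpoint can be called from Python, or from Postman with the same URL, headers and body. A standard-library sketch, assuming key-based authentication and the `{"data": ...}` payload expected by the scoring script; the URI and key come from the deployed service (for example `service.scoring_uri` and `service.get_keys()` in the v1 SDK), and the helper names are hypothetical.

```python
from __future__ import annotations

import json
import urllib.request


def build_scoring_request(scoring_uri: str, api_key: str,
                          rows: list[list[float]]) -> urllib.request.Request:
    """Build a POST request for an Azure ML web service that uses key authentication."""
    body = json.dumps({"data": rows}).encode("utf-8")
    headers = {"Content-Type": "application/json",
               "Authorization": f"Bearer {api_key}"}
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


def score_leads(scoring_uri: str, api_key: str, rows: list[list[float]],
                timeout: float = 30.0) -> dict:
    """Send the rows to the endpoint and return the decoded JSON response."""
    request = build_scoring_request(scoring_uri, api_key, rows)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending makes it easy to inspect exactly what is sent, which helps when comparing against a working Postman call.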


 

Document API endpoints for later integration

Under development / More coming soon


PHASE[5]
Web Scraping for Real-World Lead Data

Under development / More coming soon


 

Identify job posting sources (LinkedIn, Indeed, company websites)

Under development / More coming soon


 

Use BeautifulSoup, Scrapy, Selenium to scrape job postings

Under development / More coming soon


 

Extract company details without violating GDPR

Under development / More coming soon
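One way to enforce the no-personal-data rule in code is an allow-list: keep only company-level fields, and redact e-mail addresses and phone numbers that slip into free-text fields. The field names and regular expressions below are assumptions, and such a filter is a technical safeguard rather than a complete GDPR assessment.

```python
import re

# Company-level fields we keep (hypothetical names); anything else, such as
# recruiter names or contact persons, is dropped before storage
ALLOWED_FIELDS = {"company", "industry", "location", "job_title", "description",
                  "posted_date", "technologies"}
# Free-text fields that may still contain contact details and are redacted
FREE_TEXT_FIELDS = {"job_title", "description"}

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# Rough pattern: optional +, then 9+ digits/spaces/brackets/dots/dashes
# that start and end with a digit
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Replace e-mail addresses and phone numbers with a marker."""
    text = EMAIL_RE.sub("[removed]", text)
    return PHONE_RE.sub("[removed]", text)


def company_record(posting: dict) -> dict:
    """Keep only allow-listed fields and redact contact details in free text."""
    record = {key: value for key, value in posting.items() if key in ALLOWED_FIELDS}
    for key in record.keys() & FREE_TEXT_FIELDS:
        if isinstance(record[key], str):
            record[key] = redact(record[key])
    return record
```

Redaction is limited to free-text fields on purpose: the phone pattern would also match dates such as `2024-05-01`, which must survive in `posted_date`.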


 

Store scraped data in a structured format (CSV, JSON, or database)

Under development / More coming soon
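A sketch that stores the same records in two formats: CSV for spreadsheets and pandas, and JSON Lines (one JSON object per line) to keep list values such as technologies intact. The helper name and file names are placeholders.

```python
from __future__ import annotations

import csv
import json
from pathlib import Path


def save_records(records: list[dict], stem: str = "data/leads") -> tuple[Path, Path]:
    """Write scraped records to CSV and JSON Lines files sharing one base name."""
    base = Path(stem)
    base.parent.mkdir(parents=True, exist_ok=True)
    # Union of all keys, so records with missing fields still fit the CSV header
    fieldnames = sorted({key for record in records for key in record})

    csv_path = base.with_suffix(".csv")
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)  # missing fields become ""
        writer.writeheader()
        # List values (e.g. technologies) are joined so each cell holds plain text
        writer.writerows({k: ";".join(v) if isinstance(v, list) else v
                          for k, v in r.items()} for r in records)

    jsonl_path = base.with_suffix(".jsonl")
    with jsonl_path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return csv_path, jsonl_path
```

JSON Lines can be appended to run by run and read back with `pd.read_json(path, lines=True)`; a database becomes worthwhile once the scraper runs on a schedule.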


 

PHASE[6]
Analyzing Scraped Lead Data

Under development / More coming soon


 

Preprocess scraped data for analysis

Under development / More coming soon


 

Run the ML model on new lead data

Under development / More coming soon


 

 

PHASE[7]
Visualization & Reporting

Under development / More coming soon


 

PHASE[8]
Automating the Pipeline

Under development / More coming soon


 
