Data Warehouse For Leading Conversational Messaging Platform
Industry Challenges
The client’s carrier-grade conversational messaging platform offers a single messaging API for more than 30 channels and a comprehensive toolkit for designing AI-driven conversational experiences. Many chatbots are designed and deployed on the platform, so it is critical to generate meaningful insights from the bot chat sessions. A chatbot produces millions of rows of data, and session IDs need to be assigned to these rows according to the following rules:
- Time
If there is a gap of more than 10 seconds between one chat record and the next, the next record is considered the start of a new session.
- Session ID
Each chat row that follows the preceding row within 10 seconds must be assigned the same session ID (see the sketch after this list).
- Warehouse
Tableau requires a robust data warehouse to visualize the data, as the chatbot has millions of rows and Tableau issues sophisticated queries against them.
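To make the session rule concrete, below is a minimal Python sketch of the sessionization logic. The row representation and field names are assumptions for illustration; at warehouse scale the same rule would more likely be expressed with SQL window functions inside Redshift.

```python
from datetime import timedelta

# A gap of more than 10 seconds between consecutive rows starts a new session.
SESSION_GAP = timedelta(seconds=10)

def assign_session_ids(rows):
    """Assign session IDs in place.

    rows: chat records sorted by timestamp, each a dict with a
    'timestamp' datetime key (an assumed, illustrative shape).
    """
    session_id = 0
    previous_ts = None
    for row in rows:
        if previous_ts is not None and row["timestamp"] - previous_ts > SESSION_GAP:
            session_id += 1  # gap exceeded the threshold: new session
        row["session_id"] = session_id
        previous_ts = row["timestamp"]
    return rows
```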
Problem Statement
With over 9 billion messages sent each month, the client is a market leader in conversational messaging, with data coming in from various sources such as MySQL, log files, and S3. The client develops numerous chatbots and wants data visualization for them in order to derive insights from the bot chat sessions.
Our Solution
- Algoscale created a data warehouse on Amazon Redshift. Our experts used Amazon Redshift Serverless to run and scale analytics without having to provision and manage data warehouse clusters.
- To import the bot data from S3 into Redshift, we created an automated data pipeline that runs every minute and copies new S3 files to Redshift. Because the application also stores bot data on S3 in daily batches, the same load runs once a day as well.
- A stored procedure was designed to load the data from S3 into Redshift (an illustrative version appears below).
- Further, a Python script was created that runs on Kubernetes. The script creates manifest files and bookmarks in S3: the manifest file collects the URLs of the S3 files to be loaded into Redshift, and the bookmark file saves the timestamp of the current run (see the sketches after this list).
- The script invokes the procedure we created in Redshift, and the procedure then runs all the load queries.
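As a sketch of the manifest-and-bookmark step, the Python below lists S3 objects newer than the saved bookmark, writes them into a Redshift COPY manifest, and then advances the bookmark. The bucket, prefixes, and key names are assumptions, not the client’s actual configuration.

```python
import json
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "bot-data-bucket"              # assumed bucket name
DATA_PREFIX = "chat-logs/"              # assumed prefix for incoming bot data
BOOKMARK_KEY = "bookmarks/last_run.txt" # assumed bookmark object

def read_bookmark():
    """Return the ISO timestamp of the previous run, or None on the first run."""
    try:
        obj = s3.get_object(Bucket=BUCKET, Key=BOOKMARK_KEY)
        return obj["Body"].read().decode()
    except s3.exceptions.NoSuchKey:
        return None

def build_manifest():
    """Collect S3 files newer than the bookmark into a COPY manifest."""
    bookmark = read_bookmark()
    entries = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=DATA_PREFIX):
        for obj in page.get("Contents", []):
            if bookmark is None or obj["LastModified"].isoformat() > bookmark:
                entries.append({"url": f"s3://{BUCKET}/{obj['Key']}", "mandatory": True})
    manifest_key = "manifests/latest.manifest"
    s3.put_object(Bucket=BUCKET, Key=manifest_key, Body=json.dumps({"entries": entries}))
    # Save the current timestamp as the new bookmark for the next run.
    s3.put_object(Bucket=BUCKET, Key=BOOKMARK_KEY,
                  Body=datetime.now(timezone.utc).isoformat())
    return f"s3://{BUCKET}/{manifest_key}"
```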
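The stored procedure and its invocation could look roughly like the following. The procedure name, staging table, IAM role ARN, and Serverless workgroup are all placeholders, and the polling loop is a minimal example of how the Kubernetes job might drive the asynchronous Redshift Data API.

```python
import time

import boto3

redshift = boto3.client("redshift-data")

WORKGROUP = "analytics-workgroup"  # assumed Redshift Serverless workgroup
DATABASE = "botdata"               # assumed database name

# Illustrative procedure: COPY the manifest's files into a staging table.
CREATE_PROCEDURE_SQL = """
CREATE OR REPLACE PROCEDURE load_bot_data(manifest_url VARCHAR(1024))
AS $$
BEGIN
    EXECUTE 'COPY chat_messages_staging '
         || 'FROM ''' || manifest_url || ''' '
         || 'IAM_ROLE ''arn:aws:iam::123456789012:role/redshift-load'' '
         || 'MANIFEST';  -- file-format options omitted; they depend on the data
END;
$$ LANGUAGE plpgsql;
"""

def run_sql(sql: str) -> None:
    """Submit one statement and poll until it completes (the Data API is async)."""
    resp = redshift.execute_statement(WorkgroupName=WORKGROUP, Database=DATABASE, Sql=sql)
    while True:
        status = redshift.describe_statement(Id=resp["Id"])["Status"]
        if status in ("FINISHED", "FAILED", "ABORTED"):
            break
        time.sleep(2)
    if status != "FINISHED":
        raise RuntimeError(f"Statement ended with status {status}")

run_sql(CREATE_PROCEDURE_SQL)  # one-time setup
run_sql("CALL load_bot_data('s3://bot-data-bucket/manifests/latest.manifest');")
```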
Tech Stack
- Amazon Redshift (Serverless)
- Amazon S3
- Python
- Kubernetes
- Tableau
- MySQL
Business Impact
- Our solution helped the client get useful insights on over 6 billion messages per month. We built the data warehouse on the Kimball model, using a star schema with fact tables, dimension tables, and materialized views (a hypothetical sketch follows this list).
- Regardless of the intricacy of the query or the volume of data, the data warehouse created by Algoscale parallelizes queries across its nodes to produce quick responses, and it simplifies raw data into a form that is easy to comprehend.
- Our solution enables the client to run and scale analyses in a matter of seconds without having to set up and maintain data warehouse infrastructure. Insights can now be acquired simply by loading and querying data in the warehouse.
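As a hypothetical illustration of that star schema, the DDL below defines one dimension table, one fact table, and a materialized view. Every table, column, and distribution choice is invented for this sketch; the statements could be applied with the same run_sql helper shown earlier.

```python
# Hypothetical Kimball-style star schema for the chat-session warehouse.
STAR_SCHEMA_DDL = [
    # Dimension: one row per bot, replicated to every node for cheap joins.
    """CREATE TABLE dim_bot (
           bot_key  BIGINT IDENTITY(1,1),
           bot_id   VARCHAR(64),
           bot_name VARCHAR(256)
       ) DISTSTYLE ALL;""",
    # Fact: one row per chat message, distributed by session for locality.
    """CREATE TABLE fact_chat_message (
           message_ts  TIMESTAMP,
           session_id  BIGINT,
           bot_key     BIGINT,
           channel     VARCHAR(64),
           message_len INTEGER
       ) DISTKEY (session_id) SORTKEY (message_ts);""",
    # Materialized view: per-session aggregates that Tableau can query quickly.
    """CREATE MATERIALIZED VIEW mv_session_summary AS
       SELECT session_id,
              bot_key,
              MIN(message_ts) AS session_start,
              MAX(message_ts) AS session_end,
              COUNT(*)        AS message_count
       FROM fact_chat_message
       GROUP BY session_id, bot_key;""",
]
```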