{"id":756,"date":"2021-08-15T14:18:09","date_gmt":"2021-08-15T14:18:09","guid":{"rendered":"https:\/\/ambeault.net\/?page_id=756"},"modified":"2021-10-08T12:09:16","modified_gmt":"2021-10-08T12:09:16","slug":"491-2-2","status":"publish","type":"page","link":"https:\/\/ambeault.net\/?page_id=756","title":{"rendered":""},"content":{"rendered":"\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<div class=\"wp-block-cover has-background-dim\" style=\"min-height:145px;aspect-ratio:unset;\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1028\" height=\"300\" class=\"wp-block-cover__image-background wp-image-768\" alt=\"\" src=\"https:\/\/ambeault.net\/wp-content\/uploads\/2021\/08\/data-pipeline-case-study-header-banner-1.jpg\" data-object-fit=\"cover\" srcset=\"https:\/\/ambeault.net\/wp-content\/uploads\/2021\/08\/data-pipeline-case-study-header-banner-1.jpg 1028w, https:\/\/ambeault.net\/wp-content\/uploads\/2021\/08\/data-pipeline-case-study-header-banner-1-300x88.jpg 300w, https:\/\/ambeault.net\/wp-content\/uploads\/2021\/08\/data-pipeline-case-study-header-banner-1-1024x299.jpg 1024w, https:\/\/ambeault.net\/wp-content\/uploads\/2021\/08\/data-pipeline-case-study-header-banner-1-768x224.jpg 768w\" sizes=\"(max-width: 1028px) 100vw, 1028px\" \/><div class=\"wp-block-cover__inner-container is-layout-flow wp-block-cover-is-layout-flow\">\n<p class=\"has-text-align-left\" style=\"font-size:25px\"><meta charset=\"utf-8\"><strong>Data-driven operations<\/strong><\/p>\n<\/div><\/div>\n<\/div>\n<\/div>\n\n\n\n<h3 class=\"has-black-color has-text-color wp-block-heading\"><meta charset=\"utf-8\"><strong><span style=\"color:#0d49b1\" class=\"has-inline-color\">|<\/span><\/strong> <strong>Objective<\/strong>s<\/h3>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 
wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:75%\">\n<p class=\"has-black-color has-text-color\"><meta charset=\"utf-8\"><meta charset=\"utf-8\">Consolidate data capture to improve integrity and security.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">Reduce the engineering level of effort to make data available for analytics, marketing, operations, and product feature use cases.<\/p>\n<\/div>\n<\/div>\n\n\n\n<h3 class=\"has-black-color has-text-color wp-block-heading\"><meta charset=\"utf-8\"><strong><span style=\"color:#0d49b1\" class=\"has-inline-color\">|<\/span><\/strong> <strong>Team Leads<\/strong><\/h3>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-black-color has-text-color\">Product: <a href=\"https:\/\/www.linkedin.com\/in\/brad-sherman-4813775\/\" target=\"_blank\" rel=\"noreferrer noopener\">Brad Sherman<\/a><\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-black-color has-text-color\">Engineering: <a href=\"https:\/\/www.linkedin.com\/in\/daynesh\/\" target=\"_blank\" rel=\"noreferrer noopener\">Daynesh Mangal<\/a><\/p>\n<\/div>\n<\/div>\n\n\n\n<h3 class=\"has-black-color has-text-color wp-block-heading\"><meta charset=\"utf-8\"><strong><span style=\"color:#0d49b1\" class=\"has-inline-color\">|<\/span><\/strong> <strong>Insights<\/strong><\/h3>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-black-color has-text-color\">Engineers were the most sophisticated users of data. 
They required low latency, unstructured data for monitoring, alarming, and analysis. Some teams also required data to power product features, like recommendations and A\/B testing.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">The analytics team had requirements for regular reporting to commercial stakeholders, which needed to be created by analysts using GUI-based tools. The data scientists required access to lightly structured data to develop more sophisticated dashboards and ad hoc analyses for commercial stakeholders.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">Product managers required structured consumer behavior and operational data to run ad hoc queries to gain insights into the performance of features.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">Ad sales and marketing users required first-party data enriched with third-party data to create demographic and behavioral segments to target campaigns at specific consumers using GUI-based tools.<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-black-color has-text-color\">Data requirements had historically been met by integrating over 20 vendor SDKs into client applications, with a growing backlog of additional SDKs.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">SDKs were not available for all device platforms, creating data gaps.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">There was no methodology to monitor and alarm on data captured by SDKs.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">On average, an SDK took 6 resource weeks per device platform to develop, test, and release.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">The product line consisted of 18 content brands across 9 device platforms, requiring the release of 162 binaries to implement or update an SDK.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">Client app crashes were being 
caused by 3rd party SDKs.<\/p>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h3 class=\"has-black-color has-text-color wp-block-heading\"><meta charset=\"utf-8\"><strong><span style=\"color:#0d49b1\" class=\"has-inline-color\">|<\/span><\/strong> <strong>Hypothesis<\/strong><\/h3>\n\n\n\n<p class=\"has-black-color has-text-color\">If we define a single data dictionary that encompasses all use cases, captures the events once in the platform, transforms them for multiple applications, and delivers events server to server, we will have better, cheaper, and safer data.<\/p>\n<\/div>\n<\/div>\n\n\n\n<h3 class=\"has-black-color has-text-color wp-block-heading\"><meta charset=\"utf-8\"><strong><span style=\"color:#0d49b1\" class=\"has-inline-color\">|<\/span><\/strong> <strong>Architecture<\/strong><\/h3>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:50%\">\n<p class=\"has-black-color has-text-color\"><strong>Event Stream<\/strong> was developed to capture data, form the events into topic and sub-topic streams and transform the data streams for subscriber systems. It consisted of 6 components: Events SDK, Ingress, Data Pipeline, Syncs, Post Processing Pipeline, and Storage Repository.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">Ingress interfaces received event streams from clients and APIs. All events were published to the Data Pipeline and written to a Repository. Syncs used platform APIs to decorate data from the pipeline, then transform it to meet each subscriber\u2019s specifications. The subscribers posted events back to the pipeline for other subscribers and storage. 
The Post Processing Pipeline confirmed the successful receipt of data by subscriber systems. &nbsp;<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:50%\">\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"859\" height=\"579\" src=\"https:\/\/ambeault.net\/wp-content\/uploads\/2021\/08\/data-pipeline-specification.jpg\" alt=\"\" class=\"wp-image-770\" srcset=\"https:\/\/ambeault.net\/wp-content\/uploads\/2021\/08\/data-pipeline-specification.jpg 859w, https:\/\/ambeault.net\/wp-content\/uploads\/2021\/08\/data-pipeline-specification-300x202.jpg 300w, https:\/\/ambeault.net\/wp-content\/uploads\/2021\/08\/data-pipeline-specification-768x518.jpg 768w\" sizes=\"(max-width: 859px) 100vw, 859px\" \/><\/figure>\n<\/div>\n<\/div>\n\n\n\n<h3 class=\"has-black-color has-text-color wp-block-heading\"><meta charset=\"utf-8\"><strong><span style=\"color:#0d49b1\" class=\"has-inline-color\">|<\/span><\/strong> <strong>Software Design<\/strong><\/h3>\n\n\n\n<p class=\"has-black-color has-text-color\">The new data capture and transform capabilities were delivered by a New York City platform engineering team for the US digital business in September 2018.<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:50%\">\n<p class=\"has-black-color has-text-color\">The software design had 7 entities in a single sequence to enable all required use cases.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">Kinesis Firehose ingested the events and wrote the events to an S3 bucket and IoT MQTT simultaneously. 
Based on the event type, the data was added to topic and sub-topic streams in the Data Pipeline.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">Syncs were developed to enrich data from Data Pipeline topics with platform API data to meet subscriber system requirements. Syncs retried sending data using exponential backoff if the subscriber failed to respond or errored. The Post Processing Pipeline continuously queried all the subscribers, gathering data to verify that each of those systems made the data available in compliance with specifications. Errors were communicated to DevOps staff in near real time.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">A critical component of the solution design was the data dictionary for capture. The first-party data dictionary defined 65 metrics with a total of 103 dimensions (dims) across 13 schemas. Each Sync employed a source-to-target schema that mapped the data dictionary to each subscriber system\u2019s specification.<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column has-black-color has-text-color is-layout-flow wp-block-column-is-layout-flow\">\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"635\" height=\"271\" src=\"https:\/\/ambeault.net\/wp-content\/uploads\/2021\/08\/data-pipeline-frontend-spec.jpg\" alt=\"\" class=\"wp-image-774\" srcset=\"https:\/\/ambeault.net\/wp-content\/uploads\/2021\/08\/data-pipeline-frontend-spec.jpg 635w, https:\/\/ambeault.net\/wp-content\/uploads\/2021\/08\/data-pipeline-frontend-spec-300x128.jpg 300w\" sizes=\"(max-width: 635px) 100vw, 635px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"944\" height=\"502\" src=\"https:\/\/ambeault.net\/wp-content\/uploads\/2021\/08\/Screen-Shot-2021-08-16-at-8.53.02-AM.png\" alt=\"\" class=\"wp-image-775\" 
srcset=\"https:\/\/ambeault.net\/wp-content\/uploads\/2021\/08\/Screen-Shot-2021-08-16-at-8.53.02-AM.png 944w, https:\/\/ambeault.net\/wp-content\/uploads\/2021\/08\/Screen-Shot-2021-08-16-at-8.53.02-AM-300x160.png 300w, https:\/\/ambeault.net\/wp-content\/uploads\/2021\/08\/Screen-Shot-2021-08-16-at-8.53.02-AM-768x408.png 768w\" sizes=\"(max-width: 944px) 100vw, 944px\" \/><\/figure>\n<\/div>\n<\/div>\n\n\n\n<h3 class=\"has-black-color has-text-color wp-block-heading\"><meta charset=\"utf-8\"><strong><span style=\"color:#0d49b1\" class=\"has-inline-color\">|<\/span><\/strong> <strong>Technologies<\/strong><\/h3>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-black-color has-text-color\">Time to market was reduced by using native AWS functions, like Kinesis, IoT, SQS, and S3.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">Proprietary business and orchestration logic was developed with Golang to run in Lambdas.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">InfluxDB with Kapacitor subscribed to the Data Pipeline to power Slack Alerts, PagerDuty, and Grafana dashboards for DevOps (e.g., ad beacons debugging and monitoring).<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">SQS-based syncs created event streams for feature APIs (e.g., Playhead markers API for cross-device viewing).&nbsp;<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<p class=\"has-black-color has-text-color\">SQS-based Syncs supported Google Analytics, Adobe Analytics, Adobe Audience Manager, FreeWheel, Kochava marketing attribution, Braze and BlueShift CRM, Vidora content recommendations, comScore, and A\/B testing.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">A proxy captured events between publishers and Ingress API, writing them to 
MySQL to automate testing. Anomalies against the data schema were visualized using Tableau.<\/p>\n<\/div>\n<\/div>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h3 class=\"has-black-color has-text-color wp-block-heading\"><meta charset=\"utf-8\"><strong><span style=\"color:#0d49b1\" class=\"has-inline-color\">|<\/span><\/strong> <strong>Retrospective<\/strong><\/h3>\n\n\n\n<p class=\"has-black-color has-text-color\">\u25e6 SQS fees will be the most costly portion of the solution. Replacing this messaging queue technology with an open source option will need to be road mapped to reduce AWS costs as traffic scales.<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\">\u25e6 Defining the data dictionary, source to target mappings, and user acceptance testing were each a higher level of effort than the actual coding. Optimizing these activities will speed time to market. 
&nbsp;<\/p>\n<\/div>\n\n\n\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\">\n<h3 class=\"has-black-color has-text-color wp-block-heading\"><meta charset=\"utf-8\"><strong><span style=\"color:#0d49b1\" class=\"has-inline-color\">|<\/span><\/strong> <strong>Key Results<\/strong><\/h3>\n\n\n\n<p class=\"has-black-color has-text-color\"><strong>Fast Integrations<\/strong><\/p>\n\n\n\n<p class=\"has-black-color has-text-color\"><meta charset=\"utf-8\">\u25e6 Data integrations reduced from 54 resource weeks to 2<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\"><strong>Increased Data Integrity<\/strong><\/p>\n\n\n\n<p class=\"has-black-color has-text-color\"><meta charset=\"utf-8\">\u25e6 Post Processing Pipeline verified less than 1% data loss in QA and production<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\"><strong>Happier Customers<\/strong><\/p>\n\n\n\n<p class=\"has-black-color has-text-color\"><meta charset=\"utf-8\">\u25e6 22% reduction in app crashes by removing SDKs<\/p>\n\n\n\n<p class=\"has-black-color has-text-color\"><strong>Team Recognition<\/strong><\/p>\n\n\n\n<p class=\"has-black-color has-text-color\"><meta charset=\"utf-8\">\u25e6 Invited to AWS re:Invent and Adobe Summit in 2019 to present this technology<\/p>\n<\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>| Objectives Consolidate data capture to improve integrity and security. Reduce the engineering level of effort to make data available for analytics, marketing, operations, and product feature use cases. | Team Leads Product: Brad Sherman Engineering: Daynesh Mangal | Insights Engineers were the most sophisticated users of data. They required low latency, unstructured data for monitoring, alarming, and analysis. 
Some [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"class_list":["post-756","page","type-page","status-publish","hentry"],"aioseo_notices":[],"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/ambeault.net\/index.php?rest_route=\/wp\/v2\/pages\/756","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ambeault.net\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/ambeault.net\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/ambeault.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ambeault.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=756"}],"version-history":[{"count":5,"href":"https:\/\/ambeault.net\/index.php?rest_route=\/wp\/v2\/pages\/756\/revisions"}],"predecessor-version":[{"id":1057,"href":"https:\/\/ambeault.net\/index.php?rest_route=\/wp\/v2\/pages\/756\/revisions\/1057"}],"wp:attachment":[{"href":"https:\/\/ambeault.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=756"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}