Considerations for Building ML-Integrated Systems

Approaching the use of Machine Learning in production - 5-12-2021

In my experience, developing a model for production should be no different from your approach to building other software, and should fit within your typical Software Development Life Cycle (SDLC). Define your general requirements, leave room for the development team to take ownership of smaller details, and be sure to set the right vision for the project's outcome. Beyond the typical software requirements, you'll need to consider a few extra, mission-critical functional requirements that will guide the ML deployment strategy your team designs and builds. These are considerations you must understand before proceeding, as they substantially impact the final outcome and success of the project:

  1. Define Stakeholders: Who are the stakeholders? Who will be affected by the development and maintenance patterns?
  2. Model Type: Are you trying to classify something, or forecast a value? Is the response binary, continuous, or discrete? How will the inference be used?
  3. Data Throughput (Volume): How much data are we processing per day, week, or month in GB or records?
  4. Data Churn: What's the data's velocity? How often does it change? Suppose you have a table of 100 records today, and records are periodically dropped with new records inserted to take their place. How long would it take to end up with an entirely different set of 100 records?
  5. Legal Restrictions: Are there any instances where our prediction could be functionally violating a law, or failing to meet a proper ethical standard of all stakeholders?
  6. Feature Engineering: Are there any specific feature engineering pipelines that must be constructed to correctly prepare the records for inference?
  7. Model Decay: This is largely affected by the Data Churn defined above. How quickly does the predictive quality of the model decay? What frequency must we update the model? How will we measure this?
  8. Temporal Patterns: How frequently is the model invoked by the "customer(s)", whether internal or not?
  9. Configurations / Statefulness: Are there custom configurations that alter the state of the system (e.g., a customer database table to retrieve info from), or is the data sent ready for inference?
  10. SLAs: What are the Service Level Agreements? How fast does it need to return data?

These 10 topics are important to take into account, but this list is certainly not exhaustive. Also note that your particular situation dictates the relative weight of each of the above topics when deciding how you are going to build and what you are going to build. Beyond this initial assessment of your project's needs, it's important to consider the following topics while planning and executing a data science project:

  • Organizationally & Culturally Approaching ML / Data Science
  • Deciding on Model Type
  • Deployment Strategies
  • Making A Decision - A useful decision table

Organizationally & Culturally Approaching ML / Data Science

There are a ton of important factors that contribute to the success of data science projects. A few important ones I've come to observe involve how your team defines itself, and how the team approaches development practices.

Development Practices

Before deciding to build a model, you must ensure everyone is on the same page regarding scale and DevOps practices. I'd recommend building models on a cloud-hosted cluster or workhorse server, where the data scientists can freely spin machines/resources up and down and work on hardware that can handle the sort of CPU-bound calculations they are responsible for. Data Scientists are expensive. Perhaps the most expensive piece of the whole dang puzzle. Don't waste their time, because if you do, you're wasting your budget.

Another key is ensuring the team fully understands any constraints placed on them by other systems. Perhaps it's the applications that serve the web requests for the models, or some other team-internal software constraint. It's important to understand these ahead of time, so you don't waste north of $250 training a model using the wrong version of Python, or maybe even just the wrong version of sklearn... To keep the team's development practices cohesive, Docker environments are often managed by engineers to pin the OS and software versions used by the data scientists - freeing them up to do more data science stuff.
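As a small illustration of the kind of guardrail that helps here, the model artifact can carry the library versions it was trained with, and the serving or scoring code can check them at load time. This is only a minimal sketch, and the metadata file name and key are hypothetical:

    # Minimal sketch of a version guard at model-load time.
    # The metadata file ("model_meta.json") and its "sklearn_version" key are
    # hypothetical; the point is simply to fail fast when the runtime drifts
    # from the training environment.
    import json
    import sklearn

    def check_training_environment(meta_path="model_meta.json"):
        with open(meta_path) as f:
            meta = json.load(f)
        expected = meta["sklearn_version"]   # recorded when the model was trained
        actual = sklearn.__version__
        if expected != actual:
            raise RuntimeError(
                f"Model was trained with scikit-learn {expected}, "
                f"but this environment has {actual}. Rebuild the image or retrain."
            )

    # check_training_environment()  # call this before loading the model artifact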

Team Structure

This one seems relatively straightforward, but it differs by organization and by the perspectives molded by the histories of the employees at the company. There is never going to be a one-size-fits-all answer here. It's important to define expected outcomes and responsibilities for each contributor, as well as the leadership style being applied to the group, in order to best answer the question of how a team should be structured. It's also important to know that there is a difference between Business Intelligence and Data Science. My personal perception of the divide can be framed by the 4 Types of Analytics: Descriptive, Diagnostic, Predictive, and Prescriptive Analytics.

Business Intelligence entails reporting trends, answering business questions, and analyzing data to understand performance (or lack thereof). This is encompassed by Descriptive, Diagnostic, and sometimes Predictive Analytics. Data Science builds on that same foundation, but its primary focus lies in the realm of Predictive and Prescriptive Analytics. It's important to understand the fundamental shift in focus between a group focused on Business Intelligence and a group focused on Data Science, because the objectives of these two teams are related, but ultimately not the same.

  • Data Science must be able to maximize predictive value and minimize loss in production systems.
  • Business Intelligence must define performance, discover trends, and enhance human decision-making.

Data Science and Business Intelligence both require data engineering skill, similar data storage, and similar supporting infrastructure. Both require skilled, intelligent people, although the skills needed for success differ, and it is important to admit that. Deploying an API is different from building a dashboard. Training a model is different from defining performance targets. In situations requiring scale, Data Science becomes much more difficult with respect to technical knowledge, due to the need for deployment strategies, robust distributed systems, and managing CPU-bound workloads. In situations requiring heightened awareness, Business Intelligence becomes much more difficult with respect to product or company knowledge, due to the need to answer the questions managers have when trying to make an optimal strategic decision.

It is because of this that I would suggest three general structures:

  1. Small Team: Autonomy is king. Assure a shared vision, but allow decisions to be made by all team members. Regularly checking in and reviewing the goals/vision is crucial to wrangle rogue operators. Understand that one team member may not be fully capable of leading the other team members.
  2. Large Team: Split the Data Analytics group into 3 distinct teams: Business Intelligence, Data Engineering, & Data Science. Each reports to an expert in that arena whose expertise covers what the rest of that team needs. This enables a more natural push and pull between externally-focused product leaders and internally-focused engineering leaders.
  3. Large Team: Set up a distinct matrix organization structure, where roles report to an engineering leader, as well as a product or business leader. This enforces focus on a particular product, while still enabling engineering managers to establish boundaries for their team's responsibilities and provide mentorship and growth.

Deciding on Model Type

Generally speaking, there are only a few main approaches to deploying a trained model or predictive analytics system, while there is a much higher cardinality of options when it comes to feature engineering and the model types to train and test. The data and the expertise of the data scientist should dictate these choices. An important thing to mention, especially if you've not done this before, is that simpler is always better. Yes, Deep Learning is ridiculously awesome, and yes, you can essentially have a neural network in a box using one import from TensorFlow... BUT, consider this anecdote:

Imagine you have an xgboost model and a TensorFlow CNN model with the following stats:

  Information                    | TensorFlow | xgboost
  Training Cost                  | $5,000     | $300
  Accuracy                       | 93%        | 85%
  Lifespan Required              | 5 yrs      | 5 yrs
  Estimated Margin From Accuracy | ~10%       | ~6%

Consider the following:

  • If it takes an unreasonable number of machines in a cluster, and runs horribly inefficiently, then your margin just shrank 2-4% over that 5-year time-span.
  • Another issue might be maintenance: complicated neural nets often have absurd dependencies and build requirements. Subtract another 1% from the original margin, along with some hair off your Data Engineer's head as they fiddle with your security-hardened Docker images.
  • The model's intricacy causes more frequent re-training than the simpler xgboost model, with each re-training costing you $500. Perhaps the rate is 1:2, so subtract another 1%.

Right there, your fancier model has the potential to demand a lot more effort for roughly the same ~6% marginal benefit the simpler xgboost model delivers. Obviously this is a made-up example, but an important one for less experienced people to fully understand. The choices you make today will affect your maintenance and hiring prospects a year from now, and that's often overlooked by new teams. Complexity is not your friend!
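To make the arithmetic concrete, here is a tiny back-of-the-envelope sketch. All the numbers are the made-up figures from the table and bullets above, not real benchmarks:

    # Back-of-the-envelope comparison of the two hypothetical models above.
    # Every number is illustrative, taken from the made-up table and bullets.

    tf_margin = 0.10        # ~10% estimated margin from the TensorFlow model's accuracy
    xgb_margin = 0.06       # ~6% estimated margin from the xgboost model's accuracy

    # Penalties accumulated by the more complex model over its 5-year lifespan:
    tf_margin -= 0.03       # inefficient cluster usage (2-4%, take the middle)
    tf_margin -= 0.01       # dependency / build maintenance headaches
    tf_margin -= 0.01       # more frequent re-training at $500 a pop

    print(f"TensorFlow effective margin: ~{tf_margin:.0%}")   # ~5%
    print(f"xgboost effective margin:    ~{xgb_margin:.0%}")  # ~6%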

Deployment Strategies

Distributed systems are cool. Really cool. They are also a total pain in the you-know-where. Unless your volume dictates it, stay the heck away from them. Well, unless you're just trying it out to learn - in that case, by all means, shoot your shot. Having a mentor helps, so I encourage you to either reach out to an established open source community like Dask, or to find someone within your network who is willing to take you under their wing.

Start smaller, and establish a baseline so you can research and understand what you might need in the future to scale up or down. Once you feel that your servers no longer meet your needs, it's time to move up - and that's pretty obvious stuff. The hard part is having the discipline to hold off until then, which is why I mention it.

The most important part is that you feel comfortable with the tech stack you are working with. Too many unknowns and you'll end up like me on my first project: alone and afraid. Also, you'll need to ensure you can get the right organizational support, because it's unlikely you'll be able to build, improve, and maintain an entire system on your own... at least not a fully updated one with tight security and constant peer reviews...

Depending on your system requirements, you can deduce the technical approach you need to take. This should be a primary decision preceding any development on the system, and it should be documented and assigned a project owner with the power to veto product managers and the like, due to the constant need to guard against scope creep from business leaders who don't really understand how or why a small requirement tweak can have such large impacts on the approach. Only in dire and justified situations characterized by great change should the scope shift enough to alter the trajectory of the project.

I will define the general deployment approaches below, discussing each briefly, moving from most gnarly to least gnarly based on my experience (or lack thereof):

Event Stream or Pub/Sub

Report Card:

  • Scalability Score: 9
  • Speed: 9
  • Ease of Deployment: 2
  • MLOps Pipeline: 2

This type of deployment is pretty intricate, and would ideally serve predictive models for internal web applications or for extremely high throughput systems. An example would be a website or online service which uses Google Firebase and Google Analytics to capture defined web events that are funneled directly to a model that alters the functionality or design of the web pages the user interacts with, constantly re-training the model to continuously optimize on new users' experiences and outcomes. Ideally, the team looking to use this system should be competent with event-driven architectures. They should also have access to low-latency and highly available networking setups. If the data churn is off the charts, and your team has specific needs where an event bus makes sense, then by all means give deployment via event stream a good look. Examples of source systems and scenarios where this might apply would be:

  • You are building out IoT systems and want to predict the lifespan based on the syslogs in each device.
  • Reviewing microservice networking logs for Cyber Security, trying to detect anomalies.
  • You're making one of those neat drone shows that involve thousands of drones "dancing" in the sky.
  • You're using Google Firebase for your heavily trafficked site/game and want to perform ML predictions on the fly for some defined Analytics Events.

If you have the available tools and expertise, it's a great choice!
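To give a feel for the shape of this approach, here is a minimal streaming-scorer sketch using the kafka-python client. Kafka is just a stand-in here (the same shape applies to Google Pub/Sub or any other bus), and the topic names, server address, model path, and feature fields are all hypothetical. A real deployment would add batching, error handling, and model reload logic:

    # Minimal sketch of a streaming scorer, assuming a Kafka-style event bus.
    # Topic names, bootstrap server, model path, and feature fields are hypothetical.
    import json
    import joblib
    from kafka import KafkaConsumer, KafkaProducer

    model = joblib.load("model.joblib")  # trained model artifact (hypothetical path)

    consumer = KafkaConsumer(
        "web-events",                                 # hypothetical input topic
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    for event in consumer:
        # Pull hypothetical feature fields off the event and score them.
        features = [[event.value["session_length"], event.value["page_views"]]]
        score = float(model.predict_proba(features)[0][1])
        producer.send("scored-events", {"event_id": event.value["id"], "score": score})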

Batch

Report Card:

  • Scalability Score: 7-10 (depends on your operational cluster/machine)
  • Speed: 2
  • Ease of Deployment: 8
  • MLOps Pipeline: 8

Batch refers to what most people think about when they think about run-of-the-mill ETL. You get a periodic file input or query lots of records on some cadence, and you need to perform some sort of large-scale inference on them. In most vanilla ETL/ELT cases, you're just changing data types or altering structures in the most efficient manner possible. However, with a predictive model thrown into the mix, there are some added complications. Predictive models are often CPU-bound, and thus have limited throughput on any single machine, so the size of the workload largely dictates the infrastructure you need.

Below are some notes on the infrastructure options you have with this method; after the list is a minimal sketch of what a batch scoring job might look like.

Infrastructure Options:

  • Small workload? (0-2 GB) Pandas.
  • Medium workload? (2-15 GB) Pandas, but you'll probably need a bigger server than whatever you've got right now.
  • Large workload? (15-25 GB) Pandas, if the server's RAM is large enough for the whole data set plus a "vig"; otherwise, think about Dask or PySpark.
  • XL-Workload (25-750 GB)? Dask or PySpark/Spark running on docker-swarm or something like an AWS ECS cluster. You need a battle ship or two, but not more than 10.
  • Massive workload? Dask or PySpark/Spark running on a Kubernetes cluster. It's time to call the Navy, because you're establishing a beachhead and you need the Big Guns.
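Here is the minimal batch-scoring sketch mentioned above, using chunked Pandas so memory stays bounded. File names and column names are hypothetical; for the larger workloads in the list, the same pattern translates to Dask or PySpark dataframes:

    # Minimal sketch of a chunked batch-scoring job with Pandas.
    # File names, column names, and chunk size are hypothetical.
    import pandas as pd
    import joblib

    model = joblib.load("model.joblib")
    feature_cols = ["feature_a", "feature_b", "feature_c"]  # hypothetical features

    first_chunk = True
    for chunk in pd.read_csv("daily_extract.csv", chunksize=100_000):
        # Score each chunk and append it to the output file.
        chunk["prediction"] = model.predict(chunk[feature_cols])
        chunk.to_csv("scored_output.csv", mode="a", header=first_chunk, index=False)
        first_chunk = False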

Direct Application Integration

Report Card:

  • Scalability Score: 1
  • Speed: 8
  • Ease of Deployment: 9
  • MLOps Pipeline: 5

By Direct Application Integration, I mean directly embedding the trained model in the application code itself, or within the application's database. Examples of this include using PMML to define a simple regression model and operationalizing it within an RDBMS. In other cases, it's also possible to build a small model in Python and embed it into the server-side code or even into a database trigger, given that the database supports Python runtimes. An example of a database that might be able to support that is Postgres, but sadly I have not tried it myself because I haven't quite found myself in a situation where this was the best available option.

From what I've read, you don't really want to attempt it with a whole lot of throughput or volume in your system. I'd only do this in smaller, out-of-the-way systems with low value to the company. Certainly not something to build a core competency on.
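For the simplest variant of this - embedding a small model directly in the application's server-side code - the pattern can be as tiny as the sketch below. The coefficients, feature names, and function name are hypothetical stand-ins for whatever the trained model actually produced (or whatever a PMML export would describe):

    # Minimal sketch of embedding a tiny model directly in application code.
    # Coefficients, feature names, and the route into this function are hypothetical;
    # a real model's parameters would be exported from the training environment.
    import math

    INTERCEPT = -1.2
    COEFFICIENTS = {"age": 0.03, "balance": 0.0001, "tenure_months": -0.02}

    def predict_churn_probability(record: dict) -> float:
        """Score one record with a hand-rolled logistic regression."""
        z = INTERCEPT + sum(COEFFICIENTS[name] * record[name] for name in COEFFICIENTS)
        return 1.0 / (1.0 + math.exp(-z))

    # Example call from somewhere in the application's request handling:
    print(predict_churn_probability({"age": 40, "balance": 2500.0, "tenure_months": 18}))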

REST API

Report Card:

  • Scalability Score: 4
  • Speed: 8
  • Ease of Deployment: 9
  • MLOps Pipeline: 8

The bread and butter.

You can be creative with the approaches here, because you probably have the most options at your disposal. A popular AWS approach is to deploy the model on entirely serverless infrastructure, utilizing AWS API Gateway + AWS Lambda to run the trained model on incoming requests. Another super common approach is to build an API using something like Shiny (R), Flask (Python), or, more recently, FastAPI (Python). These can be deployed into AWS ECS clusters or via any number of server-based application deployment approaches, but you might be slightly limited on MLOps pipeline opportunities to handle model decay if your app is super custom.
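As a concrete flavor of the second approach, here is a minimal FastAPI sketch. The model path, feature names, and route are hypothetical, and a production version would add validation, logging, and health checks:

    # Minimal FastAPI scoring service sketch.
    # Model path, feature names, and route are hypothetical.
    import joblib
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()
    model = joblib.load("model.joblib")  # load once at startup, not per request

    class Features(BaseModel):
        feature_a: float
        feature_b: float
        feature_c: float

    @app.post("/predict")
    def predict(features: Features):
        row = [[features.feature_a, features.feature_b, features.feature_c]]
        return {"prediction": float(model.predict(row)[0])}

    # Run locally with: uvicorn main:app --reload  (assuming this file is main.py)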

Making A Decision

Thinking back to the initial requirements I listed at the top of the article, I wanted to provide a structured way to think about the aforementioned deployment strategies. This table is by no means exhaustive, but it is useful for building a sense of which approaches are more viable than others for the specific problems you face. Here is a pretty simple decision table to use while facing down these sorts of problems:

  Data Volume (relative) | Data Churn    | Real-Time / ms SLA? | "Customer" System              | Statefulness / Configurations? | Deployment Strategy
  Massive                | Constantly    | Yes                 | Internal & high throughput     | Depends                        | Pub/Sub Event Stream
  Moderate to High       | Any           | Yes                 | External to the infrastructure | Yes or No                      | Web Service (REST API)
  Any                    | Never changes | Any                 | Any                            | Any                            | Pre-process: use API / pass data to source system
  Low to Moderate        | Timed cadence | Maybe               | Internal in private networks   | No                             | Serverless Functions (e.g. AWS Lambda)
  Moderate to High       | Timed cadence | No                  | Internal/External, file-based  | Yes or No                      | Batch
  Low                    | Low           | Maybe               | -                              | Yes                            | Database Integration

From Analyst to Engineer

Making the Leap - 5-1-2021

At the moment I am writing this, I would consider making the leap from analyst to engineer one of the greatest milestones of my young career. I wanted to take some time and reflect on this experience so that another young professional might learn from my mistakes on their own journey toward their goals. I've broken this blog up into a few pieces:

  • Where I Was, & How I Got Where I Am Today
  • Coping With Failure
  • Closing Remarks

Where I Was, & How I Got Where I Am Today

If I'm being entirely honest, it was difficult for me to find a job after college. I had a knack for acing the first-round interview, only to expose my goofy self in the later interviews. I knew I would be successful in any position I got, but convincing others that I was better than all the other candidates proved difficult for me to accomplish. I would get myself in a great position only for my social anxiety to flare up, and I'd fumble the ball in the red zone. After graduating without anything lined up, I decided to move back home and continue to pound job sites looking for anything that might offer some potential. Eventually, I found myself deep in the interview process for two separate positions. I drove the 2-hour trip to Columbus, OH from Cleveland, OH on July 3rd, 2018 for an interview. It was over 100 degrees with heat exhaustion warnings for the region, and I just happened to be driving down in my beat-up 2001 Dodge Dakota, which didn't have air conditioning. Hilariously, I actually drove down in dress shoes, an undershirt, and shorts, knowing I'd have to change in a parking lot once I got there so I wouldn't show up in a soggy, sweat-soaked suit. I met the folks there, and I was pleasantly surprised with how laid back they all were.

I got a call about a week later, informing me that they were extending an offer. The only thing stopping me from celebrating that elusive offer was that the company had awful - and I mean despicable - reviews on glassdoor.com, had a fairly poor location, and the pay was south of average. Having been booked for final-round interviews at two companies based in Las Vegas, NV, I was torn whether I should accept or try to wait it out and see if I could get something better with the clock ticking down. I turned to my father, who is my greatest role model in life. I asked him, "What should I do - accept the offer and deal with any possible regret of what almost was?", to which he responded, "The general rule is you shouldn't take points off the board", which is logic that can't be argued with, and so I accepted the position that week. I started the next month, on August 6th, 2018.

I found myself in an interesting role. I liked to think of myself as an innovator, but I was in a pretty restricted spot at the start of it all, or at least that's how it felt to me. I picked up creating dashboards pretty quickly with the help of my coworker and soon-to-be friend, Anurag. Soon, it seemed that I could get my job done in about 2-4 hours a day, leaving the rest of the time to really hammer our data warehouse with ridiculous queries for experimental ideas I had to revolutionize how our company looked at attributing payments to outbound communications. I also became a wizard with SQL, because I strictly stuck with writing my logic in SQL. This was because our company had a love affair with SAS products, while I had an instantaneous and growing resentment of Base-SAS. Having loathed using SAS from day one, I definitely made my opinions known, which certainly resulted in some eye-rolling along the way. Because I couldn't install my own software without a manager's approval, I would often nag about using Python or R instead, but that was proving a lost cause. I can vividly recall that, a week after my company sent another coworker to a SAS conference in Dallas for free, I spent $450.00 of my own money to go to PyCon US, because it was hosted in Cleveland, OH, and I knew I could crash at my parents' place! (To be fair, the company got 1 free pass and it was his turn; my issue is that the company was too cheap to send more than one person - talk about supporting personal growth!) I wasn't making much at the time, so that expense was fairly significant, and I made the most of it. I went to every possible meeting and even went by the career fair to see what kind of opportunities I could aim for in a 3-5 year span. I was genuinely terrified of meeting new people, but forced myself to go to every single booth and ask how they use Python. It was quite the experience - one I wouldn't trade for anything! I knew I really liked data analytics and wanted to stay in that space, but the question was "How on Earth do I convince someone I'm a Data Scientist, or even a Data Engineer?", to which the answer is typically "more school". But I'm not typical.

I found myself in the right place at the right time, and had the chance to really take a swing at an opportunity that many dream of - to lead the development of a production system to operate a trained predictive model. To be honest again, I use "lead" loosely, because it was really just me frantically trying to put the model our statistician trained into a production system. A lone wolf, as they call it. I had been promoted from Reporting Analyst to Analytics Data Engineer on my team, and soon thereafter the team embarked on something none of us knew anything about. From my perspective, the predictive model was the easy part, because every single person on the team had trained a predictive model at some point in their career, on top of our having a PhD Statistician to confront the mathematical complications in that arena. Given our situation and our staff, I'll have to admit that I thought the odds of success were much lower than those of failure. Once I found out that I had to build an API that could adequately handle the load, matching what was currently sent to a very competent vendor, I just about threw the white flag up right then and there. It was a pretty hilarious situation looking back at it...

I'd compare it to being thrown out of a plane into the Amazon Rainforest with nothing but a parachute and the clothes on my back, and told to make my way out in less than 3 months. Without a senior engineer, or anyone technical in a senior position above me for that matter, it was tough because I didn't get a lot of guidance. I'll have to admit that my manager was a little bold to encourage me, let alone trust me, to get the job done. It was an incredible risk, but one that ultimately worked out for the both of us - otherwise there wouldn't be anything to write about! Without anyone within my little corner of the organization to go to, I tried my best to meet with senior IT employees elsewhere around the organization. They were able to meet periodically and help point me in the right direction from time to time, but unfortunately they too had stressful deadlines. I was also a bit of a unicorn, and some of the things I was doing were new to a lot of them as well. This left me out of luck trying to garner others' expertise, and forced me to turn to the tools I had left in the box: my curiosity, tenacity, and Google.com. I had to become super efficient at reading documentation, trial-and-error, finding my own resources, and failing fast. I had to really get used to digging holes and learning when to put the shovel down. The hardest part was certainly doing that before the hole got too deep - which it did on a few occasions. Pride had to be left at the door, because it's about where you finish, not where you start.

Brick by brick, I was able to lay a foundation. After hours upon hours of extra work and overtime, I felt like I was on my way. Looking back, I would definitely say there was a tipping point where things just seemed easier, like I had a grip on what I was doing for once. First, I figured out configurations. Next, the API routing and services. Finally, the database connections and the predictive models. After ~3 long months I found myself in a unique club, but it wasn't without cost or difficulties along the way. I faced that sink-or-swim moment, and admittedly, I sometimes felt like I had a 15 lb rock weighing me down as waves crashed over me every which way. Somehow, I stayed above the water and accomplished my dream.

Where am I now? Well, I'm on a team with many of the same people, but now we're different. We're far more capable. Anurag, who originally trained me on the reporting tools and database structure, has also moved to an engineering role to help translate statistical correctness into operational systems. Some new faces have been added, and now I've come to the newest challenge: working as a team. Figuring out who does what, how things get done, and of course living with the impending sense of doom from the never-ending pile of technical debt accumulating while I beg business people to gut the product road map. Learning how to build systems with others involved has been great - far less stressful and much more specialized. Like a stonemason and a carpenter building a house, there are things people like to do, and do very well. I've learned that listening to one another and valuing everyone's input is key to success. So is writing clean code that others can readily follow and maintain. It's a great follow-up to the earlier trials because it is preparing me for the future. One time while I was an intern at Vizion Solutions (now Vizion360), working on Power BI as an intern-consultant, I was taken onsite and the CTO of the company randomly stopped me and told me, "Always remember this: People, Process, Technology. P-P-T.", and I sometimes think back to that. It's important to remember that it's about the people you work with, the people you serve, and generally all the stakeholders affected. People matter the most, followed by the process you're trying to fit. If the tech can't adjust to those, then you're using the wrong tech. That can be a guiding principle of teamwork, and one that I believe can help us have continued success.

One moment stuck out to me during this whole phase though, and that was the moment when I realized, I'm getting paid to code. That was a pretty cool moment for me, having done so much extracurricular work to convince myself and others that I could get the job done.

Coping With Failure

There were a lot of times I didn't get something right the first time around. As I write this, I'd chalk the project up as a success overall, but there were certainly times things didn't go as planned. In retrospect, I'd attribute the perseverance to staying calm in the face of adversity and incredible stress. I failed at so many things that it's hard to recount any particular time, but a few things pop out at me today:

  • It took me about 2-4 weeks to fully understand how configurations & secrets worked.
  • I'm guessing it took me around 150-250 docker builds to finally "get it". Not really sure, but it was a lot.
  • I've already re-built the entire API once.

During the project, I had so much pressure on me that I developed some sort of anxiety-driven issue where I started to struggle with breathing (and I'm not talking about the coronavirus). I'd describe it as a sort of shortness of breath where it feels like you need to yawn, or like you just can't get a full breath - if you've felt that, you might understand how it affects your day-to-day life. I bring this up because I want to fully illustrate that it wasn't all mountaintops and daisies. It was hard work... challenging both my mind and my sanity. My desire for success had pushed me past yesterday's qualifying race and into the main heats, where the pressure dissipated into a sort of apathy coupled with the thick skin I grew through the failures. It now comes and goes with the continued stress, thanks to the constant, maddening drumbeat of tech debt and upstream decisions made by folks who may or may not fully understand how subtle intricacies affect my ability to get things done.

I've learned the best defense against letting others place stress on you is to simply put yourself first. I've found that the book "First Things First" by Stephen Covey covers many key strategies for coping with this sort of situation. Another important thing is to always look out for burnout among your teammates, because eventually everyone is going to feel its touch; eventually it just gets old. Apathy sets in and you just don't feel that same pep that was once there. Managing that within yourself first is key, but always pay attention to your peers, your team, and even your bosses. Awareness means everything, because without first being aware, you cannot act to fix things. The better mental state you and your teammates are in, the more effective and successful you will be. Ignore the little things and focus on what matters, the people... and that includes yourself.

Just remember nothing is worth more than your happiness, your health, and those you love. Not data science, not money, and certainly not organizational clout. I'll admit that while I am super passionate about data science and my work, one of the reasons I worked so hard was that I simply didn't have anything better to do with my time. I ranked, and still do rank, continuous learning in my top 5 things in life. I'm somewhat thankful for that, because it's gotten me where I am today. I just want to stress that it's important to keep all your first things first. Yes, you can get there through hard work, but don't sacrifice anything you would regret. It's important to consider all of this before diving in, because there are "sink or swim" moments, and if you want to swim you need to be proactive enough to conquer the challenges regardless of the circumstances. Maybe that's being a lone wolf like I was, or maybe you would thrive more in a structured team after further formal education - if you know that's the route for you, there's nothing wrong with that. Everyone learns differently, and it's important to fully understand yourself before attempting to understand any data.

There are generally 2 quotes I like to think about in a time of failure or crisis that make me feel better about my situation:

  • "Chill out, we're all literally just floating through space."
  • Regarding the time the freighter Ever Given was stuck in the Suez Canal, blocking $9 billion of global trade each day it remained stuck: in the midst of that, I saw someone tweet, "After this Ever Given nonsense, I don't think anyone will be able to convince me that it can't wait until tomorrow."

Closing Remarks

If I were to sum it all into one word, I'd choose aplomb. Aplomb, defined as complete and confident composure or self-assurance, is important because you will inevitably be challenged when making the jump from an analyst role to an engineering role. Make no mistake, you will be tested. Whether that challenge comes from people, code, or cloud architecture doesn't matter; all that matters is how you respond to that challenge, especially in situations like the one I found myself in. You must always remember that you have the ability to make a choice about how you respond.

I may not have the typical expensive educational background and natural genius that many of my peers in the data science and analytics field have, but somehow I've found myself in the "deployed-a-predictive-model-and-helped-a-company-make-money-off-it" club. Without those academic accolades shaping how others perceive my intellect, I've found that my success has largely been thanks to my passion, curiosity, discipline, proactiveness, and persistence. Well, at least that's what I'd put on a public website. If I were hanging out with a friend, I'd probably just say, "I don't think anyone really knows what they're doing. We just figure it out after enough trial-and-error."

You're going to have setbacks. You're going to fail. You're going to be working longer than others to get the same job done... and that's okay. With time and effort, anything is possible!

Using Prefect and Dask for Distributed Workloads

Batch ML & distributed systems made easy - 5-31-2021

Currently being written! Stay tuned...

Winning In Data Science & Analytics

Navigating a tool-saturated analytics industry - 5-31-2021

Currently being written! Stay tuned...