Polars Boosted My Algorithm's Speed by 25x

Refactoring an RCE machine learning algorithm from Pandas lambda functions to the Polars expression API reduced execution time from six minutes to fourteen seconds. Polars cross joins, columnar operations, and Apache Arrow drive a 25x speedup.

Quick and Easy Capacity Planning with Pandas

Build a lightweight capacity planning model in Python Pandas using flow diagrams, throughput estimates, and GROUP BY operations to estimate CPU requirements and infrastructure cost. Apply Operations Research concepts to size a simple web...

Data Exploration with Data Viz Cheat Sheet

Witness practical Pandas, Seaborn, and Matplotlib techniques for exploring machine learning datasets using the UCI Abalone database. Includes histograms, KDE plots, boxplots, correlation heatmaps, PCA, regression plots, and multidimensional...

Refactor Matlab to Tidyverse

Refactor a Reduced Coulomb Energy neural network implementation from Matlab into R Tidyverse with pipes, tibbles, functional operations, and vectorized distance calculations. Compares loop-based Matlab patterns with tidy data workflows for...

FastAI x Flask - Mods vs. Rockers!

Fastai provides helper functions on top of PyTorch to help us wrangle, clean, and process data. In this HOWTO we will accomplish the following: deploy an AWS g3.8xlarge instance; compile and install NVIDIA drivers on our g3.8xlarge instance; use a...

Big Data Idol: The Math

Caution! Math Ahead! For the Math-phobic, I explain how I crunch the test results in a math-free, simple and focused blog post here. I use math here, so this may be your last chance to escape! Still with me? Excellent! The bullets below outline...

Big Data Idol: How I Crunched the Numbers

Do you have big data chops? Quick, what do these three things have in common? Yankees, Giants, Rangers, Knicks. What about these? Beatles, Monkees, Beach Boys. Do you have an answer for each? "New York," for example, for the first list and "Rock...

Let us now praise ugly code!

In this blog post I will revisit the first piece of code I wrote in the R programming language, back in the early part of this decade. Coming from an Octave/MATLAB background, I really enjoyed the functional nature of R. I imagined flinging...

Why A "Big Data" Personality Test?

Why do we need yet another personality test? Because without "big data" technologies, online "personality tests" suffer these problems: With most tests, we quickly see a pattern to the answers, and can easily steer the test to the outcome we want...
