Crunch Big Data on Your Laptop With Polars Streaming
Polars streaming avoids out-of-memory errors in large cross joins by processing data in chunks. Learn how to run 27M-row workloads on a single machine.
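The following plain-Python sketch shows why chunking keeps a cross join's memory bounded. It is not Polars code and not how the Polars streaming engine is implemented. The function names `chunks` and `chunked_cross_join_count` are hypothetical. In Polars itself you would write a lazy query and collect it with the streaming engine; check the Polars documentation for the exact `collect` option in your version.

```python
from typing import Callable, Iterator, Sequence


def chunks(rows: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of `rows`, each at most `size` long."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def chunked_cross_join_count(
    left: Sequence[int],
    right: Sequence[int],
    predicate: Callable[[int, int], bool],
    chunk_size: int = 1_000,
) -> int:
    """Count (a, b) pairs in left x right that satisfy `predicate`.

    Only chunk_size * len(right) pairs exist in memory at once,
    instead of len(left) * len(right) pairs for the full cross product.
    """
    total = 0
    for left_chunk in chunks(left, chunk_size):
        # Materialize this chunk's slice of the cross product, then reduce it.
        pairs = [(a, b) for a in left_chunk for b in right]
        total += sum(1 for a, b in pairs if predicate(a, b))
        # `pairs` is rebound on the next iteration, so the old chunk can be freed.
    return total


if __name__ == "__main__":
    left = list(range(300))
    right = list(range(300))
    print(chunked_cross_join_count(left, right, lambda a, b: a < b, chunk_size=64))
    # prints 44850, i.e. 300 * 299 / 2
```

Peak memory scales with `chunk_size`, not with the size of the full cross product. Streaming engines exploit the same trade-off.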
Refactoring an RCE machine learning algorithm from Pandas lambda functions to the Polars expression API reduced execution time from six minutes to fourteen seconds. Polars cross joins, columnar operations, and Apache Arrow drive a 25x speedup.
Build a lightweight capacity planning model in Python Pandas using flow diagrams, throughput estimates, and GROUP BY operations to estimate CPU requirements and infrastructure cost. Apply Operations Research concepts to size a simple web...
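The following is a minimal sketch of the sizing arithmetic. It assumes the utilization law: cores needed ≈ requests per second × CPU seconds per request ÷ target utilization. The post uses a Pandas GROUP BY; this sketch groups with a plain dict instead. Every component name, rate, and price below is hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical workload: (tier, component, requests_per_second, cpu_seconds_per_request)
COMPONENTS = [
    ("web", "nginx", 200.0, 0.002),
    ("app", "flask", 200.0, 0.020),
    ("app", "worker", 50.0, 0.080),
    ("db", "postgres", 400.0, 0.005),
]


def cores_needed(rps: float, cpu_seconds: float, target_util: float = 0.7) -> float:
    """Busy cores = throughput * service time; dividing by target utilization adds headroom."""
    return rps * cpu_seconds / target_util


def cores_by_tier(components, target_util: float = 0.7) -> dict:
    """GROUP BY tier, SUM(cores), then round each tier up to whole cores."""
    totals = defaultdict(float)
    for tier, _component, rps, cpu_seconds in components:
        totals[tier] += cores_needed(rps, cpu_seconds, target_util)
    return {tier: math.ceil(total) for tier, total in totals.items()}


def monthly_cost(cores: int, usd_per_core_hour: float, hours: float = 730.0) -> float:
    """Rough monthly cost; the price per core-hour is an assumed input."""
    return cores * usd_per_core_hour * hours


if __name__ == "__main__":
    tiers = cores_by_tier(COMPONENTS)
    print(tiers)  # {'web': 1, 'app': 12, 'db': 3}
    print(monthly_cost(sum(tiers.values()), usd_per_core_hour=0.05))
```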
Identify investment grade copies of sealed Super Mario Bros. 3 variants through Python Pandas, Seaborn, and auction sales data. Normalize prices across market cycles and compare box grade, seal grade, release variant, and sale date to rank...
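The teaser does not say how the post normalizes prices across market cycles. One simple approach, assumed here, divides each sale by the median sale price in the same period, so a score of 1.0 means "typical for its market cycle." The periods, variant labels, and prices below are invented placeholders, not real auction data.

```python
from collections import defaultdict
from statistics import median

# Invented sales rows: (period, variant, price_usd)
SALES = [
    ("2020Q1", "variant-A", 1000.0),
    ("2020Q1", "variant-B", 3000.0),
    ("2020Q1", "variant-C", 2000.0),
    ("2021Q1", "variant-A", 4000.0),
    ("2021Q1", "variant-B", 6000.0),
]


def normalize_by_period(sales):
    """Return (period, variant, price / median price of that period) for each sale."""
    prices_by_period = defaultdict(list)
    for period, _variant, price in sales:
        prices_by_period[period].append(price)
    period_median = {p: median(prices) for p, prices in prices_by_period.items()}
    return [(period, variant, price / period_median[period]) for period, variant, price in sales]


def rank(sales):
    """Sort sales from most to least expensive relative to their own period."""
    return sorted(normalize_by_period(sales), key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for row in rank(SALES):
        print(row)
```

Normalizing within a period lets a 2021 sale and a 2020 sale be compared on the same scale even if the whole market rose in between.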
Witness practical Pandas, Seaborn, and Matplotlib techniques for exploring machine learning datasets using the UCI Abalone database. Includes histograms, KDE plots, boxplots, correlation heatmaps, PCA, regression plots, and multidimensional...
Amazon Web Services (AWS) SageMaker Notebook Instances provide fully managed Jupyter Notebooks, tailored for Data Science and Machine Learning (ML) use cases. These notebooks allow Data Scientists and ML Engineers to explore, operationalize and...
The Amazon Web Services (AWS) Elasticsearch service provides GROUP BY operations via the aggregations, or AGGS, Application Programming Interface (API). GROUP BY and AGGS operations provide syntax to collapse rows with similar values into summary...
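For orientation, the following sketch builds an aggregation request body roughly equivalent to `SELECT field, COUNT(*), AVG(metric) ... GROUP BY field`. It uses a `terms` bucket aggregation with a nested `avg` metric. The field names and the bucket names `by_group` and `avg_metric` are hypothetical. You would send the body as the JSON payload of a `_search` request against your index; how you pass it depends on your client version.

```python
import json


def group_by_body(group_field: str, metric_field: str, size: int = 10) -> dict:
    """Search body approximating: SELECT group_field, COUNT(*), AVG(metric_field) GROUP BY group_field."""
    return {
        "size": 0,  # return only aggregation buckets, no individual hits
        "aggs": {
            "by_group": {  # bucket name is arbitrary
                "terms": {"field": group_field, "size": size},
                "aggs": {
                    "avg_metric": {"avg": {"field": metric_field}},
                },
            }
        },
    }


def rows_from_response(response: dict) -> list:
    """Flatten terms buckets into (key, count, average) rows, like a GROUP BY result set."""
    buckets = response["aggregations"]["by_group"]["buckets"]
    return [(b["key"], b["doc_count"], b["avg_metric"]["value"]) for b in buckets]


if __name__ == "__main__":
    # Hypothetical fields: "state.keyword" (keyword type) and "score" (numeric).
    print(json.dumps(group_by_body("state.keyword", "score"), indent=2))
```

Each `terms` bucket carries a `doc_count`, which plays the role of `COUNT(*)`. The nested `avg` aggregation is computed per bucket.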
Operational document stores require backups for disaster recovery and data migration. Elasticsearch uses the term snapshot for its backups. Amazon Web Services (AWS) provides a fully managed Elasticsearch service that includes both automatic...
Elastic's architects designed the distributed Elasticsearch platform to follow NoSQL principles. In the traditional Relational Database Management System (RDBMS) world, SQL databases use GROUP BY syntax to group rows with similar values into...
In this HOWTO, I will describe the process to connect an Ubuntu EC2 instance to the Amazon Web Services (AWS) provided Elasticsearch Service via the boto3 Python library. This post updates my incredibly popular original post on this topic which...
This blog post describes how to configure Flask to emit form data to your own personal Gmail account. You don't need to use Gmail, in fact, you can configure Flask to send data to any email account you have access to. This architecture uses...
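The teaser does not show the post's code, and its architecture may differ. As a minimal standard-library sketch, assume the submitted fields arrive as a dict and the message goes out through Gmail's SMTP server over implicit TLS on port 465. The function names are hypothetical. Gmail accounts typically need an app password for SMTP login rather than the regular account password.

```python
import smtplib
from email.message import EmailMessage


def build_form_email(form: dict, sender: str, recipient: str) -> EmailMessage:
    """Turn submitted form fields into a plain-text email."""
    msg = EmailMessage()
    msg["Subject"] = "New form submission"
    msg["From"] = sender
    msg["To"] = recipient
    # One "field: value" line per form field, sorted for a stable layout.
    body = "\n".join(f"{field}: {value}" for field, value in sorted(form.items()))
    msg.set_content(body)
    return msg


def send_via_gmail(msg: EmailMessage, user: str, app_password: str) -> None:
    """Send over Gmail's SMTP endpoint using implicit TLS (port 465)."""
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as smtp:
        smtp.login(user, app_password)
        smtp.send_message(msg)
```

Inside a Flask view, you could pass `request.form.to_dict()` to `build_form_email` and then call `send_via_gmail` with credentials loaded from configuration rather than hard-coded.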
Flask, Lambda, API Gateway, IAM and S3 enable massively scalable web database applications. Flask provides a simple, Pythonic Model View Controller (MVC) framework to develop the application logic. Lambda and API Gateway provide pay-per-use...
I deployed my first web database application back in 2002 thanks to the seminal O'Reilly book Web Database Applications with PHP and MySQL by David Lane and Hugh E. Williams. In the past sixteen years, the industry developed tons of frameworks...
In this HOWTO, I will demonstrate how to easily integrate the Google reCAPTCHA service into a Flask web application using Flask-WTF. The following cartoon depicts the end result. A Flask application server provides a simple (beautified) survey to...
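For context, reCAPTCHA's server-side check is a POST to Google's `siteverify` endpoint that carries your secret key and the token the browser submitted. In a Flask-WTF app, the form's reCAPTCHA field is meant to handle this step for you. The standard-library sketch below only shows what that request looks like. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret: str, token: str, remote_ip: Optional[str] = None) -> urllib.request.Request:
    """Build the form-encoded POST that asks Google whether `token` is valid."""
    params = {"secret": secret, "response": token}
    if remote_ip:
        params["remoteip"] = remote_ip  # optional: the end user's IP address
    data = urllib.parse.urlencode(params).encode("ascii")
    return urllib.request.Request(VERIFY_URL, data=data, method="POST")


def is_human(secret: str, token: str, remote_ip: Optional[str] = None, timeout: float = 5.0) -> bool:
    """Send the verification request and return the JSON `success` flag."""
    with urllib.request.urlopen(build_verify_request(secret, token, remote_ip), timeout=timeout) as resp:
        return bool(json.load(resp).get("success", False))
```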
In this HOWTO, we expand upon the simple Elasticsearch proxy we deployed in our first Lambda tutorial. In that tutorial, we showed you how to create a proxy in front of the AWS Elasticsearch service using a Lambda function and an API Gateway. We...
All production databases require backups. The AWS Elasticsearch documentation states: Amazon Elasticsearch Service (Amazon ES) takes daily automated snapshots of the primary index shards in an Amazon ES domain... However, you must contact the AWS...
The Python Elasticsearch Domain Specific Language (DSL) lets you create models via Python objects. Take a look at the model Elastic creates in their persistence example. #!/usr/bin/env python # persist.py from datetime import datetime from...
Amazon Web Services' (AWS) Lambda provides a serverless architecture framework for your web applications. You deploy your application to Lambda, attach an API Gateway and then call your new service from anywhere on the web. Amazon takes care of...
Flask-WTF helps us create and use web forms with simple Python models. WTForms takes care of the tedious, boring and necessary security required when we want to use data submitted to our web app by a user on the Internet. WTForms makes data...
Welcome to the fifth part of this HOWTO, where we will call a remote web service to locate our test takers. Once you complete this HOWTO, you will have implemented the following architecture: To recap what we’ve done so far: Part One: Deploy an...
In HOWTO-1, we deployed an Amazon Web Services (AWS) Elasticsearch domain and connected to it via a combination of Identity and Access Management (IAM) roles, IAM profiles and the Boto library. In HOWTO-2, we deployed a Flask web server that...
In this tutorial you will: connect a Flask server to the Bootstrap service; create a trivial Jinja2 template for a Quiz form; use Bootstrap to validate forms on the client side; and use a Flask "flash" message to validate forms on the server side. Let's...
In this tutorial you will learn: the best* way to update an AWS provided Elasticsearch service index via an Internet-facing web form (*in terms of flexibility, security and ease of deployment); how to deploy web forms in Flask; how to get Flask to...
NOTE: An updated version of this blog post uses Boto3 and Elasticsearch 7.X. Step one of our journey connects EC2 to ES using the Amazon boto Python library. I spent more than a few hours poring through the AWS help docs and...