AWS Digest – 2021-11-28

Once again, a lot of interesting new features are landing in AWS:

AWS Lambda:

Event Filtering for SQS, DynamoDB and Kinesis Sources

This one is a really neat feature that can change application design: for these 3 event sources, it’s now possible to set up basic filters on the incoming messages, so that Lambda only gets invoked for the events that match. This allows a certain level of polymorphism in the Lambda workers.

The filters can be set on any element of the event data or metadata. They are pretty basic, but still really powerful.

Complete documentation of the feature is available in the AWS documentation.
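To make the filter idea more concrete, here is a minimal sketch of what the filter criteria for an SQS source could look like. The "application" field in the message body is a hypothetical name chosen for illustration, as are the queue ARN and function name in the commented call; the `FilterCriteria` structure itself is the one accepted by `create_event_source_mapping` in boto3.

```python
import json


def build_sqs_filter_criteria(target_apps):
    """Build a FilterCriteria structure matching SQS messages whose JSON body
    has an "application" field equal to one of target_apps.

    The "application" field is a hypothetical name used for illustration.
    """
    pattern = {"body": {"application": list(target_apps)}}
    # Each filter pattern is passed as a JSON-encoded string.
    return {"Filters": [{"Pattern": json.dumps(pattern)}]}


criteria = build_sqs_filter_criteria(["billing"])
print(json.dumps(criteria, indent=2))

# With boto3 installed and credentials configured, the structure could be
# attached to an event source mapping (ARN and function name are placeholders):
#
#   import boto3
#   boto3.client("lambda").create_event_source_mapping(
#       EventSourceArn="arn:aws:sqs:eu-west-1:123456789012:my-queue",
#       FunctionName="my-worker",
#       FilterCriteria=criteria,
#   )
```

Messages whose body does not match the pattern never reach the function, so the worker code does not need its own "is this for me?" check.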

For me, the first real use of this feature will be at work. I have a data pipeline, used to move data from Redshift to the RDS instances of multiple applications, that needs to be improved in the prod environment. Currently we trigger an SNS topic from S3, but as we want to migrate the applications one by one, I had two options before: update the S3 filter application by application, which is not really practical and creates a lot of opportunities for things to go wrong, or update both the new worker and the old worker Lambdas to filter out events on receive, which creates extra Lambda invocations.

With this new feature, I’ll be able to switch from an SNS to an SQS event trigger and put a filter on my events (no extra Lambda calls), with minimal impact on both Lambdas (I only need to remap the source event from SNS to SQS).
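The remapping mentioned above could look roughly like the sketch below. It assumes that S3 sends its notifications directly to the SQS queue, so the SQS `body` holds the S3 notification JSON (if SNS sat in front of the queue without raw message delivery, the body would contain an SNS envelope instead). The `process` function and the object keys are hypothetical stand-ins for the real worker logic.

```python
import json


def s3_records_from_sns(event):
    """Extract S3 notification records from an SNS-triggered Lambda event."""
    records = []
    for record in event["Records"]:
        # SNS wraps the S3 notification as a JSON string in Sns.Message.
        message = json.loads(record["Sns"]["Message"])
        records.extend(message.get("Records", []))
    return records


def s3_records_from_sqs(event):
    """Extract S3 notification records from an SQS-triggered Lambda event."""
    records = []
    for record in event["Records"]:
        # SQS delivers the S3 notification as a JSON string in body.
        message = json.loads(record["body"])
        records.extend(message.get("Records", []))
    return records


def process(s3_records):
    """Hypothetical shared business logic; here it just lists object keys."""
    return [r["s3"]["object"]["key"] for r in s3_records]


def handler(event, context=None):
    # Only this line changes when moving the trigger from SNS to SQS.
    return process(s3_records_from_sqs(event))
```

Keeping the extraction in its own small function means the business logic stays untouched while the trigger changes.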

AWS Redshift:

Performance Enhancements for Data Sharing

This should really help, especially on big demand peaks, like when you have a daily delivery of hundreds of TB of data that needs to be loaded and then triggers analytics queries to feed your back-end. As this is meant to work with concurrency scaling, it will help you burst your cluster capacity at peak requirements. Basically, you want short-lived compute clusters to help tackle the massive queue building up, so that your main cluster works at its best for the rest of the day.

AWS EC2:

Predictive auto scaling

In some cases, especially in data science, you can predict a bit ahead of time when you will need to scale up your compute resources. Let’s say you see a big number of messages coming into a queue that sits at the left of your compute pipeline, and you know this will result in a hellish load a few moments later. It’s now possible to trigger auto scaling on custom metrics in CloudWatch, including metrics from other services.

This means that when the load hits your compute, it will already be scaled up to face it, instead of getting hit by the load first and then having auto scaling kick in, which leaves your infrastructure at scale only a few minutes later.
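As a rough sketch of the queue example, the function below builds a predictive scaling configuration that uses an SQS metric as the load metric and the group's CPU as the scaling metric. The queue name, group name, policy name and target value are all hypothetical, and the field names follow my understanding of the `PredictiveScalingConfiguration` parameter of the EC2 Auto Scaling `put_scaling_policy` call, so check them against the API reference before relying on them. Predictive scaling forecasts from the history of the load metric, so it suits loads that repeat over time.

```python
def build_predictive_scaling_config(queue_name, asg_name, target_cpu=50.0):
    """Build a predictive scaling configuration driven by custom metrics.

    queue_name and asg_name are placeholders for illustration.
    """
    # Load metric: how much work arrives, taken from another service (SQS).
    load_query = {
        "Id": "load",
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/SQS",
                "MetricName": "NumberOfMessagesSent",
                "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
            },
            "Stat": "Sum",
        },
    }
    # Scaling metric: the per-instance utilization we want to keep on target.
    scaling_query = {
        "Id": "scaling",
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/EC2",
                "MetricName": "CPUUtilization",
                "Dimensions": [
                    {"Name": "AutoScalingGroupName", "Value": asg_name}
                ],
            },
            "Stat": "Average",
        },
    }
    return {
        "MetricSpecifications": [
            {
                "TargetValue": target_cpu,
                "CustomizedLoadMetricSpecification": {
                    "MetricDataQueries": [load_query]
                },
                "CustomizedScalingMetricSpecification": {
                    "MetricDataQueries": [scaling_query]
                },
            }
        ],
        # ForecastOnly produces forecasts without acting on them, which is a
        # cautious way to evaluate the policy first.
        "Mode": "ForecastOnly",
    }


config = build_predictive_scaling_config("ingest-queue", "compute-asg")

# With boto3 installed and credentials configured (names are placeholders):
#
#   import boto3
#   boto3.client("autoscaling").put_scaling_policy(
#       AutoScalingGroupName="compute-asg",
#       PolicyName="queue-driven-predictive",
#       PolicyType="PredictiveScaling",
#       PredictiveScalingConfiguration=config,
#   )
```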

AWS App Runner:

Support for GitHub Actions

First, a side note: expect an article on GitHub Actions soon, as I have started using them in one of my side projects (I really have a lot of unfinished projects, which is fine, as most are experiments more than things I really want to finish).

GitHub Actions are neat: basically, they are workflows that are triggered each time a specific event (pull request, push, etc.) happens on a specific branch of your GitHub repository (a setup example is this action that runs tox tests on each push).

Now you can link these actions to App Runner, which means you’ll be able to run your tests and builds inside the managed runtime you want, one that can be specific to your needs.

AWS Athena:

Accelerated queries with Glue Catalog

This one can be really interesting when you make heavy use of Glue catalogs and have a massive number of partitions. What you have to do is go into Glue, create a partition index, then enable partition filtering in Athena. Your Athena queries will run faster, as the indexes are stored in Glue and won’t have to be rebuilt by scanning the partitions once again.
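The two steps could be sketched as below: one helper builds the arguments for the Glue `create_partition_index` call in boto3, the other builds the Athena statement that sets the `partition_filtering.enabled` table property. The database, table, key and index names are hypothetical.

```python
def partition_index_request(database, table, partition_keys, index_name):
    """Build keyword arguments for the Glue create_partition_index call."""
    return {
        "DatabaseName": database,
        "TableName": table,
        "PartitionIndex": {
            "Keys": list(partition_keys),
            "IndexName": index_name,
        },
    }


def enable_partition_filtering_sql(table):
    """Build the Athena DDL that turns on partition filtering for a table."""
    return (
        f"ALTER TABLE {table} "
        "SET TBLPROPERTIES ('partition_filtering.enabled' = 'true')"
    )


# All names below are placeholders for illustration.
request = partition_index_request(
    "analytics", "events", ["year", "month", "day"], "by_date"
)
statement = enable_partition_filtering_sql("events")
print(request)
print(statement)

# With boto3 installed and credentials configured:
#
#   import boto3
#   boto3.client("glue").create_partition_index(**request)
#
# The statement is then run in Athena against the database holding the table.
```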

Note that Glue partition indexes can also be really useful for Redshift Spectrum tables and for EMR.