Getting Started
First, before you continue with this post, please read [part one]({{site.base_url}}{% link _posts/2018-03-02-sending-events-to-aws-lambda-from-s3.md %}) if you haven’t already. Second, if you would like to review the complete source code, you can find it on our GitHub and use the code to follow along or modify it for your own use. Last, to replicate the process used here, you will need to get Twitter credentials for your account from Twitter Application Management. As for the Twitter API, you can find it on the Twitter Developer page and use it directly or use a library such as the `twitter` Python package from PyPI. I’ve chosen to use the Python package, but that’s my preference.
Brief Reminder for Context
The purpose of this Lambda is to generate a tweet when a new post is available. More specifically, it will create a tweet when a new object whose key begins with `/posts` is `put` into an S3 bucket. The Lambda is triggered by an event produced by the S3 bucket of interest.
References
I realized that I’ve provided a large number of links throughout the text so I’ve included them all here for quick reference.
- GitHub repo for code
- Twitter applications
- Twitter developers
- Twitter library for Python
- AWS Lambda handler
- AWS Blog post on Lambda container reuse
- AWS Key Management Service pricing
- AWS Key Management Service
- AWS Lambda context object
- Python logging in AWS Lambda
- Python getLogger method
- AWS Lambda best practices
- Python list comprehension
- AWS DynamoDB
- Jekyll
- AWS Identity and Access Management
- AWS S3 put event object
Scope
The scope of this post is limited to the Lambda function (the code) that is triggered by an event in S3. This means that the AWS-specific aspects like deployment aren’t included. I decided that it would be easy to lose focus and delve into other concepts such as Identity and Access Management (IAM). Instead, I want to highlight some of the code-specific considerations and provide a real example of how to use this technology.
Function Overview
Below is the entire handler. The handler is the entry point; it will be called when the function is invoked. Each function is run inside a container that may be reused so the handler can be invoked more than once per container deployment. This is an important consideration when we discuss environment variables and Key Management Service (KMS) below. I’ll explain each part beginning with the function definition itself.
{% highlight python %}
def lambda_handler(event, context):
    logger = logging.getLogger('tweet')
    logger.setLevel(logging.INFO)
    keys = [record['s3']['object']['key'] for record in event.get('Records', [])]
    if not keys or len(keys) > 1:
        logger.error('Only one new post at a time is expected.')
        return

    t = twitter.Twitter(auth=twitter.OAuth(token=DECRYPTED_TOKEN,
                                           token_secret=DECRYPTED_TOKEN_SECRET,
                                           consumer_key=DECRYPTED_CONSUMER_KEY,
                                           consumer_secret=DECRYPTED_CONSUMER_SECRET))
    tweeter = tweet.Tweeter(t)
    tweeter.tweet(f"{URL_BASE}/{keys[0]}")
{% endhighlight %}
Function Definition
{% highlight python %}
def lambda_handler(event, context):
{% endhighlight %}
`lambda_handler` has two parameters: `event`, which is usually a `dict` and contains attributes about the event causing the function to run, and `context`, which contains runtime information. I will be using the `event` object because it contains attributes that I need, but not the `context` object. An example of an `event` object for an S3 put event can be seen in the repository.
Logging
Next, logging is configured. Here a `logging` object identified by `tweet` is created. Calling `logging.getLogger(name)` with the same name always returns a reference to the same object, so it can be configured once and then reused. This is useful when working with a `logging` object across modules. Some of our code has been extracted to a separate class, so reuse is helpful here. Separating the handler from the business logic is one practice listed in AWS’s Lambda best practices.
{% highlight python %}
logger = logging.getLogger('tweet')
logger.setLevel(logging.INFO)
{% endhighlight %}
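To make the reuse concrete, here is a minimal sketch using only the standard library. The variable names are illustrative and not taken from the repository; the point is only that two lookups by the same name yield one shared, already-configured logger.

```python
import logging

# Configure the named logger once, as the handler does.
handler_logger = logging.getLogger('tweet')
handler_logger.setLevel(logging.INFO)

# Elsewhere (for example in another module), asking for the same name
# returns the very same object, so the level set above still applies.
other_logger = logging.getLogger('tweet')

same_object = other_logger is handler_logger
level_is_info = other_logger.level == logging.INFO
```

Because both names point at one object, any later configuration change made through either reference is visible through the other.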
Getting Events
Now the real work starts and the S3 object keys are retrieved from the `event` object. A list comprehension is used to build a list of the keys that are in the `event` object. In this particular use case, when an event is received, it is assumed that a single S3 bucket object will be in the `event`. Anything other than a `list` of size 1 is considered invalid. Remember that this Lambda is triggered when a key that begins with `/posts` is `put` into an S3 bucket. Given that posts are generally created one at a time, one is the only valid length.
{% highlight python %}
keys = [record['s3']['object']['key'] for record in event.get('Records', [])]
if not keys or len(keys) > 1:
    logger.error('Only one new post at a time is expected.')
    return
{% endhighlight %}
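The same extraction can be tried outside Lambda with a hand-built event. The sketch below is an illustration under a few assumptions: `sample_event` is a trimmed-down, hypothetical payload (real S3 events carry many more fields), and `extract_keys` and `single_key` are helper names invented here. It also URL-decodes the key with `unquote_plus`, which the handler above does not do; S3 delivers object keys URL-encoded in event notifications, so this may matter if your post names contain spaces or special characters.

```python
from urllib.parse import unquote_plus

# Trimmed-down, hypothetical S3 put event; real events carry many more fields.
sample_event = {
    'Records': [
        {'s3': {'bucket': {'name': 'example-bucket'},
                'object': {'key': 'posts/hello+world.html'}}}
    ]
}


def extract_keys(event: dict) -> list:
    """Return the URL-decoded object keys found in an S3 event."""
    return [unquote_plus(record['s3']['object']['key'])
            for record in event.get('Records', [])]


def single_key(event: dict):
    """Return the only key in the event, or None if there isn't exactly one."""
    keys = extract_keys(event)
    return keys[0] if len(keys) == 1 else None
```

Calling `single_key(sample_event)` yields the decoded key, while an empty event or one with several records yields `None`, mirroring the early return in the handler.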
Creating the Twitter Connection
If an `event` has been received, and it contains a single key, a connection to Twitter is created using the `twitter` library.
{% highlight python %}
t = twitter.Twitter(auth=twitter.OAuth(token=DECRYPTED_TOKEN,
                                       token_secret=DECRYPTED_TOKEN_SECRET,
                                       consumer_key=DECRYPTED_CONSUMER_KEY,
                                       consumer_secret=DECRYPTED_CONSUMER_SECRET))
{% endhighlight %}
Using Environment Variables for Sensitive Information
Notice that there aren’t any values hardcoded here. Instead, variables defined outside of the handler are used. Normally I wouldn’t recommend global variables but in this case, they are preferred.
{% highlight python %}
URL_BASE = os.environ['URL_BASE']
DECRYPTED_TOKEN = decrypt_env_var(os.environ['TOKEN'])
DECRYPTED_TOKEN_SECRET = decrypt_env_var(os.environ['TOKEN_SECRET'])
DECRYPTED_CONSUMER_KEY = decrypt_env_var(os.environ['CONSUMER_KEY'])
DECRYPTED_CONSUMER_SECRET = decrypt_env_var(os.environ['CONSUMER_SECRET'])
{% endhighlight %}
Using a convenience function for decryption (shown below), the values for sensitive information are retrieved from encrypted environment variables. The encryption is important due to the nature of the values.
The decryption is also the reason for defining the variables outside of the handler. If they are defined in the handler, the environment variables will be decrypted every time the handler is called. KMS is a great service but it has associated costs. If you are creating a customer master key (CMK), which is used for in-transit encryption of environment variables, you will pay for each CMK and per `x` number of requests; I use `x` intentionally as the price will vary depending on region and when you are reading this. Therefore, decrypting them once per container can save you money throughout the life of your function.
{% highlight python %}
import base64

import boto3


def decrypt_env_var(env_var: str) -> str:
    """
    example return value from decrypt
    {
        'KeyId': 'string',
        'Plaintext': b'bytes'
    }
    """
    # The environment variable holds base64 text; KMS expects the raw ciphertext bytes.
    return boto3.client('kms').decrypt(CiphertextBlob=base64.b64decode(env_var))['Plaintext'].decode('utf-8')
{% endhighlight %}
Notice that `Plaintext` above is a `bytes` object and not a `str`, so it will need to be decoded. It took a bit to realize this so don’t make that mistake.
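The effect of decrypting at module level can be seen without touching AWS. The following is a rough sketch, not the repository's code: `FakeKms` is a hypothetical stand-in for `boto3.client('kms')` that simply echoes the bytes it receives and counts calls, and the token value is made up. It simulates three invocations in one reused container.

```python
import base64


class FakeKms:
    """Hypothetical stand-in for boto3.client('kms'); counts decrypt calls."""

    def __init__(self) -> None:
        self.calls = 0

    def decrypt(self, CiphertextBlob: bytes) -> dict:
        self.calls += 1
        # Like the real client, return the plaintext as bytes.
        # This fake performs no decryption; it echoes its input.
        return {'KeyId': 'hypothetical-key', 'Plaintext': CiphertextBlob}


kms = FakeKms()


def decrypt_env_var(env_var: str) -> str:
    response = kms.decrypt(CiphertextBlob=base64.b64decode(env_var))
    return response['Plaintext'].decode('utf-8')  # bytes -> str


# Module level: runs once, when the container first loads the code.
ENCRYPTED_TOKEN = base64.b64encode(b'not-a-real-token').decode('utf-8')
DECRYPTED_TOKEN = decrypt_env_var(ENCRYPTED_TOKEN)


def lambda_handler(event, context):
    # Reuses the module-level value; no KMS request per invocation.
    return DECRYPTED_TOKEN


# Three invocations in the same (simulated) container.
results = [lambda_handler({}, None) for _ in range(3)]
```

After the three calls, `kms.calls` is still 1. Had `decrypt_env_var` been called inside `lambda_handler`, the count would grow with every invocation.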
On the subject of CMK and KMS, I encourage you to create your own key as it provides more security. If you use the default key, you can encrypt your environment variables at rest but not in transit. Encryption in transit is important for protecting sensitive information such as your Twitter API keys.
Posting a New Status
Finally, the new status (new tweet) is created and posted.
{% highlight python %}
tweeter = tweet.Tweeter(t)
tweeter.tweet(f"{URL_BASE}/{keys[0]}")
{% endhighlight %}
In an attempt to separate the logic needed for AWS Lambda and the business logic, a separate class is used to actually create the tweet (see `twitter.py` in the repository). Admittedly, it’s a small class but it can aid in testing to create separation. The `Tweeter` class that posts the tweet is shown below. Most of the code uses the `twitter` library.
{% highlight python %}
import twitter
import logging


class Tweeter:
    def __init__(self, t: twitter.Twitter) -> None:
        self.connection = t
        self.logger = logging.getLogger('tweet')

    def tweet(self, new_post: str) -> None:
        status = f"Check out our latest blog post: {new_post}"
        try:
            self.connection.statuses.update(status=status)
            self.logger.info('Successfully created new tweet.')
        except twitter.TwitterError:
            self.logger.error('An error occurred while creating tweet.')
{% endhighlight %}
In order to reuse the configured `logging` object, a named `logging` object is created in the `__init__` method of `Tweeter`.
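As an illustration of how the separation can aid testing, here is one possible sketch that exercises the class with `unittest.mock` instead of a real Twitter connection. To keep it runnable without the `twitter` package installed, it repeats the class with one assumption: `FakeTwitterError` is a hypothetical stand-in for `twitter.TwitterError`. The URL is also made up.

```python
import logging
from unittest import mock


class FakeTwitterError(Exception):
    """Hypothetical stand-in for twitter.TwitterError."""


class Tweeter:
    """Copy of the class above with the error type swapped for the stand-in."""

    def __init__(self, t) -> None:
        self.connection = t
        self.logger = logging.getLogger('tweet')

    def tweet(self, new_post: str) -> None:
        status = f"Check out our latest blog post: {new_post}"
        try:
            self.connection.statuses.update(status=status)
            self.logger.info('Successfully created new tweet.')
        except FakeTwitterError:
            self.logger.error('An error occurred while creating tweet.')


post_url = 'https://example.com/posts/hello.html'

# Success path: the status text is passed through to the connection.
connection = mock.Mock()
Tweeter(connection).tweet(post_url)
connection.statuses.update.assert_called_once_with(
    status=f"Check out our latest blog post: {post_url}")

# Failure path: the error is logged and swallowed rather than raised.
failing = mock.Mock()
failing.statuses.update.side_effect = FakeTwitterError('boom')
Tweeter(failing).tweet(post_url)
```

Since the handler only builds the connection and delegates, checks like these cover the tweeting logic without needing Lambda, S3, or live credentials.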
Future Improvements
Admittedly, there are things that can be done to make this setup more production-ready. First, using a database such as DynamoDB can aid in tracking posts and preventing duplicate tweets when existing posts are edited. Also, given our setup, when Jekyll builds the site it rebuilds all files, so S3 treats every post as a new object, which isn’t accurate; an alternative eventing strategy may be required. Lastly, you can consider adding some content from the post to give the tweet more context than a note that a new post is available. One possibility is allowing Lambda access to the S3 bucket to retrieve and parse the post.
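The duplicate check could look something like the sketch below. It is only an outline of the idea: `InMemorySeenPosts` and `should_tweet` are hypothetical names, and the in-memory set stands in for a real table. With DynamoDB, the `claim` step would more likely be a conditional write (for example, a put guarded by an `attribute_not_exists` condition on the key), so the check and the insert happen atomically.

```python
class InMemorySeenPosts:
    """Hypothetical stand-in for a table keyed on the post's S3 key."""

    def __init__(self) -> None:
        self._keys = set()

    def claim(self, key: str) -> bool:
        """Record the key; return True only the first time it is seen."""
        if key in self._keys:
            return False
        self._keys.add(key)
        return True


def should_tweet(store: InMemorySeenPosts, key: str) -> bool:
    """Tweet only for keys that have not been announced before."""
    return store.claim(key)
```

With a store like this in front of the `Tweeter`, a rebuilt or edited post would be claimed once and skipped on later events.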
I hope that this example provides you with enough to get started with your own Lambda-based applications. If you have questions or comments, please post them in the Disqus section below.