Python logging best practices with JSON steroids.

In critical systems such as planes, elevators or trains there are boxes logging everything, all the time. Because if the system crashes, somebody will have to understand what happened to prevent further occurrences. The same goes for mobile or web apps: if it matters to at least one user, you can be sure that in case of failure a guy somewhere will be in charge of understanding why it happened and how to patch it. Plus it’s usually the same guy that needs to answer on a daily basis any technical or business questions related to this very system.

And so logging matters for your app just as much as for critical systems. If you don’t have access to logs, if they’re stored in various places and formats, or if they contain poor information, the “guy somewhere” is going to waste a lot of time.

With this in mind, I listed below all the Python logging good practices I know of to help you get the best visibility into your system. The first part covers the basics, while the second part is for the more adventurous ones, those willing to get the most out of their Python logs by adding context and metrics or by using log analytics tools. So, let’s get going!

I. A couple of Python logging good practices

1) Use the standard Python logging module

It really is not recommended to use “print” as your standard logging function:

print('Do not rely on this example!')

Indeed, Python ships with a standard logging module, so everyone can rely on the same standard.

Why does it matter?

The logging module lets you control which severity levels are actually produced depending on the handlers and the deployment (dev, staging, production, …). Log events also carry information about where and when they were generated. Each event thus comes with a:

  • Date
  • Module name
  • Logger name
  • Function name
  • Line number

All of this information is, as you would expect, extremely useful when troubleshooting applications.

In addition to this, when somebody else uses your modules, they expect to be able to leverage the log events those modules produce.

So how do I do it?

import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

# Info level msg
logger.info('This is a standard message')

# Debug level msg
logger.debug('This is what you may want to see... sometimes...')

# Warning level msg
logger.warning('This is what you are usually not happy to see')

If you run it, you should see the following output:

INFO:__main__:This is a standard message
DEBUG:__main__:This is what you may want to see... sometimes...
WARNING:__main__:This is what you are usually not happy to see

Now if you don’t want to get all the log entries down to the DEBUG level, you just have to change the base configuration level to a higher one such as INFO or WARNING.
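For instance, here is a minimal sketch of the same script with a higher threshold, showing how DEBUG entries get filtered out:

import logging

# Raise the threshold to INFO: DEBUG records are now filtered out.
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

logger.debug('This will no longer be emitted')
logger.info('This will still be emitted')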

2) Log everywhere you can

So now that you have a good logging library you can rely on, the goal is to be able to use it as much as you can.

In other words, your code has to provide as much information as it can at the right severity level. Indeed, we often think about the warning, error and critical levels, as these are the usual suspects telling you that something went wrong. However, if you wish to really explain past actions, do root-cause analyses or trace queries across services, you also need to log at the info and debug levels.

Let’s have a look at the example below: if you were to write the following piece of code, you would get traces from all the queries that go through some API, which can come in quite handy.

def handle_query(request):
    logger.info('Starting request %s', request)

    # handle query here
    response = 'response'

    logger.info('Returning response: %s', response)
    return response

And during development, logging at the DEBUG level everywhere can help you understand a complex algorithm, do message tracing, etc.
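For example, here is a purely illustrative sketch of DEBUG-level tracing inside an algorithm (the binary search itself is just an example, not from the article):

import logging

logger = logging.getLogger(__name__)

def binary_search(items, target):
    # DEBUG logs expose each step of the search while developing.
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        logger.debug('Checking index %d (value %r)', mid, items[mid])
        if items[mid] == target:
            logger.debug('Found %r at index %d', target, mid)
            return mid
        if items[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    logger.debug('%r not found', target)
    return -1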

3) Stick to the standard logger name

If your log format displays the logger name, you will be able to identify where all your log entries come from. And there is no reason to change a standard, especially when it provides useful information. When you get or create a logger, simply use __name__, which is the current module’s name.

To illustrate this point, let’s say you call logging.getLogger(__name__) in the foo.bar.my_great_module module. It has exactly the same effect as calling logging.getLogger("foo.bar.my_great_module"). So why make things more complex?
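A quick way to convince yourself, assuming the snippet below lives in foo/bar/my_great_module.py and is imported as foo.bar.my_great_module:

import logging

# Both calls return the very same logger object.
logging.getLogger(__name__) is logging.getLogger("foo.bar.my_great_module")  # True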

4) Capture all exception tracebacks

I’m not going to push for catching as many exceptions as you possibly can. But Python has the ability to provide the stack trace (called a traceback), which is very useful for understanding issues.

Here is how to capture them with Python logging:

try:
    f = open('myfile.txt')
    s = f.readline()
    i = int(s.strip())
except Exception:
    logger.error('Failed to read the file', exc_info=True)

The Python traceback then looks like:

ERROR:__main__:Failed to read the file
Traceback (most recent call last):
  File "mymodule.py", line 3, in <module>
    f = open('myfile.txt')
IOError: [Errno 2] No such file or directory: 'myfile.txt'
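Note that inside an except block you can also call logger.exception(), which logs at the ERROR level and attaches the traceback automatically:

try:
    f = open('myfile.txt')
except Exception:
    # Equivalent to logger.error('...', exc_info=True)
    logger.exception('Failed to read the file')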

5) Use logging configuration files

Some people configure their Python logging directly from the code, but by doing so you lose a lot of flexibility that is extremely useful when deploying on multiple servers! Since Python 2.7, you can load the logging configuration from a dict (logging.config.dictConfig), which means you can load it from a JSON or YAML file.

version: 1
disable_existing_loggers: False

formatters:
  simple:
    format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

handlers:
  console:
    class: logging.StreamHandler
    level: DEBUG
    formatter: simple
    stream: ext://sys.stdout

  info_file_handler:
    class: logging.handlers.RotatingFileHandler
    level: INFO
    formatter: simple
    filename: info.log
    maxBytes: 10485760 # 10MB
    backupCount: 20
    encoding: utf8

  error_file_handler:
    class: logging.handlers.RotatingFileHandler
    level: ERROR
    formatter: simple
    filename: errors.log
    maxBytes: 10485760 # 10MB
    backupCount: 20
    encoding: utf8

loggers:
  my_module:
    level: ERROR
    handlers: [console]
    propagate: no

root:
  level: INFO
  handlers: [console, info_file_handler, error_file_handler]
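To actually apply this configuration, here is a minimal sketch assuming the file above is saved as logging.yaml and PyYAML is installed:

import logging.config

import yaml

with open('logging.yaml', 'r') as f:
    config = yaml.safe_load(f)

logging.config.dictConfig(config)

logger = logging.getLogger(__name__)
logger.info('Logging is now configured from the YAML file')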

II. Let’s add some steroids

Until now, I’ve only covered the bare minimum of what Python logging should be… However, new logging techniques have emerged recently thanks to new log management tools, whether on-premise systems such as Splunk, open-source ones such as ELK (Elasticsearch, Logstash, Kibana) or SaaS ones such as Logmatic.io, and I want to share them with you.

1) Log in JSON

Logging in JSON makes a lot of sense when your log data is consumed by machines. The JSON structure brings flexibility and its hierarchy is directly understood and reconstructed by JSON parsers. So why use custom log formats (which may be subject to future changes…) and then go through the trouble of defining complex parsers to extract valuable attributes?

Custom format vs. JSON format

Here is a log sample with a custom format:

# Log formatted as "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
2011-08-22 17:52:54,828 - lm.auth - WARNING - No user currently authenticated.

In JSON format the same log entry will look like this:

{
    "funcName": "validate_user",
    "filename": "auth.py",
    "lineno": 362,
    "processName": "MainProcess",
    "module": "auth",
    "name": "lm.auth",
    "levelname": "WARNING",
    "message": "No user currently authenticated.",
    "asctime": "2011-08-22 17:52:54,828",
    "threadName": "MainThread"
}

As you can see here, all the meta-attributes associated with the log event are automatically extracted by JSON parsers. What’s more, you can attach additional context to a log, such as a user id, a customer (for multi-tenant platforms for instance), or metrics such as response time. If you do so, the JSON-ified log will display them all.

Multi-line tracebacks

When working with log management tools, one big challenge is dealing with multi-line logs, as it becomes hard to tie each line back to its originating event. And tracebacks are multi-line by nature. In JSON format, the whole traceback is embedded in the event.

{
    ...
    "exc_info": "Traceback (most recent call last):\n  File \"test/write_in_console.py\", line 24, in exception_test\n    raise Exception('test')\nException: test",
    ...
}

So here you are: good log management tools are now able to display it properly.

It all makes sense! So… how do I set up the JSON format?

Unfortunately, the Python logging module does not contain any proper JSON formatter. That’s why the community is here to help! You can find a good JSON formatter on our GitHub.

Here is how to configure it:

import logging
import socket

import logmatic

logger = logging.getLogger()

handler = logging.StreamHandler()
handler.setFormatter(logmatic.JsonFormatter(extra={"hostname": socket.gethostname()}))

logger.addHandler(handler)
logger.setLevel(logging.INFO)

You probably also noticed that I added some extra context (the hostname here) that is going to be attached to all the log events passed through this handler.
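If you would rather avoid an extra dependency, here is a minimal hand-rolled sketch of a JSON formatter built only on the standard library (the field selection is my own, not from the article):

import json
import logging

class SimpleJsonFormatter(logging.Formatter):
    # Minimal JSON formatter sketch using only the standard library.

    def format(self, record):
        payload = {
            "asctime": self.formatTime(record),
            "name": record.name,
            "levelname": record.levelname,
            "module": record.module,
            "funcName": record.funcName,
            "lineno": record.lineno,
            "message": record.getMessage(),
        }
        # Embed the traceback as a single string, as discussed above.
        if record.exc_info:
            payload["exc_info"] = self.formatException(record.exc_info)
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(SimpleJsonFormatter())
logging.getLogger().addHandler(handler)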

2) Add as much context as you need

Context matters! Adding context and metrics makes complete sense if you want to reach the next level. Remember the time you or somebody near you had to troubleshoot a critical issue for an important customer. It was a journey into darkness…

Troubleshooting

A good log format combined with a good logging tool allows you to quickly spot and isolate the log events emitted while serving this customer, filter by severity level, and identify the modules that emitted warnings. Your troubleshooting time is divided by 5 or 10… and you can provide quick answers to your customers!

So let’s add a customer name and the number of trials to my previous log:

logger.warning("No user currently authenticated.",
               extra={"customer": "my_beloved_customer", "trials": 4})

My previous JSON then becomes:

{
    "funcName": "validate_user",
    "filename": "auth.py",
    "lineno": 362,
    "processName": "MainProcess",
    "module": "auth",
    "name": "lm.auth",
    "levelname": "WARNING",
    "message": "No user currently authenticated.",
    "asctime": "2011-08-22 17:52:54,828",
    "threadName": "MainThread",
    "customer": "my_beloved_customer",
    "trials": 4
}

Great, isn’t it? Now you can stream it to your preferred tool and feed all these attributes into its analytics, for instance to break down severity levels by module name across dozens of servers.
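If you find yourself attaching the same context to many log calls, a logging.LoggerAdapter can carry it for you; here is a minimal sketch (not from the article):

import logging

logger = logging.getLogger("lm.auth")

# The adapter merges this dict into every record it emits, as if it were
# passed through the 'extra' argument each time.
customer_logger = logging.LoggerAdapter(
    logger, {"customer": "my_beloved_customer", "trials": 4}
)

customer_logger.warning("No user currently authenticated.")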

Operational intelligence

Context can also be used for operational intelligence purposes. Let’s take a simple example where an e-retailer decides to log whenever a cart is paid. With context, you can attach the user id, the time spent on the website, the size and price of the cart, what the user searched for, and any other information that is relevant to understanding your users.
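As a sketch, such an event could look like the following (the attribute names are purely illustrative, not from the article):

logger.info("Cart paid",
            extra={
                "user_id": "u-42137",          # hypothetical identifier
                "session_duration_s": 845,
                "cart_items": 3,
                "cart_total_eur": 129.90,
                "search_terms": "running shoes",
            })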

But operational intelligence is a big topic and we’ll have other occasions to talk about this.

Wrapping up

I hope this article was useful to you and that you learnt some new techniques. You’ll probably realize after only a couple of tries that your log entries can be much more useful than they are in most places today. Standards, JSON, contextual data and probably a good log management tool can make a big difference to your day-to-day work. On this final note… enjoy your Python logging!
