Logs are a critical component of any application or system. They contain valuable information that can help developers troubleshoot issues and improve performance. However, analyzing logs manually can be daunting, especially if there are large volumes of logs to sift through.

Back in the day, when I was still a QA engineer (you can read how I became a Python developer here), I was in this situation a lot and needed to figure out a way to AUTOMATE the parsing/analyzing process.

That’s a perfect opportunity to build your own log parsing tool in Python! 🔥

Recently I happened to run into a similar situation where it made sense to write my own log parsing tool in Python.

In this article, we’ll explore how you can build a log parsing tool in Python and the benefits of using Python for log parsing.

Can I Build A Log Parsing Tool In Python?

Yes, you can! Building a log parsing tool in Python is not only possible but also practical! Python has a vast array of libraries and tools that make it a suitable language for data analysis and automation tasks.

If you’re keen on sticking to the basics, it’s FAIRLY EASY to build a simple Python log parser tool without any libraries, just using the built-in modules! 👌🏻

With Python, you can easily define log formats, parse log files, extract relevant information, store data in a structured format, and analyze and visualize the data.

So, if you are dealing with logs and want to make sense of them, Python is a great choice.

🚨 In the following sections of this post, I’ll explain my log parsing situation and what I used to build my own log parsing tool in Python!

What Is The Purpose Of Log Parsing?

If you have ever worked with web servers, you may be familiar with log files.

In the context of Python and web servers, log parsing refers to the process of analyzing the log files generated by a web server.

These files contain a record of every interaction that takes place between a user and a website.

The purpose of log parsing is to extract useful information from the log files, such as:

  • The number of requests made to the server
  • The IP addresses of the clients
  • The types of requests made
  • The response codes returned by the server
  • And the amount of data transferred
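
To make that last list concrete, here's a minimal sketch that counts response codes in a web server access log. It assumes a common Nginx/Apache-style format where the status code comes right after the quoted request line, and access.log is just a placeholder path – adjust both to your own setup:

import re
from collections import Counter

status_counts = Counter()

# assumes lines like:
# 127.0.0.1 - - [24/Mar/2022:15:00:01 +0000] "GET /index.html HTTP/1.1" 200 1024
with open('access.log', 'r') as f:
    for line in f:
        # the status code is the three digits right after the closing quote
        match = re.search(r'"\s(\d{3})\s', line)
        if match:
            status_counts[match.group(1)] += 1

print(status_counts.most_common())  # e.g. [('200', 940), ('404', 52), ('500', 8)]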

As for my own work, the gunicorn server logs you’ll see later in this post are the type of log files I deal with very often.

Parsing such a log file can help you with troubleshooting and debugging.

If a problem is reported to me by our CTO, we can examine the server log entries and determine the cause of the problem.

Most commonly, after releasing a new version of code to production, things can break: users can have trouble accessing a particular page, finalizing a payment, etc.

In this case, we search through log files to find possible errors generated at the moment when the page was requested.

Is Python Good For File Parsing?

Yes, Python is a very good language for log file parsing.

Python has a rich set of libraries and tools for working with text data, which makes it well-suited for parsing and analyzing log files.

Personally, my favorite Python module for log file parsing is re.


Python has a built-in re module that provides regular expression functionality, which is useful for searching and matching patterns in log file data.

I’ve already written a couple of posts on log file parsing with regular expressions, in case you need more examples.

What Is An Example Of Data Parsing?

Here’s a simplified example of data parsing in Python, just to give you a very clear idea of what we’re working with.

Suppose we have a log file that contains the following data:

2022-03-24 15:00:01 INFO: Starting server
2022-03-24 15:01:05 Some other message
2022-03-24 15:01:05 Some other message
2022-03-24 15:01:05 Some other message
2022-03-24 15:01:05 Some other message
2022-03-24 15:00:05 WARNING: Request from IP 192.168.0.1 was blocked
2022-03-24 15:01:05 Some other message
2022-03-24 15:01:05 Some other message
2022-03-24 15:01:05 Some other message
2022-03-24 15:01:02 ERROR: Could not connect to database
2022-03-24 15:01:05 Some other message
2022-03-24 15:01:05 Some other message
2022-03-24 15:01:05 Some other message
2022-03-24 15:01:05 INFO: Server shutting down

To extract useful information from this log file, we can use Python to parse the data and extract specific fields.

For example, we might want to extract the timestamp, log level, and message for each log entry that’s useful to us.

Here’s an example Python script that uses regular expressions to parse the log file and extract this information:

import re

log_file = 'server.log'

with open(log_file, 'r') as f:
    for line in f:
        # capture three groups: everything before the level, the level word, the message
        match = re.search(r'^(.*)\s(\w+):\s(.*)$', line)
        if match:
            timestamp = match.group(1)
            log_level = match.group(2)
            message = match.group(3)
            print(f'{timestamp} - {log_level}: {message}')

In this script, we use the re.search() function to match each log entry against a regular expression pattern.


The pattern captures three groups of data:

  • Timestamp
  • Log level
  • Message

Here’s a quick breakdown of the regex pattern we used: ^(.*)\s captures everything from the start of the line up to the whitespace before the log level (the timestamp, in these logs), (\w+): captures the log level word followed by a colon, and \s(.*)$ captures the rest of the line as the message. Here’s a website (regex101.com) that lets you easily debug regex patterns and come up with your own.

We then extract these groups using the group() method of the Match object, and print the results in a formatted string.

Here are the printed results:

2022-03-24 15:00:01 - INFO: Starting server
2022-03-24 15:00:05 - WARNING: Request from IP 192.168.0.1 was blocked
2022-03-24 15:01:02 - ERROR: Could not connect to database
2022-03-24 15:01:05 - INFO: Server shutting down

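As a small aside, the same pattern can be written with named groups ((?P<name>...)), which many people find easier to read – here’s a variation of the script above:

import re

log_file = 'server.log'

with open(log_file, 'r') as f:
    for line in f:
        # same pattern as before, but each group now has a name
        match = re.search(r'^(?P<timestamp>.*)\s(?P<level>\w+):\s(?P<message>.*)$', line)
        if match:
            print(f"{match.group('timestamp')} - {match.group('level')}: {match.group('message')}")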

How Do You Make A Log Parser In Python?

To make a log parser in Python, you can follow these general steps:

  • Define the log format: This will typically involve identifying the fields in each log entry (such as timestamp, log level, message, etc.) and the separators or delimiters used to separate these fields.
  • Open the log file: Use Python’s built-in open() function to open the log file – specify the appropriate file mode (e.g., read-only mode).
  • Read the contents: Choose one of the file object’s reading methods – read() or readlines().
  • Parse the log data: Once you have the log data in memory, you can use various parsing techniques to extract the desired fields from each log entry. This can include using regular expressions to match patterns in the log data, or using string manipulation functions to extract specific substrings.
  • Prettify log data: Prettify the log data by converting it into a more readable format (e.g., JSON) using a suitable parser or library. Make the results easy to read for everyone.
  • Analyze the log data: You can use Python’s data analysis and visualization tools to analyze the log data and gain insights into system performance, errors, and other key metrics. Or just simply read the results and understand where the issue is.
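
Here’s a minimal sketch that ties all six steps together, assuming the simple timestamp LEVEL: message format from the earlier example (server.log is a placeholder file name):

import re
import json

# step 1: define the log format as a regular expression
LOG_PATTERN = re.compile(r'^(?P<timestamp>.*)\s(?P<level>\w+):\s(?P<message>.*)$')

entries = []

# steps 2-3: open the log file and read it line by line
with open('server.log', 'r') as f:
    for line in f:
        # step 4: parse each line and extract the fields
        match = LOG_PATTERN.search(line)
        if match:
            entries.append(match.groupdict())

# step 5: prettify the parsed data as JSON
print(json.dumps(entries, indent=4))

# step 6: a very basic analysis - count entries per log level
levels = {}
for entry in entries:
    levels[entry['level']] = levels.get(entry['level'], 0) + 1
print(levels)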

How Do You Read All Data From A File In Python?

Let’s take a recent log file I was working with and build an actual log file parsing tool.

As the general steps above suggest, one of the first things you want to do in a log file parsing tool is to read the file. In this case, I prefer using the readlines() method – here’s an example code snippet:

import re
import json

with open('./logs/logs.txt', 'r') as log_file:
    content = log_file.readlines()

import re: This line imports the built-in re module in Python, which provides support for regular expressions.

import json: This line imports the built-in json module in Python, which provides support for working with JSON data. (We’ll use it to pretty format logs and make them very easily readable)

with open('./logs/logs.txt', 'r') as log_file: 

This line opens a log file named logs.txt located in a subdirectory called logs relative to the current working directory.

The 'r' argument specifies that the file should be opened in read mode.


The with statement is used to ensure that the file is properly closed when the block of code is exited.

content = log_file.readlines()

The above line reads the contents of the log file into a list of strings called content.

Each string in the list represents a single line of the log file.
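
One thing worth knowing: readlines() loads the whole file into memory, which is perfectly fine for typical log files but can become a problem with very large ones. In that case, you can iterate over the file object directly, and it doesn’t hurt to pass an explicit encoding either – a minimal sketch:

# iterate lazily instead of loading everything at once;
# errors='replace' keeps the loop alive if the log contains odd bytes
with open('./logs/logs.txt', 'r', encoding='utf-8', errors='replace') as log_file:
    for line in log_file:
        pass  # parse the line here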

What Is The Python Library For Log Parsing?

At this point of building a log file parsing tool, it’s time to decide if you need to use a specific library for log parsing or if you can rely solely on the built-in re module.


The answer depends on the complexity of your log file format and the parsing requirements for your specific use case.

In many cases, the re module provides enough functionality to extract the relevant data from log files using regular expressions, that’s why I’m going to stick with just regular expressions in this example.

By using the re module, you can avoid the overhead of importing and learning a new library, which can save you time and effort in the long run.
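
One small habit that pays off when you stick with re: if you apply the same pattern to every line, compile it once with re.compile(). It keeps the hot loop tidy and skips the pattern cache lookup re.search() does on every call – a sketch using the pattern from the earlier example:

import re

# compile once, reuse for every line
log_pattern = re.compile(r'^(.*)\s(\w+):\s(.*)$')

with open('server.log', 'r') as f:
    for line in f:
        match = log_pattern.search(line)
        if match:
            print(match.groups())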

How To Extract Data From Log File In Python?

Now that you have read the log file in Python, we can start extracting data from it.

Personally, when I’m reading log files and looking for an issue, the first thing I need to see is WHEN THE SPECIFIC ACTION HAPPENED!

Each line of the logs.txt file I’m working with starts with a date and time – let’s use a regular expression to extract that value. Here’s what the beginning of each line looks like:

Mar 14 12:53:18 portal gunicorn[2119]:
Mar 14 12:53:18 portal gunicorn[2119]:
Mar 14 12:53:25 portal gunicorn[2119]:
Mar 14 12:53:26 portal gunicorn[2119]:
Mar 14 12:53:29 portal gunicorn[2119]:
Mar 14 12:53:49 portal gunicorn[2119]:
Mar 14 12:54:09 portal gunicorn[2119]:
Mar 14 12:54:26 portal gunicorn[2119]:
Mar 14 12:54:26 portal gunicorn[2119]:

How To Fetch Data Using RegEx In Python?

You open the file and read its contents into a variable called content using the readlines() method.

Now, you’ll have a list where each element is a line from the logs.txt file.

Next, initialize a variable called prettified_content as an empty string – this is where you’ll collect only the extracted, valuable parts of the text.

import re
import json

with open('./logs/logs.txt', 'r') as log_file:
    content = log_file.readlines()

prettified_content = ''

for line in content:

    # match the syslog-style timestamp at the start of the line, e.g. 'Mar 14 12:53:18'
    date = re.search(r'^(\w{3}\s\d{1,2}\s\d{2}:\d{2}:\d{2})', line)

    if date:
        date = date.group(1)

        prettified_content += f'\n\ndate: {date}\n'

print(prettified_content)

The code then iterates over each line in the content list using a for loop.

For each line, it uses regular expressions (imported with the re module) to search for a pattern that matches a specific format of a date string.

The regular expression used is '^(\w{3}\s\d{1,2}\s\d{2}:\d{2}:\d{2})', which matches a string that starts with three letters (the month abbreviation), followed by a space, one or two digits (the day of the month), another space, and the time in hh:mm:ss format.

If the line contains a matching date string, the code extracts the date from the line using the group() method of the match object returned by re.search(), and assigns it to a variable called date.

The code then appends a prettified version of the log entry to the prettified_content string variable.

This includes two newline characters ('\n\n'), followed by the string 'date: ', followed by the extracted date string and a single trailing newline ('\n').

This creates a visually separated block of text for each log entry.
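
If you also need the extracted date as an actual datetime object (to sort entries or filter by time range, for example), you can parse it with datetime.strptime(). Note that this syslog-style format has no year, so the sketch below just assumes the current one:

from datetime import datetime

date = 'Mar 14 12:53:18'  # the string extracted by the regex above

# %b = abbreviated month, %d = day, %H:%M:%S = time; the format carries no year
parsed = datetime.strptime(date, '%b %d %H:%M:%S').replace(year=datetime.now().year)
print(parsed)  # e.g. 2023-03-14 12:53:18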

Finally, the code prints out the entire prettified_content string variable, which contains all of the prettified log entries:

date: Mar 14 12:53:18


date: Mar 14 12:53:18


date: Mar 14 12:53:25


date: Mar 14 12:53:26


date: Mar 14 12:53:29


date: Mar 14 12:53:49


date: Mar 14 12:54:09


date: Mar 14 12:54:26


date: Mar 14 12:54:26



How Does RegEx Work In Python?

In Python, regular expressions (or RegEx for short) are handled by the re module as you’ve already seen from previous steps.

The biggest burden with RegEx, though, is coming up with your own patterns.

The most common question I get from my students is: “But how do you come up with the pattern?”

Here’s the tool I use: regex101.com.


You simply set the “TEST STRING” and try different patterns until you come up with the one that matches the text you want to extract.

On the right side of the page, you can see the explanation of the pattern you’ve come up with.

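You can do the same kind of trial and error directly in a Python shell – start with a broad pattern and tighten it until it captures exactly what you want:

import re

line = 'Mar 14 12:53:18 portal gunicorn[2119]: some message'

print(re.search(r'^\w{3}', line))                                # month abbreviation
print(re.search(r'^\w{3}\s\d{1,2}', line))                       # month + day
print(re.search(r'^(\w{3}\s\d{1,2}\s\d{2}:\d{2}:\d{2})', line))  # full timestamp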

How To Parse JSON Logs In Python?

Many of the log files I have to deal with contain JSON or Python dictionary type of data. The problem with gunicorn and other similar server logs is that they’re usually stuffed into a single line.

Obviously, this makes the analyzing process way harder. I used to copy these logs into PyCharm and manually “pretty format” them just to be able to read them and understand where the problem is.

You can already see how frustrating this process can get without automation.

Here’s what you can do to pretty format JSON data while parsing log files in Python.

The following Python code searches for JSON data within the a.extra field of a log line and converts it to a prettified JSON string.

This is going to be the next part of your log parsing tool (check out the full script below where this part is already included):

a_extra_match = re.search(r'a\.extra:\s*(\{.*\})', line)

if a_extra_match:
    a_extra = (
        a_extra_match.group(1)
        .replace("'", '"')
        .replace("None", 'null')
        .replace("False", 'false')
        .replace("True", 'true')
    )
    a_extra_dict = json.loads(a_extra)
    pretty_a_extra = json.dumps(a_extra_dict, indent=4)

    prettified_content += f'a.extra: {pretty_a_extra}\n'

The re.search() function searches for the pattern a\.extra:\s*(\{.*\}) within a log line.

This pattern looks for the string a.extra: followed by zero or more whitespace characters, and then a set of curly braces containing any characters, matched with the .* wildcard. (Note that . doesn’t match newline characters by default – that’s fine here, because each log entry sits on a single line.)

The parentheses () around the curly braces indicate that the contents of the braces should be captured as a group.

If the regular expression search finds a match, the if statement evaluates to True. The matched JSON data is then available through group(1) of the a_extra_match object.

The code then replaces any single quotes in the JSON string with double quotes using the replace() method, since JSON requires double quotes around its keys and values.

The replace() method is also used to replace any occurrences of None, False, and True with the corresponding null, false, and true values in JSON.

The modified JSON string is then loaded as a Python dictionary using the json.loads() method. This dictionary is then converted back to a prettified JSON string with an indentation level of 4 using the json.dumps() method.

The prettified JSON string is then added to the prettified_content string using string interpolation with the f prefix. The resulting string includes the label a.extra: followed by the prettified JSON data, separated by a newline character (\n).
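
One caveat to be upfront about: this chain of replace() calls works fine for logs like mine, but it can corrupt the data if a value itself contains an apostrophe or the literal word None. Since these fields are really Python dict literals, the standard library’s ast.literal_eval() is a more robust way to parse them – a sketch:

import ast
import json

# a Python-dict-style string as it appears in the log line
raw = "{'client_ip': '84.55.44.247', 'referrer_url': None, 'preferred_used': False}"

# literal_eval understands single quotes, None, True and False natively
a_extra_dict = ast.literal_eval(raw)

print(json.dumps(a_extra_dict, indent=4))  # None/False come out as proper JSON null/false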

And if we combine everything done so far into a single script, this is what you’ll get:

import re
import json

with open('./logs/logs.txt', 'r') as log_file:
    content = log_file.readlines()

prettified_content = ''

for line in content:

    date = re.search(r'^(\w{3}\s\d{1,2}\s\d{2}:\d{2}:\d{2})', line)

    if date:
        date = date.group(1)

        prettified_content += f'\n\ndate: {date}\n'

    a_extra_match = re.search(r'a\.extra:\s*(\{.*\})', line)

    if a_extra_match:
        a_extra = (
            a_extra_match.group(1)
            .replace("'", '"')
            .replace("None", 'null')
            .replace("False", 'false')
            .replace("True", 'true')
        )
        a_extra_dict = json.loads(a_extra)
        pretty_a_extra = json.dumps(a_extra_dict, indent=4)

        prettified_content += f'a.extra: {pretty_a_extra}\n'

print(prettified_content)

And here’s the output so far. It makes the log file way easier to read, and you can already spot possible issues with the requests made on the server side.

date: Mar 14 12:53:18


date: Mar 14 12:53:18


date: Mar 14 12:53:25
a.extra: {
    "client_ip": "84.55.44.247",
    "access_token": "eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJGc1JyMzlneXY3X3dNdzVLT1RqMHg0RXNVZ1BUeVNocjltdC1NNlR2U09rIn0.eyJleHAiOjE2Nzg4MDAxOTgsImlhdCI6MTY3ODc5ODM5OCwiYXV0aF90aW1lIjoxNjc4Nzk4Mzk4LCJqdGkiOiJhNzYyZDY2Zi1jNTcwLTQ5MDQtYjcyNS01ZDc5Y2I4MWVlNzYiLCJpc3MiOiJodHRwczovL2tleWNsb2FrLnNiLmx0L2F1dGgvcmVhbG1zL1NCX0MiLCJzdWIiOiJkYTg4MDA0Ny01NGIwLTQwMzctOTExOC1jZTA0YmI1ZDBjMGEiLCJ0eXAiOiJCZWFyZXIiLCJhenAiOiJ4czJhLXJlc3RhcGktY2xpZW50Iiwibm9uY2UiOiJjNzFmMTZlMi01MzRkLTQyNDYtOThmYy00ZGY5ZjQ0YWJjNmYiLCJzZXNzaW9uX3N0YXRlIjoiZTU5NWFmYTktODhiZC00Njc5LWFlMmQtN2U4MGE5NDFhNDQ1IiwiYWNyIjoiMSIsInNjb3BlIjoieHMyYSBvZmZsaW5lX2FjY2VzcyIsInNpZCI6ImU1OTVhZmE5LTg4YmQtNDY3OS1hZTJkLTdlODBhOTQxYTQ0NSIsInByZWZlcnJlZF91c2VybmFtZSI6ImtsMjk4NTgxcCJ9.ujzG_QG4tFtgF3oysDnWG__CuVSNnkEYOv79rpxYG_CSd69FmFiypRBNhHhIXUG2lUmCUGJaCrPX7lkf4oYklatI_cBoUHUxp4dRou9wdfdf6gsXwCRnI0rZDRm4TYJm8Fg4gzUky0Jl2Fb5kaC9DzgLLYnom8UKYxEmW8eJSMsOic2efXxsddIWiqMNYebBgXZEEqHpYi-igVJKeew_TuHTq6Ma7aFmBLsp6LbUy5-dUPXwzXRWBCTfCFBpOX4yFeOCWrah8qqvuXliAmwYaEgOA8gHtBE21v3-UrnNrUNADU0ooGwmdsZWIcnEE2uzJZrsq2YvsWJ5CPp5iViHfQ",
    "referrer_url": null,
    "refresh_token": "eyJhbGciOiJIUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJkNGQ5MDU2Mi0yYmFjLTQ5YzgtYTc0MC1mNWVjZjdjMTBkNmEifQ.eyJleHAiOjE2ODY1NzQzOTgsImlhdCI6MTY3ODc5ODM5OCwianRpIjoiYmMwZGZiM2UtNDExZC00NzNhLTlhOWItOTBkMzEwYTI1MWYxIiwiaXNzIjoiaHR0cHM6Ly9rZXljbG9hay5zYi5sdC9hdXRoL3JlYWxtcy9TQl9DIiwiYXVkIjoiaHR0cHM6Ly9rZXljbG9hay5zYi5sdC9hdXRoL3JlYWxtcy9TQl9DIiwic3ViIjoiZGE4ODAwNDctNTRiMC00MDM3LTkxMTgtY2UwNGJiNWQwYzBhIiwidHlwIjoiT2ZmbGluZSIsImF6cCI6InhzMmEtcmVzdGFwaS1jbGllbnQiLCJub25jZSI6ImM3MWYxNmUyLTUzNGQtNDI0Ni05OGZjLTRkZjlmNDRhYmM2ZiIsInNlc3Npb25fc3RhdGUiOiJlNTk1YWZhOS04OGJkLTQ2NzktYWUyZC03ZTgwYTk0MWE0NDUiLCJzY29wZSI6InhzMmEgb2ZmbGluZV9hY2Nlc3MiLCJzaWQiOiJlNTk1YWZhOS04OGJkLTQ2NzktYWUyZC03ZTgwYTk0MWE0NDUifQ.NbBCxEc5AmdbmIRmo3EGzHE-Eqhwd4NBLiA3nz0BdS4",
    "preferred_used": false,
    "selected_country": "LT",
    "refresh_token_url": "https://xs2a.sb.lt/v1/oauth/token?branch=SB_C",
    "persisted_accounts": {
        "LT127180900466724351": {
            "logo": "/static/images/bank_methods/siauliu_lt_pis.png",
            "name": "Banko s\u0105skaita",
            "account_id": "",
            "currencies": [
                "EUR"
            ],
            "scheme_name": "",
            "account_owner_name": ""
        },
        "LT147180900490724152": {
            "logo": "/static/images/bank_methods/siauliu_lt_pis.png",
            "name": "Banko s\u0105skaita",
            "account_id": "",
            "currencies": [
                "EUR"
            ],
            "scheme_name": "",
            "account_owner_name": ""
        },
        "LT667180900466724349": {
            "logo": "/static/images/bank_methods/siauliu_lt_pis.png",
            "name": "Banko s\u0105skaita",
            "account_id": "",
            "currencies": [
                "EUR"
            ],
            "scheme_name": "",
            "account_owner_name": ""
        }
    },
    "persisted_accounts_key": "6ed1585c-af04-4eb0-9235-c4120a33a4e0"
}


date: Mar 14 12:53:26


date: Mar 14 12:53:29


date: Mar 14 12:53:49


date: Mar 14 12:54:09


date: Mar 14 12:54:26


date: Mar 14 12:54:26


date: Mar 14 12:54:31
a.extra: {
    "client_ip": "84.55.44.247",
    "access_token": "eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJGc1JyMzlneXY3X3dNdzVLT1RqMHg0RXNVZ1BUeVNocjltdC1NNlR2U09rIn0.eyJleHAiOjE2Nzg4MDAyNjYsImlhdCI6MTY3ODc5ODQ2NiwiYXV0aF90aW1lIjoxNjc4Nzk4NDY2LCJqdGkiOiJmY2IzMmYxNC01MzVjLTQyOWQtYjkyNC02ZDFkMmU2MWY4NzgiLCJpc3MiOiJodHRwczovL2tleWNsb2FrLnNiLmx0L2F1dGgvcmVhbG1zL1NCX0MiLCJzdWIiOiJkYTg4MDA0Ny01NGIwLTQwMzctOTExOC1jZTA0YmI1ZDBjMGEiLCJ0eXAiOiJCZWFyZXIiLCJhenAiOiJ4czJhLXJlc3RhcGktY2xpZW50Iiwibm9uY2UiOiIxNGVhOWRmYy1hNTVmLTRiMzgtOTIyNS01MTIzOTZjMjRmMzkiLCJzZXNzaW9uX3N0YXRlIjoiZTU5NWFmYTktODhiZC00Njc5LWFlMmQtN2U4MGE5NDFhNDQ1IiwiYWNyIjoiMSIsInNjb3BlIjoieHMyYSBvZmZsaW5lX2FjY2VzcyIsInNpZCI6ImU1OTVhZmE5LTg4YmQtNDY3OS1hZTJkLTdlODBhOTQxYTQ0NSIsInByZWZlcnJlZF91c2VybmFtZSI6ImtsMjk4NTgxcCJ9.IypZM73tmtyX7K8O-LKAixsteOva0l2AHHjbEj73xVhMgrJQQFTwxOnGgFD69DfPkvFmq35TYZ1PG5uR_AUsOdz8XNkT3KVTX4GlNhuQEM-MqP6i2LpdHOik1KrNGE6EiQkPyyHoWrCsxnoHHjzUFhCXf6n0p1DxYVrHgME8MF7BMqvWOvgQQh4g96GPSLQBxZ09kti43mgsdszu80V5la4Y43NgMtzusD9wWAug1Pd9Iazaxe9fWIzu5q44zY5vSskE45wMjzPT0iLNRShbDiyG0YDlVvx5e1VxXlcA0zXV66yz-jp5fbfoqsDfs671Hvj66RpiPzL_w-acZN-UFw",
    "referrer_url": null,
    "refresh_token": "eyJhbGciOiJIUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJkNGQ5MDU2Mi0yYmFjLTQ5YzgtYTc0MC1mNWVjZjdjMTBkNmEifQ.eyJleHAiOjE2ODY1NzQzOTgsImlhdCI6MTY3ODc5ODQ2NiwianRpIjoiYmFiMWZmODQtMTgyYi00ZjZlLThhZjUtZDdmMjJjMWE0NmYzIiwiaXNzIjoiaHR0cHM6Ly9rZXljbG9hay5zYi5sdC9hdXRoL3JlYWxtcy9TQl9DIiwiYXVkIjoiaHR0cHM6Ly9rZXljbG9hay5zYi5sdC9hdXRoL3JlYWxtcy9TQl9DIiwic3ViIjoiZGE4ODAwNDctNTRiMC00MDM3LTkxMTgtY2UwNGJiNWQwYzBhIiwidHlwIjoiT2ZmbGluZSIsImF6cCI6InhzMmEtcmVzdGFwaS1jbGllbnQiLCJub25jZSI6IjE0ZWE5ZGZjLWE1NWYtNGIzOC05MjI1LTUxMjM5NmMyNGYzOSIsInNlc3Npb25fc3RhdGUiOiJlNTk1YWZhOS04OGJkLTQ2NzktYWUyZC03ZTgwYTk0MWE0NDUiLCJzY29wZSI6InhzMmEgb2ZmbGluZV9hY2Nlc3MiLCJzaWQiOiJlNTk1YWZhOS04OGJkLTQ2NzktYWUyZC03ZTgwYTk0MWE0NDUifQ.t9vimDxtd1QefOuJ6oCi_8I0xZC6vbZ2wQyzHyB7aCg",
    "preferred_used": false,
    "selected_country": "LT",
    "refresh_token_url": "https://xs2a.sb.lt/v1/oauth/token?branch=SB_C",
    "persisted_accounts": {
        "LT127180900466724351": {
            "logo": "/static/images/bank_methods/siauliu_lt_pis.png",
            "name": "Banko s\u0105skaita",
            "account_id": "",
            "currencies": [
                "EUR"
            ],
            "scheme_name": "",
            "account_owner_name": ""
        },
        "LT147180900490724152": {
            "logo": "/static/images/bank_methods/siauliu_lt_pis.png",
            "name": "Banko s\u0105skaita",
            "account_id": "",
            "currencies": [
                "EUR"
            ],
            "scheme_name": "",
            "account_owner_name": ""
        },
        "LT667180900466724349": {
            "logo": "/static/images/bank_methods/siauliu_lt_pis.png",
            "name": "Banko s\u0105skaita",
            "account_id": "",
            "currencies": [
                "EUR"
            ],
            "scheme_name": "",
            "account_owner_name": ""
        }
    },
    "persisted_accounts_key": "2ed526ba-a3e3-4166-bfd4-0835da8a41af"
}


date: Mar 14 12:54:31


date: Mar 14 12:54:44


date: Mar 14 12:54:44


date: Mar 14 12:54:49
a.extra: {
    "client_ip": "195.182.68.245",
    "access_token": "eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJGc1JyMzlneXY3X3dNdzVLT1RqMHg0RXNVZ1BUeVNocjltdC1NNlR2U09rIn0.eyJleHAiOjE2Nzg4MDAyODQsImlhdCI6MTY3ODc5ODQ4NCwiYXV0aF90aW1lIjoxNjc4Nzk4NDg0LCJqdGkiOiIwN2IxOTgwOS1iODk0LTRkNTgtODczYy1kZDI3M2VmMGI2ZWUiLCJpc3MiOiJodHRwczovL2tleWNsb2FrLnNiLmx0L2F1dGgvcmVhbG1zL1NCX0MiLCJzdWIiOiJhNThiOGEwNi1hZjYwLTQxNWUtODQ0Yi0zODJhY2E5ZDY1OGEiLCJ0eXAiOiJCZWFyZXIiLCJhenAiOiJ4czJhLXJlc3RhcGktY2xpZW50Iiwibm9uY2UiOiI1MzQzOTE3Mi1lMGU1LTRlNDItOThmMy1iYmM4YmQxZDUzNDkiLCJzZXNzaW9uX3N0YXRlIjoiYWNkYzdmMmEtOWZlYS00YzU5LThiZjYtMzZjMmRlOGViYzY2IiwiYWNyIjoiMSIsInNjb3BlIjoieHMyYSBvZmZsaW5lX2FjY2VzcyIsInNpZCI6ImFjZGM3ZjJhLTlmZWEtNGM1OS04YmY2LTM2YzJkZThlYmM2NiIsInByZWZlcnJlZF91c2VybmFtZSI6ImtsMTY3ODI4cCJ9.HIrGjie7bqb_OCQ1KNQugLOyaU8Nq-AjKp002BBjSpQ_BkbwAhNQdqfVmYc-Zk6MN-Q4yFAA9WV4qmJ2Xz7oD--xIAs3AcOuhnCBowfjqR813oDhfBSxaQumFaNNZQRFnINtc-bL8BKFXegcSiKChik-iE0VLo_XkShier91BmSlqr_9lhX-ihjspt7NKN5LHNGWLbAdeR5BCzyPKKVhH6YcpAPT7LLSB8XT7YRT57hfYIVXx-pBi-ryUTM7H0lxaxuY0LbyXGAlqozxJkoHVKwFJn29oqwI7gnQGA2iOlFNnF4XZaPR8v2YZJDIYonlxSxIGXqc5Nma5ztJmMwPQg",
    "referrer_url": null,
    "refresh_token": "eyJhbGciOiJIUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJkNGQ5MDU2Mi0yYmFjLTQ5YzgtYTc0MC1mNWVjZjdjMTBkNmEifQ.eyJleHAiOjE2ODY1NzQ0ODQsImlhdCI6MTY3ODc5ODQ4NCwianRpIjoiNzQxODNjYWItYTJhMS00YmE0LTkxNzgtN2UxOWUxNmVkNTZhIiwiaXNzIjoiaHR0cHM6Ly9rZXljbG9hay5zYi5sdC9hdXRoL3JlYWxtcy9TQl9DIiwiYXVkIjoiaHR0cHM6Ly9rZXljbG9hay5zYi5sdC9hdXRoL3JlYWxtcy9TQl9DIiwic3ViIjoiYTU4YjhhMDYtYWY2MC00MTVlLTg0NGItMzgyYWNhOWQ2NThhIiwidHlwIjoiT2ZmbGluZSIsImF6cCI6InhzMmEtcmVzdGFwaS1jbGllbnQiLCJub25jZSI6IjUzNDM5MTcyLWUwZTUtNGU0Mi05OGYzLWJiYzhiZDFkNTM0OSIsInNlc3Npb25fc3RhdGUiOiJhY2RjN2YyYS05ZmVhLTRjNTktOGJmNi0zNmMyZGU4ZWJjNjYiLCJzY29wZSI6InhzMmEgb2ZmbGluZV9hY2Nlc3MiLCJzaWQiOiJhY2RjN2YyYS05ZmVhLTRjNTktOGJmNi0zNmMyZGU4ZWJjNjYifQ.f0CpR4hwqa1ZxXtU12iRA7hZOHgWALgk2IBfA1sgB-Y",
    "preferred_used": true,
    "selected_country": "LT",
    "refresh_token_url": "https://xs2a.sb.lt/v1/oauth/token?branch=SB_C",
    "persisted_accounts": {
        "LT747180300227733210": {
            "logo": "/static/images/bank_methods/siauliu_lt_pis.png",
            "name": "Mok\u0117jimo kortel\u0117s s\u0105skaita",
            "account_id": "",
            "currencies": [
                "EUR"
            ],
            "scheme_name": "",
            "account_owner_name": ""
        }
    },
    "persisted_accounts_key": "d4ec566d-1237-4e93-aa09-92f54dc4863b"
}


date: Mar 14 12:54:49


date: Mar 14 12:54:53
a.extra: {
    "client_ip": "195.182.68.245",
    "access_token": "eyJhbGciOiJSUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJGc1JyMzlneXY3X3dNdzVLT1RqMHg0RXNVZ1BUeVNocjltdC1NNlR2U09rIn0.eyJleHAiOjE2Nzg4MDAyODQsImlhdCI6MTY3ODc5ODQ4NCwiYXV0aF90aW1lIjoxNjc4Nzk4NDg0LCJqdGkiOiIwN2IxOTgwOS1iODk0LTRkNTgtODczYy1kZDI3M2VmMGI2ZWUiLCJpc3MiOiJodHRwczovL2tleWNsb2FrLnNiLmx0L2F1dGgvcmVhbG1zL1NCX0MiLCJzdWIiOiJhNThiOGEwNi1hZjYwLTQxNWUtODQ0Yi0zODJhY2E5ZDY1OGEiLCJ0eXAiOiJCZWFyZXIiLCJhenAiOiJ4czJhLXJlc3RhcGktY2xpZW50Iiwibm9uY2UiOiI1MzQzOTE3Mi1lMGU1LTRlNDItOThmMy1iYmM4YmQxZDUzNDkiLCJzZXNzaW9uX3N0YXRlIjoiYWNkYzdmMmEtOWZlYS00YzU5LThiZjYtMzZjMmRlOGViYzY2IiwiYWNyIjoiMSIsInNjb3BlIjoieHMyYSBvZmZsaW5lX2FjY2VzcyIsInNpZCI6ImFjZGM3ZjJhLTlmZWEtNGM1OS04YmY2LTM2YzJkZThlYmM2NiIsInByZWZlcnJlZF91c2VybmFtZSI6ImtsMTY3ODI4cCJ9.HIrGjie7bqb_OCQ1KNQugLOyaU8Nq-AjKp002BBjSpQ_BkbwAhNQdqfVmYc-Zk6MN-Q4yFAA9WV4qmJ2Xz7oD--xIAs3AcOuhnCBowfjqR813oDhfBSxaQumFaNNZQRFnINtc-bL8BKFXegcSiKChik-iE0VLo_XkShier91BmSlqr_9lhX-ihjspt7NKN5LHNGWLbAdeR5BCzyPKKVhH6YcpAPT7LLSB8XT7YRT57hfYIVXx-pBi-ryUTM7H0lxaxuY0LbyXGAlqozxJkoHVKwFJn29oqwI7gnQGA2iOlFNnF4XZaPR8v2YZJDIYonlxSxIGXqc5Nma5ztJmMwPQg",
    "referrer_url": null,
    "refresh_token": "eyJhbGciOiJIUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJkNGQ5MDU2Mi0yYmFjLTQ5YzgtYTc0MC1mNWVjZjdjMTBkNmEifQ.eyJleHAiOjE2ODY1NzQ0ODQsImlhdCI6MTY3ODc5ODQ4NCwianRpIjoiNzQxODNjYWItYTJhMS00YmE0LTkxNzgtN2UxOWUxNmVkNTZhIiwiaXNzIjoiaHR0cHM6Ly9rZXljbG9hay5zYi5sdC9hdXRoL3JlYWxtcy9TQl9DIiwiYXVkIjoiaHR0cHM6Ly9rZXljbG9hay5zYi5sdC9hdXRoL3JlYWxtcy9TQl9DIiwic3ViIjoiYTU4YjhhMDYtYWY2MC00MTVlLTg0NGItMzgyYWNhOWQ2NThhIiwidHlwIjoiT2ZmbGluZSIsImF6cCI6InhzMmEtcmVzdGFwaS1jbGllbnQiLCJub25jZSI6IjUzNDM5MTcyLWUwZTUtNGU0Mi05OGYzLWJiYzhiZDFkNTM0OSIsInNlc3Npb25fc3RhdGUiOiJhY2RjN2YyYS05ZmVhLTRjNTktOGJmNi0zNmMyZGU4ZWJjNjYiLCJzY29wZSI6InhzMmEgb2ZmbGluZV9hY2Nlc3MiLCJzaWQiOiJhY2RjN2YyYS05ZmVhLTRjNTktOGJmNi0zNmMyZGU4ZWJjNjYifQ.f0CpR4hwqa1ZxXtU12iRA7hZOHgWALgk2IBfA1sgB-Y",
    "preferred_used": true,
    "selected_country": "LT",
    "refresh_token_url": "https://xs2a.sb.lt/v1/oauth/token?branch=SB_C",
    "persisted_accounts": {
        "LT747180300227733210": {
            "logo": "/static/images/bank_methods/siauliu_lt_pis.png",
            "name": "Mok\u0117jimo kortel\u0117s s\u0105skaita",
            "account_id": "",
            "currencies": [
                "EUR"
            ],
            "scheme_name": "",
            "account_owner_name": ""
        }
    },
    "persisted_accounts_key": "d4ec566d-1237-4e93-aa09-92f54dc4863b"
}


date: Mar 14 12:54:54


date: Mar 14 12:54:56



Alright, let’s expand on this idea and figure out how to extract other useful information from logs.txt.

Let’s extract:

  • Request URL
  • Request headers
  • Request data (payload)
  • Response status code
  • Response headers
  • Response content

We’ll pretty format everything and write it to a new file, with the time of the parsing in the file name:

import re
import json
import datetime

with open('./logs/logs.txt', 'r') as log_file:
    content = log_file.readlines()

prettified_content = ''

for line in content:

    date = re.search(r'^(\w{3}\s\d{1,2}\s\d{2}:\d{2}:\d{2})', line)

    if date:
        date = date.group(1)

        prettified_content += f'\n\ndate: {date}\n'

    a_extra_match = re.search(r'a\.extra:\s*(\{.*\})', line)

    if a_extra_match:
        a_extra = (
            a_extra_match.group(1)
            .replace("'", '"')
            .replace("None", 'null')
            .replace("False", 'false')
            .replace("True", 'true')
        )
        a_extra_dict = json.loads(a_extra)
        pretty_a_extra = json.dumps(a_extra_dict, indent=4)

        prettified_content += f'a.extra: {pretty_a_extra}\n'

    url_match = re.search(r'request URL: ([^,]+)', line)

    if url_match:
        request_url = url_match.group(1)

        prettified_content += f'request_url: {request_url}\n'

    headers_match = re.search(r'request headers: (\{[^}]+\})', line)

    if headers_match:
        request_headers = (
            headers_match.group(1)
            .replace("'", '"')
            .replace("None", 'null')
            .replace("False", 'false')
            .replace("True", 'true')
        )
        headers_dict = json.loads(request_headers)
        pretty_headers = json.dumps(headers_dict, indent=4)

        prettified_content += f'request_headers: {pretty_headers}\n'

    request_data_match = re.search(
        r'(?<=request data: )(.*?)(?=, response headers)',
        line
    )

    if request_data_match:
        request_data = (
            request_data_match.group(1)
            .replace("'", '"')
            .replace("None", 'null')
            .replace("False", 'false')
            .replace("True", 'true')
        )

        request_data_dict = json.loads(request_data)
        pretty_request_data = json.dumps(request_data_dict, indent=4)

        prettified_content += f'request_data: {pretty_request_data}\n'

    code_match = re.search(r'response status_code: (\d+)', line)

    if code_match:
        response_code = code_match.group(1)

        prettified_content += f'response_code: {response_code}\n'

    res_headers_match = re.search(r'response headers: (\{[^}]+\})', line)

    if res_headers_match:
        res_headers = (
            res_headers_match.group(1)
            .replace("'", '"')
            .replace("None", 'null')
            .replace("False", 'false')
            .replace("True", 'true')
        )
        headers_dict = json.loads(res_headers)
        pretty_headers = json.dumps(headers_dict, indent=4)

        prettified_content += f'response_headers: {pretty_headers}\n'

    res_content_match = re.search(
        r'(?<=response content: b\')(.*?)(?=\' response status_code)',
        line
    )

    if res_content_match:
        res_content = (
            res_content_match.group(1)
            .replace("None", 'null')
            .replace("False", 'false')
            .replace("True", 'true')
            .replace('\\', '')
        )

        res_content_dict = json.loads(res_content)
        pretty_res_content = json.dumps(res_content_dict, indent=4)

        prettified_content += f'response_content: {pretty_res_content}\n'

    # append a separator after every processed line to visually split the entries
    prettified_content += (
        '\n\n----------------------------------------------\n\n'
    )

# format the timestamp so it's safe in a file name (':' isn't allowed on Windows)
now = datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S')

with open(f'./prettified_logs/logs_{now}.txt', 'w') as f:
    f.write(prettified_content)

The prettified output ends up in a timestamped file inside ./prettified_logs – you’ll see an excerpt from it in the next section.

How Do You Analyse Logs In Python?

Now that you have parsed the log entries, you can analyze the log data to identify the cause of the unsuccessful request.

In the case of the unsuccessful request in the example log entry, you can see that the response code is 400, and the response content contains an error message stating that the code is not valid.

response_code: 400
response_headers: {
    "Date": "Tue, 14 Mar 2023 12:54:56 GMT",
    "Server": "Apache",
    "Content-Type": "application/json",
    "Connection": "close",
    "Transfer-Encoding": "chunked"
}
response_content: {
    "error": "invalid_grant",
    "error_description": "Code not valid"
}

This indicates that the request data sent in the request is incorrect, or that something in the payment flow sequence was done in the wrong order – it depends on the situation.
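
If you want to speed this step up as well, you can scan the prettified output for error responses instead of reading it top to bottom. Here’s a small sketch that assumes the response_code: labels produced by the script above (the file name is a placeholder):

import re

with open('./prettified_logs/logs_2023-03-14.txt', 'r') as f:  # placeholder file name
    content = f.read()

# collect every response code the parser wrote out and keep the 4xx/5xx ones
error_codes = [
    code for code in re.findall(r'response_code: (\d+)', content)
    if int(code) >= 400
]
print(error_codes)  # e.g. ['400']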

Fixing the Issue

To fix the issue, you usually need to analyze the request data and identify the cause of the error.

In many cases with payment integrations (that’s what I work with the most), there has been an update on the third-party side that we didn’t notice before, and we need to deploy a new version of our code.

In this case, we simply would find that part of the code and adjust it according to the new documentation changes.

I'll help you become a Python developer!

If you're interested in learning Python and getting a job as a Python developer, send me an email to roberts.greibers@gmail.com and I'll see if I can help you.

Roberts Greibers

I help engineers to become backend Python/Django developers so they can increase their income