Log File Parsing In Python

Here’s a YouTube video version of this blog post. Check it out if you’re looking for a quick way to build your own log file parsing tool in Python.

In this tutorial, you will learn how to open a log file, read it, and build a log file parser in Python, a so-called “Python log reader”.

Opening a log file (or any other text file) in Python and parsing it to extract specific information is not that hard once you know a bit about file handling and regex pattern matching.

Python is a great tool for parsing log files, and the job does not require any third-party modules. Believe me, a couple of years ago, when I first started parsing text files in Python, the first thing I did was a Google search for “Python log reader” and “Python file parsing”. Since then I’ve learned to work with Python and regex efficiently; I cover more of that in a recent post about parsing an XML SOAP response with Python and regex.

In my day job, a while back, I was testing the Skype for Business (SfB) iOS application as a test engineer, and it came to the point where I had to open and manually go through the SfB iOS application log files in order to see all HTTP requests and the HTTP responses received.

Since then I’ve switched to backend development with Python/Django and have also helped other people move down a similar path.

Anyway, in this specific situation, I had to figure out a good way to open iOS log files and search them for properties like:

<property name="saveMessagingHistory">Enabled</property>

Usually, properties were buried under a bunch of other not-so-important log file dumps, for example:

INFO UTILITIES /Volumes/ServerHD2/buildagent/workspace/200615/client_ios_sfb/dependencies/client-shared_framework_sfbplatform/src/dev/lyncMobile/platform/tracing/privateIos/CMTrace.mm/173:Version Information 6.12.0.65
2016-12-20 13:16:52.303 SfB[417:1af74bc40] INFO UTILITIES CTimer.cpp:657 TimerMap is created 
2016-12-20 13:16:52.342 SfB[417:1af74bc40] INFO UI SFBAppDelegate.mm:69 Application will finish launching with options
2016-12-20 13:16:52.345 SfB[417:1af74bc40] INFO UTILITIES CStorageManager.mm:146 Creating StorageManager
2016-12-20 13:16:52.347 SfB[417:1af74bc40] INFO UTILITIES CStorageManager.mm:187 Initializing StorageManager
2016-12-20 13:16:52.357 SfB[417:1af74bc40] INFO APPLICATION CApplication.cpp:3400 Initialize Internal Begin
2016-12-20 13:16:52.361 SfB[417:1af74bc40] INFO UTILITIES COsInformation.mm:398 User UI language identifier en was mapped to en-US 1033
2016-12-20 13:16:52.362 SfB[417:1af74bc40] INFO UTILITIES COsInformation.mm:106 Device Version Info - Model=iPhone, HardwareModel=iPhone9,3, SystemName=iOS, SystemVersion=10.1
2016-12-20 13:16:52.362 SfB[417:1af74bc40] VERBOSE APPLICATION CApplication.cpp:3415 Initialize Internal -- App State Query Established
2016-12-20 13:16:52.363 SfB[417:1af74bc40] INFO UTILITIES CNetworkMonitor.cpp:70 Successfully started listening to network events
2016-12-20 13:16:52.364 SfB[417:1af74bc40] INFO UTILITIES CNetworkMonitor.cpp:229 Reachabilility Flags IsWWAN(0):Reachable(1):TransientConnection(0):ConnectionRequired(0):ConnectionOnTraffic(0):InterventionRequired(0):ConnectionOnDemand(0):IsLocalAddress(0):IsDirect(0)
2016-12-20 13:16:52.364 SfB[417:1af74bc40] INFO UTILITIES CNetworkMonitor.cpp:198 Updated networkAvailableToConnect(NoNetwork) -> WiFi, isInAirplaneMode(0) -> 0
2016-12-20 13:16:52.364 SfB[417:1af74bc40] INFO UTILITIES CTimer.cpp:227 Created timer instance (0x70286428) for runloop (0x7017e780)

While I was searching for specific properties in those log files, I realized it’s going to be really time-consuming to go through everything manually.

In order to save time, I had to come up with a good way to parse the files with Python and, lo and behold, I managed to write a Python log file parsing script.

It’s a very simple way of searching a log file with Python.

If you want to test the following Python log parsing script with a similar text file, you can download a Skype for Business iOS log file here:

Python Log File Open

First of all, we have to open the log file, read each line, and look for specific text in that line using a regex (regular expression). Regular expression pattern matching might look confusing and a bit scary at first, but believe me, it’s not that complicated.

Usually, I like to go to the RegExr site and just play around with different expressions until I find a pattern that matches the string I want to match.

If that’s too hard for you, try googling for more regex examples. That’s what I did when I first started using regular expressions to write Python log parsing scripts.
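You can also try a pattern out directly in Python before pointing it at a real log file. The snippet below is only a sketch: the sample string is modelled on the property format shown earlier, and the pattern is a simplified two-group version that I made up for this quick check, not the exact one used in the script further down.

```python
import re

# Sample text modelled on the property format shown earlier in this post
sample = '<property name="saveMessagingHistory">Enabled</property>'

# Simplified pattern for experimenting: group 1 = name, group 2 = value.
# The r prefix makes it a raw string, so backslashes reach the regex engine as typed.
pattern = r'<property name="(.*?)">(.*?)</property>'

match = re.search(pattern, sample)

if match:
    print('name: ', match.group(1))
    print('value:', match.group(2))
else:
    print('No match, adjust the pattern and try again')
```

If the pattern does not match, re.search returns None, which is why the result is checked before calling group().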

with open(log_file_path, 'r') as file opens the file at log_file_path in read-only mode 'r' and assigns the opened file object to the file variable. for line in file then gives you access to each line of the opened log file, one at a time.

log_file_path = 'path/to/sfbios_log.txt' # Change to log file path on your computer

with open(log_file_path, 'r') as file:
    
    for line in file:
        print(line)

The print(line) call under the for loop prints each line of the text file when you run the script. But in order to match only specific text, you have to use one more for loop.
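Two small things tend to come up at this step: every line still carries its trailing newline, and log files sometimes contain bytes that are not valid text. Here is a minimal sketch of one way to handle both. The read_lines helper, the UTF-8 encoding and the throwaway demo file are my assumptions for illustration, so adjust them to your own log files.

```python
import os
import tempfile


def read_lines(log_file_path):
    """Return the lines of a text file without their trailing newlines."""
    # errors='replace' swaps undecodable bytes for a placeholder character
    # instead of raising an exception halfway through the file
    with open(log_file_path, 'r', encoding='utf-8', errors='replace') as file:
        return [line.rstrip('\n') for line in file]


# Throwaway demo file, so this sketch runs without a real log file
with tempfile.NamedTemporaryFile(
    'w', suffix='.txt', delete=False, encoding='utf-8'
) as tmp:
    tmp.write('first line\nsecond line\n')
    demo_path = tmp.name

# enumerate adds a line number, which helps when you report where a match was found
numbered = list(enumerate(read_lines(demo_path), start=1))

for number, text in numbered:
    print(number, text)

os.remove(demo_path)
```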

Searching A Log File With Python

for match in re.finditer(regex, line, re.S) searches each line for the regex pattern, and every hit is handed to the match variable as a match object.

import re

log_file_path = 'path/to/sfbios_log.txt' # Change to log file path on your computer
regex = r'(<property name="(.*?)">(.*?)</property>)'  # Raw string; "/" needs no escaping in Python

match_list = []

with open(log_file_path, 'r') as file:
    
    for line in file:
        for match in re.finditer(regex, line, re.S):
            match_text = match.group()
            match_list.append(match_text)
            
            print(match_text)

To check that you have found the text you were searching for, you can use match.group(), which returns the entire matched text (the same as match.group(0)).

You can change match.group() to match.group(2) in order to print only the second regex group, which for this pattern is the property name. Groups in a regex are defined with ( ) and numbered by their opening parenthesis.
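To make the group numbering concrete, here is a small sketch that applies the pattern from this post to one sample line. The sample line itself is made up for illustration.

```python
import re

regex = r'(<property name="(.*?)">(.*?)</property>)'

# Hypothetical sample line with some noise around the property
line = 'noise <property name="video">Enabled</property> more noise'

match = re.search(regex, line)

whole = match.group()        # same as match.group(0): the entire matched text
outer = match.group(1)       # first "(" wraps the whole pattern, so it equals the full match
name = match.group(2)        # second "(": the property name
value = match.group(3)       # third "(": the property value
all_groups = match.groups()  # tuple with groups 1, 2 and 3

print(whole)
print(name, value)
```

Because the outermost parentheses wrap the whole pattern, group 1 and the full match are the same text here; the name and value are groups 2 and 3.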

Searching a log file this way lets you extract just certain parts of the log text, which gives you more parsing flexibility.

In the end, you’re appending the matched text to match_list so that the matched strings can be used later in the script, which is usually what you want when parsing log files in Python.
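If you would rather look properties up by name later on, one option is to store groups 2 and 3 in a dictionary instead of keeping the full matched text in a list. This is just a sketch of that idea: sample_lines and the properties dictionary are hypothetical names, and the in-memory lines stand in for a real log file.

```python
import re

regex = r'(<property name="(.*?)">(.*?)</property>)'

# Hypothetical stand-in for the lines of a log file
sample_lines = [
    '<property name="video">Enabled</property>',
    'INFO UTILITIES some unrelated log output',
    '<property name="photos">Disabled</property>',
]

properties = {}

for line in sample_lines:
    for match in re.finditer(regex, line):
        # group(2) is the property name, group(3) is its value
        properties[match.group(2)] = match.group(3)

print(properties)
```

Note that a dictionary keeps only the last value seen for each name, which may or may not be what you want for a log that records changes over time.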

Log Parsing In Python (The Whole Text File)

Sometimes you need to search the whole log file at once, meaning you want to parse more than one line at a time.

Essentially, you just have to read the whole log file into one string and apply the regular expression (regex) pattern to that whole text. In my experience, regex is the main thing you need to know for this kind of Python file parsing.

The code you’re already familiar with, for line in file, is not going to work here, because a pattern can never match text that spans several lines when it only sees one line at a time. Now you’re getting closer to the real question:

“How can you parse whole text files in Python?”

A quick note here: the above question also makes a good “log file parsing in Python” exercise.

One of my clients had to deal with a very similar situation: she had to manually search through a bunch of log files on a weekly basis.

Eventually, we decided it would be smarter to just build a log file reader in Python and save her time.

Let’s get back to the code for log file parsing in Python…

In order to read a block of content from a log file, you need to assign the whole log file’s data to a variable, as in the example below with data = file.read().

Also, a read_line variable is introduced, which lets you decide which type of log file parsing you want to use. If its value is set to True, the script parses line by line; in any other case it reads the whole file and parses it in one go.

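import re

log_file_path = 'path/to/sfbios_log.txt' # Change to log file path on your computer
regex = r'(<property name="(.*?)">(.*?)</property>)'  # Raw string; "/" needs no escaping in Python
read_line = True  # True: parse line by line, False: parse the whole file at once

with open(log_file_path, 'r') as file:
    match_list = []

    if read_line:
        # Each line is matched on its own
        for line in file:
            for match in re.finditer(regex, line, re.S):
                match_text = match.group()
                match_list.append(match_text)

                print(match_text)
    else:
        # The whole file is read into one string; re.S lets "." match newlines too
        data = file.read()

        for match in re.finditer(regex, data, re.S):
            match_text = match.group()
            match_list.append(match_text)

# No file.close() is needed: the with block closes the file automatically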
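To see what the whole-file mode buys you, consider a property that is spread over several lines. The sample text below is made up for illustration (I am not claiming the SfB log wraps properties like this); it only shows how the two modes behave differently.

```python
import re

regex = r'(<property name="(.*?)">(.*?)</property>)'

# Hypothetical log text where one property spans three lines
data = '<property name="video">\nEnabled\n</property>\n'

# Line-by-line: no single line contains the complete pattern
by_line = [
    match.group()
    for line in data.splitlines()
    for match in re.finditer(regex, line, re.S)
]

# Whole text with re.S: "." also matches newlines, so the pattern can span lines
whole_text = [match.group() for match in re.finditer(regex, data, re.S)]

# Whole text without re.S: "." stops at newlines, so nothing is found
whole_text_no_flag = [match.group() for match in re.finditer(regex, data)]

print(len(by_line), len(whole_text), len(whole_text_no_flag))
```

This is also where the re.S flag actually matters. When matching one line at a time it makes little difference, because each line contains at most a trailing newline.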

How To Parse Log Files And Save The Results

In order to save the log parsing results, we use with open(export_file, 'w+') as file again, only this time with mode 'w+', which opens the file for writing (and reading), creating it if it does not exist and emptying it if it does.

For the export_file name, I found it very handy to include time_now in it: this way every run gets its own file name, and the results are easy to manage in one folder.
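As a small illustration of the file naming idea, here is a sketch of a helper that builds a timestamped export path. The build_export_path function and its now parameter are hypothetical names I am using here, and it relies on datetime and pathlib rather than the time module used in the full script, so treat it as one possible variation.

```python
from datetime import datetime
from pathlib import Path


def build_export_path(export_folder, now=None):
    """Build a timestamped export file path inside export_folder."""
    # "now" can be passed in to get a predictable name, e.g. in tests
    now = now or datetime.now()

    # Hyphens instead of colons in the time part: colons are not allowed
    # in file names on Windows
    time_now = now.strftime('%Y-%m-%d %H-%M-%S')

    return Path(export_folder) / f'parser_output_{time_now}.txt'


example = build_export_path(
    'path/to/export/folder',
    now=datetime(2016, 12, 20, 13, 16, 52),
)

print(example.name)
```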

Remove Result Duplicates Of Log File Parsing In Python

Usually, when you work with long log files, you have to deal with duplicated results, especially if you are extracting one specific piece of text.

An easy way to avoid multiple lines of the same text is to use list(set(match_list)). Keep in mind that a set does not preserve the original order of the matches.

In the end, we’re using a simple for loop which iterates over the indexes of match_list_clean, from 0 to the length of match_list_clean.

It prints each item in the list and then writes it to the export file with file.write(match_list_clean[item] + '\n').

import re
import time
from time import strftime

log_file_path = 'path/to/sfbios_log.txt' # Change to log file path on your computer
export_file_path = 'path/to/export/folder' # Change to folder path where you want to export parser results

# Timestamp for the export file name, in the form YYYY-MM-DD HH-MM-SS
time_now = strftime('%Y-%m-%d %H-%M-%S', time.localtime())
export_file = f'{export_file_path}/parser_output_{time_now}.txt'

regex = r'(<property name="(.*?)">(.*?)</property>)'  # Raw string; "/" needs no escaping in Python
read_line = False  # True: parse line by line, False: parse the whole file at once

with open(log_file_path, 'r') as file:
    match_list = []

    if read_line:
        for line in file:
            for match in re.finditer(regex, line, re.S):
                match_text = match.group()
                match_list.append(match_text)

                print(match_text)

    else:
        data = file.read()
        for match in re.finditer(regex, data, re.S):
            match_text = match.group()
            match_list.append(match_text)

# No file.close() is needed: the with block closes the file automatically

with open(export_file, 'w+') as file:
    file.write('EXPORTED DATA:\n')

    # Remove duplicates (a set does not keep the original order)
    match_list_clean = list(set(match_list))

    for item in range(0, len(match_list_clean)):
        print(match_list_clean[item])

        file.write(match_list_clean[item] + '\n')
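One thing to be aware of with list(set(match_list)) is that the exported lines can come out in a different order than they appeared in the log. If the order matters to you, a possible alternative is dict.fromkeys, which keeps the first occurrence of every item in its original position (dictionaries preserve insertion order in Python 3.7 and later). The sample list below is made up for illustration.

```python
# Hypothetical matches, with duplicates, in the order they were found
match_list = [
    '<property name="video">Enabled</property>',
    '<property name="photos">Enabled</property>',
    '<property name="video">Enabled</property>',
    '<property name="logging">Enabled</property>',
    '<property name="photos">Enabled</property>',
]

# dict keys are unique and keep insertion order, so duplicates drop out
# while the first occurrence of each item stays where it was
match_list_clean = list(dict.fromkeys(match_list))

for item in match_list_clean:
    print(item)
```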

Turn Block Of Code Into A Function

In the end, you should turn the block of code into functions with parameters. This is the smart, modular way of writing code, and you should train yourself to think ahead: rather than keeping everything in one large chunk, split the code into functions.

Now that you have added the main() and parseData() functions, you can reuse this script anywhere and change its arguments, for example to pass in a different regex.

import re
import time
from time import strftime


def main():
    log_file_path = 'path/to/sfbios_log.txt'  # Change to log file path on your computer
    export_file_path = 'path/to/export/folder'  # Change to folder path where you want to export parser results

    # Timestamp for the export file name, in the form YYYY-MM-DD HH-MM-SS
    time_now = strftime('%Y-%m-%d %H-%M-%S', time.localtime())

    export_file = f'{export_file_path}/parser_output_{time_now}.txt'
    regex = r'(<property name="(.*?)">(.*?)</property>)'  # Raw string; "/" needs no escaping in Python

    parseData(log_file_path, export_file, regex, read_line=True)


def parseData(log_file_path, export_file, regex, read_line=True):
    # Collect all regex matches from the log file
    with open(log_file_path, 'r') as file:
        match_list = []

        if read_line:
            # Parse line by line
            for line in file:
                for match in re.finditer(regex, line, re.S):
                    match_text = match.group()
                    match_list.append(match_text)

                    print(match_text)

        else:
            # Parse the whole file as one string
            data = file.read()
            for match in re.finditer(regex, data, re.S):
                match_text = match.group()
                match_list.append(match_text)

    # No file.close() is needed: the with block closes the file automatically

    # Write the unique matches to the export file
    with open(export_file, 'w+') as file:
        file.write('EXPORTED DATA:\n')

        match_list_clean = list(set(match_list))

        for item in range(0, len(match_list_clean)):
            print(match_list_clean[item])

            file.write(match_list_clean[item] + '\n')


if __name__ == '__main__':
    main()
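If you want to go one step further with the same idea, the matching itself can live in a small function that knows nothing about files. The sketch below assumes a hypothetical find_matches helper (it is not part of the script above) that takes text and a regex and returns the matches, which makes it easy to try out different patterns on a plain string.

```python
import re


def find_matches(text, regex, group=0):
    """Return every match of regex in text, as the requested group."""
    return [match.group(group) for match in re.finditer(regex, text, re.S)]


# Hypothetical sample text standing in for the contents of a log file
sample_text = (
    'INFO UTILITIES some unrelated log output\n'
    '<property name="video">Enabled</property>\n'
    '<property name="photos">Enabled</property>\n'
)

property_regex = r'(<property name="(.*?)">(.*?)</property>)'

full_matches = find_matches(sample_text, property_regex)
names_only = find_matches(sample_text, property_regex, group=2)

print(full_matches)
print(names_only)
```

Keeping file reading, matching and exporting in separate functions means each piece can be changed or tested on its own.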

Match Regex Against Already Parsed Data

Just to have more options, we can add a reparseData function and call it in the middle of parseData.

For example, in this case, my goal was to see only those log file properties which have a value set to Enabled. Another argument, reparse, with a default value of False, is added to the parseData() function in order to control whether re-parsing happens.

The reparseData function is basically the same code, except that it takes its data from a list, and re.finditer expects a string rather than a list. That’s why we’re using data_string = ''.join(parsed_data), which takes the list items and joins them into one string variable.

import re
import time
from time import strftime


def main():
    log_file_path = 'path/to/sfbios_log.txt'  # Change to log file path on your computer
    export_file_path = 'path/to/export/folder'  # Change to folder path where you want to export parser results

    # Timestamp for the export file name, in the form YYYY-MM-DD HH-MM-SS
    time_now = strftime('%Y-%m-%d %H-%M-%S', time.localtime())

    export_file = f'{export_file_path}/parser_output_{time_now}.txt'
    regex = r'(<property name="(.*?)">(.*?)</property>)'  # Raw string; "/" needs no escaping in Python

    # reparse=True keeps only the properties that are set to Enabled
    parseData(log_file_path, export_file, regex, read_line=True, reparse=True)


def parseData(
    log_file_path,
    export_file,
    regex,
    read_line=True,
    reparse=False
):
    # Collect all regex matches from the log file
    with open(log_file_path, 'r') as file:
        match_list = []

        if read_line:
            # Parse line by line
            for line in file:
                for match in re.finditer(regex, line, re.S):
                    match_text = match.group()
                    match_list.append(match_text)

                    print(match_text)

        else:
            # Parse the whole file as one string
            data = file.read()
            for match in re.finditer(regex, data, re.S):
                match_text = match.group()
                match_list.append(match_text)

    # No file.close() is needed: the with block closes the file automatically

    if reparse:
        # [^"] stops the name at its closing quote, so a match cannot
        # run on into the next property of the joined string
        match_list = reparseData(
            match_list,
            r'(property name="([^"]{1,50})">(Enabled)</property>)'
        )

    # Write the unique matches to the export file
    with open(export_file, 'w+') as file:
        file.write('EXPORTED DATA:\n')
        match_list_clean = list(set(match_list))

        for item in range(0, len(match_list_clean)):
            print(match_list_clean[item])
            file.write(match_list_clean[item] + '\n')

    return match_list_clean


def reparseData(parsed_data, regex):
    # re.finditer needs a string, so the list items are joined first
    data_string = ''.join(parsed_data)
    match_list = []

    for match in re.finditer(regex, data_string, re.S):
        match_text = match.group()
        match_list.append(match_text)

    return match_list


if __name__ == '__main__':
    main()

Output example:

EXPORTED DATA:
property name="saveCredentials">Enabled</property>
property name="messageArchiving">Enabled</property>
property name="conversationLogsNotifications">Enabled</property>
property name="customerExperienceImprovementProgram">Enabled</property>
property name="multiViewJoin">Enabled</property>
property name="logging">Enabled</property>
property name="saveMessagingHistory">Enabled</property>
property name="video">Enabled</property>
property name="unansweredCallHandling">Enabled</property>
property name="callLogArchiving">Enabled</property>
property name="conversationHistory">Enabled</property>
property name="allowDeviceContactsSync">Enabled</property>
property name="photos">Enabled</property>
property name="saveCallLogs">Enabled</property>
property name="clientExchangeConnectivity">Enabled</property>
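Joining the list into one string is not the only way to do this second pass. As a possible alternative, each already matched string can be checked on its own, which avoids any chance of a pattern matching across two neighbouring items. The filter_matches helper and the sample list below are hypothetical and only meant as a sketch.

```python
import re


def filter_matches(parsed_data, regex):
    """Keep only the items of parsed_data in which regex is found."""
    pattern = re.compile(regex)
    return [item for item in parsed_data if pattern.search(item)]


# Hypothetical result of a first parsing pass
parsed = [
    '<property name="video">Enabled</property>',
    '<property name="photos">Disabled</property>',
    '<property name="logging">Enabled</property>',
]

enabled_only = filter_matches(parsed, r'>Enabled</property>')

for item in enabled_only:
    print(item)
```

Unlike reparseData, this keeps each original string untouched, so the exported lines would still start with the opening < of the property tag.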

Roberts Greibers

I help engineers to become backend Python/Django developers so they can increase their income
