Log File Parsing In Python Video (Part 1)
Log File Parsing In Python Video (Part 2)
Python Log Parser Code From The Video
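import re
import typing

# Read every line of the log file into a list of strings.
with open('./logs/sfbios_log.txt', 'r') as f:
    log_file_lines = f.readlines()


def lookup_pattern(
    pattern: str,
    lines: typing.List[str]
) -> typing.List[str]:
    """Return every unique piece of text in `lines` that matches `pattern`."""
    match_list = []
    for line in lines:
        # re.S (re.DOTALL) lets '.' match newline characters too.
        for match in re.finditer(pattern, line, re.S):
            match_text = match.group()
            match_list.append(match_text)
    # set() drops duplicates; note that it does not keep the original order.
    return list(set(match_list))


# Step 1: find all <property name="...">...</property> entries in the log.
properties: typing.List[str] = (
    lookup_pattern(
        pattern=r'(<property name="(.*?)">(.*?)<\/property>)',
        lines=log_file_lines
    )
)

# Step 2: keep only the properties whose value is exactly "Enabled".
enabled_properties: typing.List[str] = (
    lookup_pattern(
        pattern=r'(<property name="(.*?)">Enabled<\/property>)',
        lines=properties
    )
)

# Step 3: write the results to a text file, one property per line.
with open('./logs/results.txt', 'w') as f:
    for prop in enabled_properties:
        f.write(prop + '\n')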
Log File Parsing Script Notes From The Video
All right, let’s talk about log file parsing.
So, this is a very common situation if you work as an engineer or as a developer, for example a Python developer.
It's also a situation where you can use Python to automate things. Back in the day, I worked as a QA engineer and I had to check iOS application log files.
And I had to do that manually, sometimes on a daily basis, sometimes on a weekly basis.
Skype For Business (Log File Example)
The point is, take a large log file like this one.
It has many, many lines, around 30,000 or so.
If you have to go through it by hand looking for some specific information, it quickly becomes frustrating.
And the first thing that comes to your mind is: how can I automate this?
Many years ago, when I worked as a QA engineer, I had this problem too, and I figured out that I could write a Python script to parse these log files automatically.
A simple Python script can solve this problem and save you time.
So, in this video post I'll explain how you can use Python to write a simple log file parsing script that helps you optimize your work and save time.
Log File Parsing Resources For Python
I also have a blog post explaining more details about the things that I’m going to show you in the video. I’m going to link to the post here.
🚨 And also, before we continue, if you want to become a Python developer but you’re struggling with things and trying to find out where to start, find my contact details below this post!
I’ll look at your situation and give you some tips on where to start, that’s how one of my students – Yuliia got started and got her first software development job! I’ll link to Yuliia’s story here.
How To Parse Logs?
So, if I want to extract some information from this file, how do I start my Python script?
Well, you always start with a working environment.
First, I just want to check that my code runs and that I can actually execute a Python script.
I can.
What is the next step when you look at the log files?
The first thing you should think about is, well, how can I read this entire log file from Python?
How To Read A Log File In Python?
So the first thing you do is open the file, which you can do by simply calling open() with the path to the file (details in the video).
My project has a log parsing folder, and inside it, a logs folder.
So, when I call open() on the log file, I pass "logs" plus the name of the file, which is sfb_ios_log.txt.
Choosing The Read Mode In Python Open()
The second thing you choose is the mode in which you open the file.
Here I want reading mode, which is 'r'.
Writing would be 'w'.
Then I want to read all the lines of the log file and store them in a variable named lines.
It would look like this: lines = log_file.readlines()
This reads every line of the file into a list of strings.
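To see what readlines() actually returns, here's a small sketch. io.StringIO stands in for the opened log file (an assumption so the example runs without the real file); it supports the same readlines() call:

```python
import io

# io.StringIO behaves like an opened text file, so it stands in for the log here.
log_file = io.StringIO(
    '<property name="culture">en-US</property>\n'
    '<property name="type">Phone</property>\n'
)

# readlines() returns a list with one string per line; each keeps its trailing '\n'.
lines = log_file.readlines()
print(len(lines))  # 2
print(repr(lines[0]))

# Strip the trailing newline if you don't need it.
clean_lines = [line.rstrip('\n') for line in lines]
```

For very large files, you can also loop over the file object directly (for line in log_file:), which reads one line at a time instead of loading the whole file into memory.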
The next thing you want to do is check that this actually works.
So, how am I going to check this?
By typing the following:
for line in lines:
    print(line)
    breakpoint()
breakpoint() pauses the program at that line and opens the debugger (pdb by default), so you can inspect what's going on.
Here, I wanted to stop at the first iteration of the for loop and check that the first line of the log file is actually printed out.
Looking back at the video, printing the first line worked.
All right, now I know this code can read the file and go through each line in it.
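If you'd rather not step through the debugger, one alternative (my suggestion, not what the video does) is to print only the first few lines with itertools.islice(). The sample lines below are made up:

```python
from itertools import islice

# Made-up sample lines standing in for the real log file's contents.
lines = [
    '<property name="culture">en-US</property>\n',
    '<property name="type">Phone</property>\n',
    '<property name="audioPreference">VoipAudio</property>\n',
]

# islice() stops after the first n items, so a 30,000-line file isn't printed in full.
preview = list(islice(lines, 2))
for line_number, line in enumerate(preview, start=1):
    # end='' avoids blank lines, since each line already ends with '\n'.
    print(line_number, line, end='')
```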
The next thing to figure out is how to look for patterns in this file.
Looking For Patterns In A Log File
When I was working on my log file parsing script back then, I didn't care about all the text in the log file.
I needed to find specific properties only.
I was looking for properties and then deciding what to do based on those properties.
And if you check the log file, you'll see that these properties are a bit messy…
It's not clear, human-readable text, and that's how log files usually are.
You can't really tell what's going on, and such log files are hard to read…
But in Python, you can write a script that summarizes and improves readability.
Just to have an example here, this is what I'm looking for:
<property name="saveMessagingHistory">Enabled</property>
These are the properties that I’m looking for to extract from the file.
So now you should think the same way.
If you’re working with some kind of log files, you should be thinking the following:
What am I looking for?
What type of patterns am I looking for?
Analyzing Property Pattern
And when you look at the following lines, you should be able to spot a pattern that lets you recognize them:
<property name="culture">en-US</property>
<property name="endpointId">Ucmp:EE7806F8-F335-498D-89F4-B2E5977C17AE;IPhoneLync;cb57bfeb-ea56-44b1-bc58-a444598c95e1</property>
<property name="type">Phone</property>
<property name="userAgent">iPhoneLync/6.12.65.0000 (iPhone iOS 10.1)</property>
If you search the log file for "property name", you'll find many lines containing properties.
If you remove certain parts of the text, you'll see they're pretty similar.
They’re not the same, but they are similar.
And you can kind of see the pattern in them.
You can see it's a property tag with a name attribute.
The only things that change are the name attribute's value and the value of the property tag itself.
<property name="...">xxx</property>
<property name="...">xxx</property>
<property name="...">xxx</property>
<property name="...">xxx</property>
And this is the sort of pattern you should be able to recognize in any log file.
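Before writing any regex, a plain substring check is a quick way to confirm how many lines contain a property at all. A small sketch with made-up sample lines:

```python
# Made-up sample lines; in the real script these come from readlines().
lines = [
    'some unrelated log text\n',
    '<property name="culture">en-US</property>\n',
    '<property name="type">Phone</property>\n',
]

# The 'in' operator checks whether one string appears inside another.
property_lines = [line for line in lines if '<property name="' in line]
print(len(property_lines))  # 2
```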
Now you need to figure out how in Python you could extract such patterns.
Match Patterns In A Log File With Python
When it comes to that, I usually use regular expressions.
I’m going to give you something that you can start with.
But generally speaking, you want to do "something" with each line to extract information from it, or to check whether the line contains some specific information.
To do that, you need Python's regular expression module, re.
This is how you import it:
import re
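As a first taste of the re module, re.search() checks a single line for a pattern. A minimal sketch on one of the property lines shown above:

```python
import re

line = '<property name="type">Phone</property>'

# re.search() returns a Match object if the pattern occurs anywhere in the string...
found = re.search(r'<property name="', line)
# ...and None if it doesn't.
missing = re.search(r'<setting name="', line)

print(found is not None)  # True
print(missing)            # None
```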
By the way, this is code that I’m taking from my blog post.
I’m going to link it here if you want to see some more details in writing.
In the blog post, you'll find more details about what I'm doing in the video.
But now I’m just going to show you how to use my pattern and how you can come up with your own patterns.
regex = r'(<property name="(.*?)">(.*?)<\/property>)'
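Here's a minimal sketch of applying this pattern to a single line with re.finditer(), the same call lookup_pattern() uses in the script at the top of this post. The pattern is written as a raw string (r'...') so Python doesn't try to treat the backslash in \/ as a string escape; the sample line, holding two properties, is made up:

```python
import re

regex = r'(<property name="(.*?)">(.*?)<\/property>)'
line = '<property name="culture">en-US</property><property name="type">Phone</property>'

# re.finditer() yields a Match object for each non-overlapping match in the line;
# match.group() with no argument returns the full matched text.
matches = [match.group() for match in re.finditer(regex, line)]
for text in matches:
    print(text)
```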
How To Come Up With Regex Pattern?
You would probably ask – How did you come up with this pattern?
And how can I come up with this pattern on my own?
I know from my own experience, when I’m working with regular expressions it’s sometimes hard to come up with these patterns if you don’t know exactly the meaning behind them.
So what I usually use is the website regex101.com, which lets you build a regular expression and visually see what it actually matches.
I’ve shared more of my thought process when using regex101.com in this blog post.
This type of pattern, with regex groups like (.*?), is something I use pretty often, and it works in a lot of similar situations:
(<property name="(.*?)">(.*?)<\/property>)
So these (.*?) parts, as you can see, are capturing groups.
And what I’m doing is I’m capturing different groups of text with a regular expression.
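To see what each capturing group holds, here's a sketch using re.search() and match.groups() on one of the properties from the log. Groups are numbered by their opening parenthesis, left to right, so the outer group comes first:

```python
import re

regex = r'(<property name="(.*?)">(.*?)<\/property>)'
match = re.search(regex, '<property name="audioPreference">VoipAudio</property>')

# groups() returns all capturing groups in order: outer (whole property), name, value.
whole, name, value = match.groups()
print(whole)  # <property name="audioPreference">VoipAudio</property>
print(name)   # audioPreference
print(value)  # VoipAudio
```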
The problem I’m trying to solve is – I want to find all the properties.
When I'm looking for properties, I don't really care what the name attribute is.
I just want to find all of them.
And this type of regex pattern lets you do exactly that.
The (.*?) group inside name="..." selects everything between the quotes.
Any string between the quotes will be selected, no matter how long it is.
And the same goes for the property value, which sits between the > and the closing </property> tag.
I don't care how long that string is, or what type of value it holds.
I just want to find these properties.
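The ? after .* is what makes each group lazy: it matches as few characters as possible, so the name group stops at the first "> it finds. A small comparison with a greedy .* on a made-up line holding two properties:

```python
import re

line = '<property name="type">Phone</property><property name="etag">2002123654</property>'

# Lazy (.*?): stops at the first '">', so each property's name is captured separately.
lazy = re.findall(r'<property name="(.*?)">', line)

# Greedy (.*): grabs as much as it can, then backs off only to the LAST '">' on the line.
greedy = re.findall(r'<property name="(.*)">', line)

print(lazy)    # ['type', 'etag']
print(greedy)
```

With the greedy version, the single captured group runs from type all the way to etag and swallows the first property's value and closing tag along the way.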
Collecting Matched Properties
If you collect all the matched property values, what you will end up with is a list of matched property strings.
<property name="saveMessagingHistory">Enabled</property>
<property name="a1b28296-4796-4ad9-aab2-01aa52f078ef">please pass this in a PUT request</property>
<property name="audioPreference">VoipAudio</property>
<property name="conversationHistory">Enabled</property>
<property name="etag">2002123654</property>
<property name="phoneNumber">tel:+1555</property>
<property name="publishEndpointLocation">True</property>
<property name="simultaneousRingNumberMatch">Disabled</property>
In the video, I explain how to collect all of them into a variable called match_list so you can print them all out at the end of the script.
So, in the end, you could look at the script and say:
This code goes through each line in the file and looks for a specific pattern.
If there's a match, it adds the matched text to match_list, and at the end, it prints all the matches.
Instead of having to look through 30,000 – 40,000 lines of text, I can just look through properties printed after running the Python script (shared in videos and at the beginning of the post).
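The collecting loop described above can be sketched like this, assuming the lines are already in a list (the sample lines are made up):

```python
import re

regex = r'(<property name="(.*?)">(.*?)<\/property>)'

# Made-up sample lines standing in for the log file's contents.
lines = [
    'some unrelated log text\n',
    '<property name="saveMessagingHistory">Enabled</property>\n',
    '<property name="etag">2002123654</property>\n',
]

match_list = []
for line in lines:
    # Every match of the pattern on this line is added to match_list.
    for match in re.finditer(regex, line):
        match_list.append(match.group())

# At the end, print all the matches.
for item in match_list:
    print(item)
```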
Write Parsing Results To A Text File
Let’s say you don’t want to run a Python script every time you want to look at the collected properties…
What you could do is just write the results to a new text file at the end of the script:
with open('logs/results.txt', 'w') as results_file:
    for item in match_list:
        results_file.write(item + '\n')
Now you're changing the mode from reading to writing: notice the 'w'.
In the first part of the log parsing script, we used 'r' mode.
Basically, you go through the match list and write every matched item to the results.txt file.
Also notice that you need to add a newline (\n) at the end of each item; otherwise, all the results will be smashed into a single line.
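A self-contained sketch of the writing step that also reads the file back to check that each item landed on its own line. It writes to a temporary folder instead of logs/ so it can run anywhere, and the two results are made up:

```python
import os
import tempfile

match_list = [
    '<property name="conversationHistory">Enabled</property>',
    '<property name="type">Phone</property>',
]

# A temporary folder stands in for the logs/ folder.
results_path = os.path.join(tempfile.mkdtemp(), "results.txt")

with open(results_path, "w") as results_file:
    for item in match_list:
        # write() adds no line break on its own, so append '\n' explicitly.
        results_file.write(item + "\n")

# Read the file back: splitlines() gives one string per line, without the '\n'.
with open(results_path) as results_file:
    written_lines = results_file.read().splitlines()
print(written_lines == match_list)  # True
```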
Run the script one more time.
You should see that the code created a new results.txt file containing all the properties (if the file already exists, 'w' mode overwrites it).
Now you can analyze these properties and find what you’re looking for more easily.
Instead of having to look through 40,000 lines, you have a few thousand, but there’s still something a bit wrong…
Remove Duplicates In Python
If you just use a regular expression to find a specific pattern and the same lines appear many times in the log file, you'll end up with a bunch of duplicated results too.
In many cases, you just want to see each result once, not 10 or 20 times (of course, that depends on your situation).
So how could you remove duplicates from results.txt?
In Python, in order to remove duplicated results from a list, you can apply the following code:
list(set(match_list))
Re-run your Python log parsing script after passing the list of results through set() and then list(), and you should have far fewer lines to read: set() keeps only unique items, and list() turns them back into a list.
In the video, you can clearly see I reduced mine from several thousand to just 300 lines of property items.
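set() has one side effect worth knowing: it doesn't keep the original order, so results.txt may come out in a different order between runs. Here's a sketch of two alternatives (my suggestion, not from the video), with made-up duplicated results:

```python
match_list = [
    '<property name="type">Phone</property>',
    '<property name="etag">2002123654</property>',
    '<property name="type">Phone</property>',
]

# set() drops duplicates but does not keep the original order.
unique_any_order = list(set(match_list))

# dict.fromkeys() also drops duplicates, and keeps first-seen order (Python 3.7+).
unique_in_order = list(dict.fromkeys(match_list))

# sorted() gives a stable, alphabetical order, which makes results easy to compare.
unique_sorted = sorted(set(match_list))

print(unique_in_order)
```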
Match Only Enabled Properties
Let’s say, you’d want to extract only Enabled properties.
How would you do that?
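In the video code (the enabled_properties step in the script at the top of this post), I just slightly modified the regular expression pattern, replacing the value group (.*?) with the literal text Enabled, and applied it to the properties extracted before.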
Now I only have enabled properties.
Perfect.
I reduced this entire log file from 40,000 lines to just 15 lines.
It’s so easy to analyze now.
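A sketch of the Enabled-only step on a few of the properties listed earlier. It also shows a pitfall worth knowing (my addition, not from the video): applied to a line holding several properties, the lazy name group can stretch across them, and a stricter [^"]* group avoids that:

```python
import re

# A few properties from the list above, already extracted one per string.
properties = [
    '<property name="saveMessagingHistory">Enabled</property>',
    '<property name="audioPreference">VoipAudio</property>',
    '<property name="conversationHistory">Enabled</property>',
    '<property name="simultaneousRingNumberMatch">Disabled</property>',
]

# Same shape as the general pattern, but the value group is replaced by the literal text Enabled.
enabled_regex = r'(<property name="(.*?)">Enabled<\/property>)'

enabled_properties = [
    match.group()
    for prop in properties
    for match in re.finditer(enabled_regex, prop)
]
print(enabled_properties)

# Pitfall: on a line holding two properties, the lazy name group keeps growing
# past the first property until it reaches ">Enabled" in the second one.
combined = '<property name="a">X</property><property name="b">Enabled</property>'
loose_name = re.findall(enabled_regex, combined)[0][1]
print(loose_name)  # a">X</property><property name="b

# Limiting the name to non-quote characters keeps it inside a single attribute.
strict_regex = r'(<property name="([^"]*)">Enabled<\/property>)'
strict_name = re.findall(strict_regex, combined)[0][1]
print(strict_name)  # b
```

Running the Enabled pattern on properties that were already extracted one per string, as the script at the top of this post does, sidesteps the pitfall.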
This is a great example of how Python can reduce manual work: you can apply multiple patterns, one after another, to the same log file or to results you've already extracted.
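Applying several patterns one after another can be wrapped in a small helper. lookup_all() below is a hypothetical name, loosely modeled on lookup_pattern() from the script at the top of this post; each pattern runs on the previous pattern's results, and the sample lines are made up:

```python
import re
import typing


def lookup_all(patterns: typing.List[str], lines: typing.List[str]) -> typing.List[str]:
    """Hypothetical helper: apply each pattern to the results of the previous one."""
    results = lines
    for pattern in patterns:
        results = [
            match.group()
            for line in results
            for match in re.finditer(pattern, line)
        ]
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(results))


# Made-up sample lines standing in for the log file.
log_lines = [
    'some unrelated log text\n',
    '<property name="saveMessagingHistory">Enabled</property>\n',
    '<property name="type">Phone</property>\n',
    '<property name="saveMessagingHistory">Enabled</property>\n',
]

patterns = [
    r'(<property name="(.*?)">(.*?)<\/property>)',    # every property
    r'(<property name="(.*?)">Enabled<\/property>)',  # only the Enabled ones
]
print(lookup_all(patterns, log_lines))
```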
Alright, that’s pretty much it.
⚠️ If you got value out of this video post, like the video on YouTube and subscribe to the channel.
You can check out my blog post where I shared more details of the same log file parsing process…