What Are Regular Expressions In Python?
How Do You Create A Regex Pattern In Python?
How To Extract Pattern From String In Python?
How To Analyze A Log File?
How Can I Use Regex In Python?
Python Regex Code From The Video
```python
# Sample log entries (each entry spans two lines in ./logs/asterisk.txt):
# ENTRY #1
# Event: Cdr Privilege: cdr,all AccountCode: Source:491454490 Destination:1545454572877 110 DestinationContext: testing CallerID: Channel: Console/dsp DestinationChannel: LastApplication: Hangup
# LastData: StartTime: 2010-08-23 08:27:21 AnswerTime: 2010-08-23 08:27:21 EndTime: 2010-08-23 08:28:21 Duration: 60 BillableSeconds: 0 Disposition: ANSWERED AMAFlags: DOCUMENTATION UniqueID: 1282570041.3 UserField: Rate: 0.02 Carrier: BS&S
# ENTRY #2
# Event: Cdr Privilege: cdr,all AccountCode: Source: Destination: 110 DestinationContext: testing CallerID: Channel: Console/dsp DestinationChannel: LastApplication: Hangup
# LastData: StartTime: 2010-08-23 08:27:21 AnswerTime: 2010-08-23 08:27:21 EndTime: 2010-08-23 08:27:21 Duration: 0 BillableSeconds: 0 Disposition: ANSWERED AMAFlags: DOCUMENTATION UniqueID: 1282570041.3 UserField: Rate: 0.02 Carrier: BS&S
import json
import re


class Regex:
    # Each pattern captures the text between a field label and the label after it
    _source = r'Source:(.+) Destination:'
    _destination = r'Destination:(.+) DestinationContext:'
    _start = r'LastData: StartTime:(.+) AnswerTime:'
    _end = r'EndTime:(.+) Duration:'

    def extract_with_regex(self, line: str, regex: str):
        """Return the first captured group of `regex` in `line`, or None."""
        rg = re.compile(regex, re.IGNORECASE | re.DOTALL)
        match = rg.search(line)
        try:
            # strip() removes the space that follows labels like "StartTime:"
            return match.group(1).strip()
        except (AttributeError, IndexError):
            # AttributeError: no match (search() returned None)
            # IndexError: the pattern has no capturing group
            return None


with open('./logs/asterisk.txt', 'r') as file:
    # Remove line endings and skip empty lines
    not_empty_lines = [
        line.replace('\n', '')
        for line in file.readlines()
        if line.replace('\n', '')
    ]

# Even-indexed lines are "Event" lines, odd-indexed lines are "LastData" lines
events = not_empty_lines[::2]
last_data = not_empty_lines[1::2]

regex = Regex()
entries = []

# zip() pairs each Event line with the LastData line of the same entry
for event, _last_data in zip(events, last_data):
    source = regex.extract_with_regex(
        line=event,
        regex=regex._source
    )
    destination = regex.extract_with_regex(
        line=event,
        regex=regex._destination
    )
    start_time = regex.extract_with_regex(
        line=_last_data,
        regex=regex._start
    )
    end_time = regex.extract_with_regex(
        line=_last_data,
        regex=regex._end
    )
    entries.append({
        'source': source,
        'destination': destination,
        'start_time': start_time,
        'end_time': end_time,
    })

for entry in entries:
    print(' ')
    print('Entry:')
    print(' ')
    print(json.dumps(entry, indent=4))
    print(' ')
```
Mastering Regular Expressions in Python (Notes)
In this blog post, I’ll draw on my seven years of experience as a Python developer to discuss an essential topic in Python: regular expressions.
Specifically, we’ll look at regex groups in Python and why they are so useful.
The Power of Regular Expressions
Regular expressions are one of the most crucial tools you need to learn as a Python developer, especially if you’re dealing with log files or text file parsing.
They let you find specific patterns in a text or log file and extract the strings you need.
From my years of experience working in the FinTech field and dealing with various IDs and IBAN numbers, I can assure you that using regular expressions is an industry-standard method of extracting data.
Regular expressions are not only practical; they also come up in developer interviews.
So, my recommendation is to pay attention and learn about regular expressions, as they will be beneficial for your career! 🔥
When and How to Use Regex in Python
Now let’s dive into some code.
To start using regular expressions in Python, import the re module. It is part of Python’s standard library, so there is nothing to install.
Importing re gives you access to the regular expression functions, such as re.compile() and re.search().
Consider a typical scenario where you want to extract specific parts of a text, such as a number within a log file or another text source.
In some cases, you might be able to use simple built-in Python tools like replace() to extract the desired data.
However, real-life situations can be more complex, and that’s when you would use regular expressions.
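As a rough illustration (the log line below is made up for this example), string methods such as split() work when you know exactly what surrounds a value, while a regex describes the value itself. Here \d+ means "one or more digits".

```python
import re

# A made-up log line, used only for illustration
line = 'INFO 2010-08-23 08:27:21 call finished, Duration: 60 seconds'

# With string methods you must know exactly what surrounds the value
after_label = line.split('Duration: ')[1]   # '60 seconds'
duration_str = after_label.split(' ')[0]    # '60'

# With a regex, one capturing group describes the value directly
match = re.search(r'Duration: (\d+)', line)
duration_re = match.group(1) if match else None

print(duration_str, duration_re)  # 60 60
```

The split() version breaks as soon as the surrounding text changes shape; the regex keeps working as long as the "Duration: " label is followed by digits.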
How To Create Regex Patterns?
In this section, I’ll walk through my process for creating regular expression patterns.
By following the same steps, you’ll be able to write your own regex patterns quickly and easily.
To create a regex pattern, start by copying the text you want to work with and head to regex101.com. Paste the text as the test string and then create your regex pattern in the provided box.
In regular expressions, you can define groups using parentheses.
Inside the parentheses, you write the pattern for the part of the text you want to capture.

For a line such as User-Agent: Thunderbird 1.5.0.9 (X11/20061227), you can use a dot followed by a plus sign (.+).
The dot matches any single character except a newline, and the plus sign repeats the preceding token one or more times, so .+ matches a run of one or more characters. Wrapped in parentheses, (.+) captures that run as a group.
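To see what the dot and plus do on their own, here is a small sketch using a made-up ID: label (not part of the post’s data). It also shows the re.DOTALL flag, which the log-parsing class later in the post passes to re.compile().

```python
import re

pattern = r'ID:(.+)'

# '.' matches any character except a newline; '+' means "one or more"
value = re.search(pattern, 'ID:abc123').group(1)            # 'abc123'

# '+' needs at least one character, so an empty value gives no match
empty = re.search(pattern, 'ID:')                            # None

# Without re.DOTALL, '.' stops at the newline...
first_line = re.search(pattern, 'ID:abc\n123').group(1)     # 'abc'
# ...with re.DOTALL, '.' matches the newline too
both_lines = re.search(pattern, 'ID:abc\n123', re.DOTALL).group(1)  # 'abc\n123'

print(value, empty, repr(first_line), repr(both_lines))
```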

To match only the desired part of the text, you may need to refine the pattern further.
In this example, we know that the text ends with a value in parentheses.
To match those literal parentheses in the pattern, escape each one with a backslash: \( and \).
Escaped parentheses match the characters in the text without creating a capturing group.
Once you’ve completed this step, you should have two capturing groups: one for the user agent and one for the value in parentheses.
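One way to confirm that the escaped parentheses don’t count as groups is to ask the compiled pattern: its groups attribute reports how many capturing groups it defines. A minimal sketch on the same User-Agent line:

```python
import re

text = 'User-Agent: Thunderbird 1.5.0.9 (X11/20061227)'

# \( and \) are literal parentheses; only the unescaped ( ) form groups
pattern = re.compile(r'User-Agent: (.+) \((.+)\)')

# Pattern.groups is the number of capturing groups in the pattern
group_count = pattern.groups            # 2, not 4

match = pattern.search(text)
# The literal parentheses are matched but are not part of any group
inside_parens = match.group(2)          # 'X11/20061227'
print(group_count, inside_parens)
```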

Before moving on to using this pattern in Python, I recommend exploring the explanation section on regex101.com to understand capturing groups better.
Knowing how to create and use capturing groups is essential for using regex effectively in Python.
How To Use Your Regex Pattern In Python Code?
In this section, I’ll demonstrate how to use your regex pattern in Python code to extract values from capturing groups.
I’ll show you how to apply this approach in any situation and with any Python code.
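```python
import re

# Raw string (r'...') so the backslashes in \( and \) reach the regex engine
regex = r'User-Agent: (.+) \((.+)\)'
text = 'User-Agent: Thunderbird 1.5.0.9 (X11/20061227)'

rg = re.compile(regex)
match = rg.search(text)

# search() returns None when the pattern isn't found
if match:
    groups = match.groups()   # all captured groups as a tuple
    group_1 = match.group(1)  # first group: the user agent
    group_2 = match.group(2)  # second group: the text inside the parentheses

    print(' ')
    print('groups:')
    print(groups)
    print(' ')
    print('group_1: ', group_1)
    print('group_2: ', group_2)
    print(' ')
```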
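To start, compile the regex pattern using the re.compile() function.
This turns the pattern into a regular expression object (re.Pattern) with methods such as search() and match(). Compiling is optional, because module-level functions like re.search() accept the pattern string directly, but it is handy when you reuse the same pattern many times.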
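A small sketch comparing the two styles on the User-Agent example. It also shows the difference between match(), which only succeeds at the very start of the string, and search(), which scans the whole string. The "Header " prefix is made up for the comparison.

```python
import re

text = 'User-Agent: Thunderbird 1.5.0.9 (X11/20061227)'
regex = r'User-Agent: (.+) \((.+)\)'

# Option 1: compile once, then call methods on the Pattern object
rg = re.compile(regex)
compiled_result = rg.search(text).groups()

# Option 2: module-level function; re compiles (and caches) the pattern for you
direct_result = re.search(regex, text).groups()

print(compiled_result == direct_result)  # True

# match() only matches at the start of the string; search() scans ahead
prefixed = 'Header ' + text
print(rg.match(prefixed), rg.search(prefixed) is not None)  # None True
```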
Next, use the compiled regular expression to search through the text.
search() returns a match object (re.Match) if the pattern is found anywhere in the text, or None if it isn’t; the match object gives you access to the captured groups.
To access the captured groups, use the match.groups() method, which returns all captured groups as a tuple.
If you want to access each group individually, use the match.group() method with the corresponding index: group(1) is the first group, and group(0) returns the entire match.
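Here is a sketch of those accessors on the same pattern, including the case where nothing matches (the Accept: line is made up for the example):

```python
import re

rg = re.compile(r'User-Agent: (.+) \((.+)\)')
match = rg.search('User-Agent: Thunderbird 1.5.0.9 (X11/20061227)')

# search() returns None when nothing matches, so check before using groups
if match is not None:
    whole = match.group(0)             # the entire matched text
    both = match.groups()              # all groups as a tuple
    first, second = match.group(1, 2)  # several groups in one call
    print(whole)
    print(both)
    print(first, '|', second)

# On text without a User-Agent header, search() returns None;
# calling .group(1) on it would raise AttributeError
no_match = rg.search('Accept: text/html')
print(no_match)  # None
```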
This minimal knowledge is enough to use regex patterns in Python effectively. 🔥
Of course, when working on larger projects or scripts, you may need to build functions or classes to manage more complex regex use cases.
For more examples of regex use in different situations, refer to the next section in this post. We will discuss a specific, more complicated example sent in by a blog reader.
Using Python to Parse Asterisk Server Log Files
A reader of my blog recently reached out to me, asking for help parsing a log file generated by an Asterisk server.
In software development, it’s common to have log files that need to be read and analyzed, extracting specific patterns or values quickly, without going through the entire file manually.
In this case, our log file is relatively small, but in real-life situations, log files can be massive and difficult to manage without automation.
The challenge with this particular log file is that we need to consider two lines as a single data entry.
To get started, we first need to access the file.
We’ll use Python’s built-in open() function to read the log file in read mode, like so:
```python
with open('./logs/asterisk.txt', 'r') as file:
    # Remove line endings and skip empty lines
    not_empty_lines = [
        line.replace('\n', '')
        for line in file.readlines()
        if line.replace('\n', '')
    ]
```
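A slightly more compact variant, sketched below, iterates over the file object directly instead of calling readlines(). io.StringIO stands in for the real log file so the example runs anywhere, and the three sample lines are made up.

```python
import io

# io.StringIO stands in for open('./logs/asterisk.txt', 'r')
fake_file = io.StringIO('first line\n\nsecond line\n   \nthird line\n')

# Iterating over a file object yields one line at a time (newline included).
# rstrip('\n') drops the line ending; line.strip() is falsy for empty or
# whitespace-only lines, so those are skipped.
not_empty_lines = [line.rstrip('\n') for line in fake_file if line.strip()]

print(not_empty_lines)  # ['first line', 'second line', 'third line']
```

One design difference: the original filter keeps lines that contain only spaces, while line.strip() drops them too, which is usually what you want when every real entry line has text on it.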
Now that we’ve pre-processed our log file (line endings removed, empty lines dropped), we’re ready to work with the actual data.
The problem now is that each entry is split across two lines, and we want to treat those lines as one entry when extracting values from the log file.
Usually, we would use regex to extract patterns and values from a single line.
In this case, however, we need to work with two lines at the same time.
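To achieve this, we can use Python’s built-in zip() function together with list slicing: not_empty_lines[::2] takes every other line starting from the first (the Event lines), and not_empty_lines[1::2] takes every other line starting from the second (the LastData lines).
By combining the two, we get both lines of an entry in a single for loop iteration, so we can treat them as one entry.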
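A minimal sketch of the pairing step, with placeholder strings standing in for the cleaned log lines:

```python
# Placeholder lines: each entry spans two consecutive lines
lines = ['event 1', 'last data 1', 'event 2', 'last data 2']

events = lines[::2]      # every second item, starting at index 0
last_data = lines[1::2]  # every second item, starting at index 1

# zip() pairs the two lists element by element
pairs = list(zip(events, last_data))
for event, data in pairs:
    print(event, '|', data)

# zip() stops at the shorter list, so an unpaired trailing line is ignored
odd = ['event 1', 'last data 1', 'event 2']
odd_pairs = list(zip(odd[::2], odd[1::2]))
print(odd_pairs)  # [('event 1', 'last data 1')]
```

That last behavior is worth keeping in mind: if a log file is truncated mid-entry, the incomplete entry is silently dropped rather than raising an error.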
After pre-processing the log file, we can focus on using regular expressions to extract the desired values.
Coming Up With Regex Pattern
We decided to extract the source and destination numbers and the start and end times.
To come up with regex patterns for these values, we can use the website Regex101, which allows you to visually see what text is selected by your pattern.

Once you have the regex patterns, you can incorporate them into your Python code.
In this example, we create a simple Regex class with a method called extract_with_regex.
This method takes a line of text and a regex pattern as input and returns the first captured value, or None if there is no match.
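The core of such a method fits in a few lines. Below is a minimal sketch as a plain function; extract is a hypothetical name, the sample strings are shortened excerpts of the two entries shown at the top of the post, and it checks for None explicitly instead of catching an exception. It also shows why the second entry has no source: its Source field is empty, and .+ needs at least one character.

```python
import re
from typing import Optional


def extract(line: str, pattern: str) -> Optional[str]:
    """Return the first capturing group of `pattern` in `line`, or None."""
    match = re.search(pattern, line)
    if match is None:
        return None
    return match.group(1).strip()


# Shortened excerpts of the two sample Event lines
entry_1 = 'AccountCode: Source:491454490 Destination:1545454572877 110 DestinationContext: testing'
entry_2 = 'AccountCode: Source: Destination: 110 DestinationContext: testing'

source_pattern = r'Source:(.+) Destination:'
destination_pattern = r'Destination:(.+) DestinationContext:'

print(extract(entry_1, source_pattern))       # 491454490
print(extract(entry_2, source_pattern))       # None
print(extract(entry_2, destination_pattern))  # 110
```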
```python
import re


class Regex:
    # Each pattern captures the text between a field label and the label after it
    _source = r'Source:(.+) Destination:'
    _destination = r'Destination:(.+) DestinationContext:'
    _start = r'LastData: StartTime:(.+) AnswerTime:'
    _end = r'EndTime:(.+) Duration:'

    def extract_with_regex(self, line: str, regex: str):
        """Return the first captured group of `regex` in `line`, or None."""
        rg = re.compile(regex, re.IGNORECASE | re.DOTALL)
        match = rg.search(line)
        try:
            # strip() removes the space that follows labels like "StartTime:"
            return match.group(1).strip()
        except (AttributeError, IndexError):
            # AttributeError: no match (search() returned None)
            # IndexError: the pattern has no capturing group
            return None
```
To complete the script, we create an instance of the Regex class and an empty list, then loop through the log file entries, calling the extract_with_regex method once for each pattern.
We then build a dictionary from the extracted values and append it to the entries list.
Finally, we print each entry in a readable format using json.dumps() from the json module.
```python
import json

with open('./logs/asterisk.txt', 'r') as file:
    # Remove line endings and skip empty lines
    not_empty_lines = [
        line.replace('\n', '')
        for line in file.readlines()
        if line.replace('\n', '')
    ]

# Even-indexed lines are "Event" lines, odd-indexed lines are "LastData" lines
events = not_empty_lines[::2]
last_data = not_empty_lines[1::2]

regex = Regex()
entries = []

# zip() pairs each Event line with the LastData line of the same entry
for event, _last_data in zip(events, last_data):
    source = regex.extract_with_regex(
        line=event,
        regex=regex._source
    )
    destination = regex.extract_with_regex(
        line=event,
        regex=regex._destination
    )
    start_time = regex.extract_with_regex(
        line=_last_data,
        regex=regex._start
    )
    end_time = regex.extract_with_regex(
        line=_last_data,
        regex=regex._end
    )
    entries.append({
        'source': source,
        'destination': destination,
        'start_time': start_time,
        'end_time': end_time,
    })

for entry in entries:
    print(' ')
    print('Entry:')
    print(' ')
    print(json.dumps(entry, indent=4))
    print(' ')
```
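To try the whole pipeline without the log file, here is a condensed, table-driven sketch that runs on the two sample entries from the top of the post. It is a restructuring of the script above, not the code from the video: FIELDS and parse_entries are hypothetical names, io.StringIO stands in for ./logs/asterisk.txt, and the patterns are compiled once up front without the IGNORECASE and DOTALL flags (the sample lines use the exact label casing and contain no newlines).

```python
import io
import json
import re

# The two sample entries from the top of the post, two lines per entry
SAMPLE_LOG = io.StringIO(
    'Event: Cdr Privilege: cdr,all AccountCode: Source:491454490 '
    'Destination:1545454572877 110 DestinationContext: testing CallerID: '
    'Channel: Console/dsp DestinationChannel: LastApplication: Hangup\n'
    'LastData: StartTime: 2010-08-23 08:27:21 AnswerTime: 2010-08-23 08:27:21 '
    'EndTime: 2010-08-23 08:28:21 Duration: 60 BillableSeconds: 0 '
    'Disposition: ANSWERED AMAFlags: DOCUMENTATION UniqueID: 1282570041.3 '
    'UserField: Rate: 0.02 Carrier: BS&S\n'
    'Event: Cdr Privilege: cdr,all AccountCode: Source: Destination: 110 '
    'DestinationContext: testing CallerID: Channel: Console/dsp '
    'DestinationChannel: LastApplication: Hangup\n'
    'LastData: StartTime: 2010-08-23 08:27:21 AnswerTime: 2010-08-23 08:27:21 '
    'EndTime: 2010-08-23 08:27:21 Duration: 0 BillableSeconds: 0 '
    'Disposition: ANSWERED AMAFlags: DOCUMENTATION UniqueID: 1282570041.3 '
    'UserField: Rate: 0.02 Carrier: BS&S\n'
)

# Field name -> (which line of the entry: 0 = Event, 1 = LastData, pattern)
FIELDS = {
    'source': (0, re.compile(r'Source:(.+) Destination:')),
    'destination': (0, re.compile(r'Destination:(.+) DestinationContext:')),
    'start_time': (1, re.compile(r'LastData: StartTime:(.+) AnswerTime:')),
    'end_time': (1, re.compile(r'EndTime:(.+) Duration:')),
}


def parse_entries(file):
    """Parse two-line entries into dicts; missing values become None."""
    lines = [line.rstrip('\n') for line in file if line.strip()]
    entries = []
    for pair in zip(lines[::2], lines[1::2]):
        entry = {}
        for name, (index, pattern) in FIELDS.items():
            match = pattern.search(pair[index])
            entry[name] = match.group(1).strip() if match else None
        entries.append(entry)
    return entries


entries = parse_entries(SAMPLE_LOG)
print(json.dumps(entries, indent=4))
```

Keeping the field names, line positions, and patterns in one mapping means adding another field is a one-line change; json.dumps() prints the missing source of the second entry as null.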
Overall, this post demonstrates how you can use regular expressions in Python to build a parsing script and extract values from log files or similar files.
If you have any questions, feel free to ask in the comment section below.
If you’re looking to become a Python developer and need help building your portfolio project, you can find my contact details below. Reach out, and I’ll see if there’s any way I can help you.