Matthew Egan

Your Agents Need You to Be a Product Engineer

2026-03-23T00:00:00Z

Over the last few years I've seen coding agents develop rapidly, and the techniques for using them have changed just as much. The engineers getting the most out of these tools are the ones who have invested in broadening their skillsets beyond traditional engineering responsibilities.

As coding agents have increased the rate at which an engineer can produce the "coding" part of their role, it's becoming more important to keep your agent unblocked to allow work to continue, similar to how a manager aims to keep their employees unblocked to allow projects to continue. This problem now becomes more frequent in a large organisation where all engineers are AI-enabled and running agents in this fashion. This is because previously, these engineers would often only need to work on engineering related tasks such as developing architecture, writing code, deploying infrastructure etc. while other roles such as business analysts, product managers, designers and others would have relevant work routed to them. A product manager would handle the interface with the customer and understand what we wanted to build and what results it should deliver, a designer would help bring this to life and focus on the look (UI) and feel (UX) of the solution, and the engineer would ultimately build it.

This is all very black-and-white, and of course in practice the roles overlap here and there. However, as the code production process has become faster, these routed responsibilities have become a clear bottleneck. You're now spending considerable time at these early stages of problem solving before a quick build stage that validates all of that work.

The cost of a blocked agent

So imagine you're working with an agent to develop software quicker than you ever have before. You've entered a flow state, defining schemas and data flows, making technical tradeoffs and designing architecture, BUT THEN the agent asks you about a new button in the design. It's not part of your design system, you don't know what it's meant to do, and there are no notes in the design. At this point, you're ripped out of your flow state and need to look through meeting notes and any other docs you have, and maybe message a few people or even have a meeting ... that's tomorrow.

This scenario is simplistic but I hope it demonstrates the point: this type of blocker existed before agentic coding, but its cost is now significantly higher relative to your total output. For example, say a project unassisted by agentic coding were to take 100 hours, but with agents takes 50 hours. A blocker that takes 5 hours to resolve would previously have cost the equivalent of 5% of the project, but now it costs 10% of the project's total time to delivery.
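
The arithmetic can be sketched in a few lines of Python. The function name `blocker_share` is made up for illustration, and the numbers are the ones from the example above:

```python
def blocker_share(project_hours: float, blocker_hours: float) -> float:
    """Return the blocker's cost as a fraction of the project's total hours."""
    return blocker_hours / project_hours


unassisted = blocker_share(project_hours=100, blocker_hours=5)
assisted = blocker_share(project_hours=50, blocker_hours=5)

print(f"unassisted: {unassisted:.0%}, with agents: {assisted:.0%}")
# unassisted: 5%, with agents: 10%
```

The blocker itself hasn't changed, only the size of the project it is measured against.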

What happens when the means (coding) to produce the output (a coded product) speeds up substantially without the inputs (product requirements and designs) also speeding up? Well, you get a surplus in capacity that is not being used which results in subpar projects getting put on the conveyor belt if you want more output, or you get reduced team sizes if you want the same output as before. I'd argue that the best businesses would use this newfound capacity to produce more rather than just cutting costs and reducing headcount.

Fix the blockage

And this is why I think that becoming a well rounded product-minded engineer is important for any engineer that works on customer facing software built by product teams. This isn't new, often T-shaped individuals have been touted as great hires and you want them on your team to get the best results. These people are often reasonably senior, have seen a lot and may have even worked in other non-engineering roles before. Previously, you could become one of these people by learning about the other roles in your team, becoming more experienced over the course of your career and aiming to work your way up career ladders, gaining more breadth as you went. However now, I believe that you need to start becoming this individual earlier in your career as businesses will prioritise those that can work like this rather than only do a traditional software engineer's role. I'm not claiming to have the answer to gaining such broad knowledge early in your career but rather I can see that these skills are in demand more than pure coding skills, so there's no better time than now to learn those skills.

Outcomes, not tickets

So what does it look like day-to-day? Well perhaps it makes sense to look at the classic engineer expectation. In a traditional role where engineers are primarily focused on coding (and code-related tasks like architecture, code review etc.) you may come into work, go to your sprint plan or standup, pick up some tickets and then get into the day working on them. An endpoint added here, a frontend route added there, perhaps a bug fix or two. Now throw that away, an AI-assisted product engineer just did all of that before their first meeting. They understood that a particular feature needed to be built, had a rough understanding of the architecture and what was involved and spent most of the development time getting their plan right, probably debating technical decisions with their agents and filling in context around who the customer is, how the feature should work etc. on the fly without needing to step away to get these answers. They aren't focusing on a ticket, they are focusing on the product outcome that the team is looking to deliver.

I'm not advocating for cowboy engineering but rather for getting to the quality gate faster. These engineers aren't shipping this work immediately. Instead they're presenting it as a demo internally or to users, gaining feedback from PMs, designers and others, and then iterating. They're just doing this more holistically than brick-by-brick.

Build the taste you're missing

How do you get to this point? It's not a prerequisite to have god-level AI skills to be a T-shaped product engineer, however the fact that there are people at that level makes it that much more urgent that you become a product engineer. So beyond the AI skills, you need to focus on the overlaps between your traditional engineering role and the other roles on your team. What questions are your agents getting stuck on that you can't answer? You should be able to answer any of those questions, so spend time working on the skills you need to answer them well.

These may be questions related to user flows, or perhaps the look of things, or domain-specific knowledge that your customer would know. Spend time understanding the work that your colleagues are doing, it's definitely not just whatever they hand you to start building but usually a lot of research, discussion and deep thinking on particular problems. Get closer to your customer, ask to join calls and interviews, dogfood your product if you can, read the brief that was given to your PMs and designers. Ask "why" on every decision that you still need input from others on to understand their reasoning over time. Make calls on low risk decisions, ship your work and get review, learn from these corrections. Study the best products that you personally use, what brings you joy from these products and how can you use similar approaches in the products you work on?

If you're going to take away anything from this post, it's this:

If you work on a product team developing software, becoming more product-minded is going to help you more than ever in the age of AI. The more breadth you have, the more effective you will be.
Start using agents and notice where you stall progress by needing to loop in others, invest in your own growth at these points.
Stop thinking in tickets and start thinking in outcomes.

Racing against time at DiviPay

2020-11-18T00:00:00Z

For the last couple of years I have been working at DiviPay on building the next generation of expense management software, covering everything from expense reimbursement for employees to controlling your budgets and subscription payments to virtual cards. It has been a wild ride so far and has presented many interesting challenges for me and the team. In this post I will explain how we faced an interesting performance problem and managed to not only solve it but exceed our expectations.

Payment Cards

Payment cards are a widespread payment method around the world; you probably have one in your wallet right now in the form of a credit or debit card from your bank. In my role I often work with virtual payment cards, which are very similar to a physical credit/debit card but are only available online. These payment cards present some interesting challenges for an engineering team from both a security and a performance perspective.

Security and Correctness

Traditionally, a payment card is attached to some kind of bank account and, besides some fraud rules and restricted merchants, the logic for allowing a payment is to check whether there are enough funds available in the account or on the line of credit before approving the payment. This does not need to be the case, and at DiviPay we build on top of this logic to allow the cards to respect multiple different limits or rules at one time. This allows us to do some interesting things, such as having different card numbers per subscription with different maximum amounts while only maintaining one account to fund all the subscriptions. As a trade-off, we need to process more logic at the time of payment to achieve these features and make sure each card only approves payments it is supposed to accept.
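
As a rough illustration of the idea (not DiviPay's actual implementation), here is a minimal sketch in which several cards share one funding account but each carries its own cap. The `Account` and `Card` classes, the `approve` function and the single per-payment cap rule are all hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Account:
    balance_cents: int


@dataclass
class Card:
    account: Account
    max_amount_cents: int  # hypothetical per-payment cap for this card


def approve(card: Card, amount_cents: int) -> bool:
    """Approve only if every rule passes: the card's own cap and the shared balance."""
    within_card_limit = amount_cents <= card.max_amount_cents
    has_funds = amount_cents <= card.account.balance_cents
    return within_card_limit and has_funds


shared = Account(balance_cents=100_000)
streaming = Card(account=shared, max_amount_cents=2_000)
hosting = Card(account=shared, max_amount_cents=50_000)

print(approve(streaming, 1_500))   # True
print(approve(streaming, 30_000))  # False: over this card's cap
print(approve(hosting, 30_000))    # True: same account, different cap
```

Every extra rule is another check that has to run before the payment can be approved, which is where the performance cost comes from.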

Performance

As mentioned above, to build these more complex approval features we need to put more logic between the payment being requested and the approval of the payment. To understand the importance of this we must first understand the four-party card scheme.

The four-party scheme is a type of payment network used to allow cardholders to make payments to various merchants. This is the way schemes like Mastercard and Visa work.

In the four-party scheme, there are four different parties (surprising right?):

Cardholder: This is the person or entity that is trying to make a payment with a card (physical or virtual).
Issuer: This is the entity or group of entities that issues the cards to the cardholders.
Acquirer: This is the entity, often a bank, that receives and stores the money from the payment.
Merchant: This is the merchant that the cardholder is trying to pay.

The benefit of using a four-party scheme is that the merchant and cardholder don't need to figure out how to exchange funds between each other and can leave the acquirer and issuer to figure out the specifics.

As a payment is made by a cardholder and flows through the scheme from the merchant, through the acquirer and to the issuer and back it may face various checks to decide whether the payment should be successful. At the high level, the entity running the scheme (read Mastercard/Visa) often enforces a maximum threshold for how long this process can take and then each entity in the process must respond within some threshold below this maximum.

In DiviPay's case we have a maximum of 3 seconds to respond with whether or not a payment should be accepted. This seems like a lot, but you need to take into account that it is measured by the entity upstream from us, so it includes things like network latency, which can have a significant impact on the total time taken to respond.

Setting The Bar

We had recently implemented some metrics around our payment approval flows to keep an eye on the percentage of failed payments, and we noticed that we were only hitting around a 93% success rate. The failures included payments that we had responded to in time but that hadn't reached the upstream issuer in time. To make sure that payment requests are resolved and responded to within 3 seconds, we needed to set a target that was realistic but effective, taking into account the network latency that was also contributing to the slowness. In the 7% of payments that failed, we noticed that a large number had actually been responded to by our systems in less than 1.5 seconds, which meant that there was significant delay in the network every now and then. A lot of this was out of our control; however, we could control the 1.5 seconds that the payment was in our system. We decided to aim for two SLOs: a p99 of under 1.5 seconds and a p95 of under 800ms, which would give ample time for the payment to get back through the network.
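
To make the budget concrete, this small sketch works out how much of the 3 second limit is left for the network if processing lands exactly on each target. The constant and function names are made up for illustration; the numbers come from the text:

```python
UPSTREAM_DEADLINE_S = 3.0                  # maximum response time allowed upstream
SLO_TARGETS_S = {"p95": 0.8, "p99": 1.5}   # processing-time targets


def network_headroom(processing_s: float, deadline_s: float = UPSTREAM_DEADLINE_S) -> float:
    """Time left for network transit once our own processing is done."""
    return deadline_s - processing_s


for name, target_s in SLO_TARGETS_S.items():
    print(f"{name}: {network_headroom(target_s):.1f}s left for the network")
# p95: 2.2s left for the network
# p99: 1.5s left for the network
```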

Finding The Low-hanging Fruit

Having not performance-optimized our payment approval at this point, there were probably some pretty large changes we could make in the flow to reduce the time it took to process each payment. To find them we did a group code review, looking for obvious code smells such as querying the database multiple times for the same information or for information that wasn't needed. Once we had completed this process we began profiling the flows in our beta environments to find any issues we hadn't caught in code review. As we use Django we were able to do this with some simple middleware like this:

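import cProfile
import io
import pstats


class ProfileMiddleware:
    """Django middleware that profiles each request and prints the stats."""

    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Profile everything downstream of this middleware for this request.
        pr = cProfile.Profile()
        pr.enable()
        response = self.get_response(request)
        pr.disable()

        # Sort by cumulative time so the most expensive call chains come first.
        s = io.StringIO()
        sortby = "cumulative"
        ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
        ps.print_stats()
        print(s.getvalue())

        return response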

Through this process we were able to eliminate a decent amount of computation that was either doing too much work or had remained as legacy code that didn't need to run anymore. This process got us to around 95% of our p99 target of less than 1.5 seconds but there was still something missing that wasn't obvious in the cProfile output.
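
One of the code smells mentioned earlier, fetching the same information more than once during a single approval, can be sketched without Django. Everything here is hypothetical: `fetch_card` stands in for a database query and simply counts how often it is called.

```python
calls = {"count": 0}


def fetch_card(card_id):
    """Stand-in for a database query (hypothetical); counts each call."""
    calls["count"] += 1
    return {"id": card_id, "limit_cents": 5_000}


def approve_slow(card_id, amount_cents):
    # Smell: the same card is fetched twice during one approval.
    if fetch_card(card_id)["limit_cents"] < amount_cents:
        return False
    return fetch_card(card_id)["id"] == card_id


def approve_fast(card_id, amount_cents):
    card = fetch_card(card_id)  # fetch once and reuse the result
    return card["limit_cents"] >= amount_cents and card["id"] == card_id


calls["count"] = 0
approve_slow(1, 1_000)
slow_queries = calls["count"]

calls["count"] = 0
approve_fast(1, 1_000)
fast_queries = calls["count"]

print(slow_queries, fast_queries)  # 2 1
```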

Sentry Performance

I have long been a fan of Sentry for error monitoring in various systems I've worked on, and when they released an APM tool a few months ago I jumped at the opportunity to try it out. Reasonably quick to set up and release, Sentry Performance samples your incoming requests (or whatever you want to call a transaction in your system) and provides valuable metrics around them such as p95/p99, Apdex scores and the number of users impacted.

This decision was a game changer for us as it helped us find the subset of requests in production that were causing the slowness in our system. Below is an example of what we were seeing for some payments: the large bar in the middle was a very slow database query doing joins on some huge tables, and it wasn't actually needed.

This visual representation of the bottlenecks made it incredibly easy to find the issues in our code and figure out whether there was a better way to achieve the same result or even question why certain things needed to happen.

Results

While we still have much to do, we have now hit our SLOs of a p99 < 1.5 seconds and a p95 < 800ms as shown below. By tackling this problem systematically, measuring the initial baseline and then profiling and monitoring the impact of our changes, we have been able to deliver our customers a much smoother payments experience and an overall better customer experience.

If you have enjoyed this blog and think you'd like to help solve similar problems in engineering or payments, DiviPay is hiring!

Competitive Programming Tips

2019-06-06T00:00:00Z

These are some rough notes from a session I presented to a number of high school students preparing to compete in a programming competition. The notes are heavily centered around the use of Python 3, however the concepts can be adapted to most languages. If you notice any errors please feel free to let me know in a GitHub issue or submit a fix here.

Strategy

Don't think of the syntax/code straight away. Take time to understand the problem in depth and try to explain your solutions before you start programming.
Depending on how confident each team member is, it isn't a bad strategy to divide and conquer, either one person tackling the hard problems first with the other team members getting through the easier ones or vice versa.
Be critical of time and be wary of when you can't solve a problem. The faster you accept that you can't find a solution, the more time you will have to work on other problems.

Finding Attributes and Methods

Besides using your favourite search engine and searching the Python documentation, a quick way to find out what attributes and methods are available on a module or object is to use the dir function. For example, we might want to find all the things we can do to a list:

my_list = [1, 2, 3]
dir(my_list)

# ['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__',
# '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__',
# '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__',
# '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index',
# 'insert', 'pop', 'remove', 'reverse', 'sort']

Notice at the end of the output there are a number of common list methods (append, clear, copy etc.). We can also try to find out what each of these does by checking if it has a docstring attached (a common way to generate documentation in Python):

print(my_list.append.__doc__)
# L.append(object) -> None -- append object to end

The docstrings tend to be rather concise so if you need a better explanation of a particular object/method then the python documentation is the place to go.
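
If the double-underscore names get in the way, a small filter over the dir output leaves just the public methods. This is only a convenience sketch:

```python
my_list = [1, 2, 3]

# Keep only the names that don't start with an underscore
public_names = [name for name in dir(my_list) if not name.startswith('_')]
print(public_names)
# ['append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
```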

Lambda Functions

Lambda functions are a quick way in Python to write simple functions. They are not named unless you assign them to a variable. For example:

square = lambda x: x * x
print(square(2))  # 4

This is the same as:

def square(x):
    return x * x

print(square(2))  # 4

Using lambda functions may make it faster to implement certain solutions as you will see further in this document.
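
A common place where a lambda saves time is as the key for sorting. The data below is made up for illustration:

```python
pairs = [('b', 3), ('a', 1), ('c', 2)]

# Sort by the number in each pair rather than by the letter
by_number = sorted(pairs, key=lambda pair: pair[1])
print(by_number)  # [('a', 1), ('c', 2), ('b', 3)]
```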

Common Sub-problems

When you are programming competitively you will come across a number of sub-problems that can be solved very quickly so you can focus on the main problem. These are some of those sub-problems.

Filtering collections

You might have a collection/array/list of items that you need to filter so you can continue your program. There are a few ways to do this in Python:

my_list = [1, 2, 3, 4, 5]
new_list = []

for element in my_list:
    if your_condition(element):   # This line can be substituted for whatever condition you're filtering
        new_list.append(element)

or even shorter:

my_list = [1, 2, 3, 4, 5]
new_list = filter(your_condition, my_list)  # filter returns a lazy iterator, so you may need to turn it into a list: list(filter(your_condition, my_list))
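
Here is the same pattern with a concrete condition, alongside a list comprehension that does the same job. The `is_even` helper is just an example condition:

```python
my_list = [1, 2, 3, 4, 5]


def is_even(n):
    return n % 2 == 0


evens_with_filter = list(filter(is_even, my_list))
evens_with_comprehension = [n for n in my_list if is_even(n)]

print(evens_with_filter)         # [2, 4]
print(evens_with_comprehension)  # [2, 4]
```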

Altering collections

Often you will need to apply some effect to a whole collection, for example you may need to square every number in a sequence. This can be quickly implemented by following this pattern:

my_list = [1, 2, 3, 4, 5]
new_list = list(map(lambda x: x * x, my_list))  # [1, 4, 9, 16, 25] (map is lazy, so wrap it in list to see the values)

Combining two collections

Sometimes you will need to join two different collections as pairs, triplets etc. This can be done quickly using the zip function:

letters = ['a', 'b', 'c']
numbers = [1, 2, 3]

print(list(zip(letters, numbers)))  # [('a', 1), ('b', 2), ('c', 3)]
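
Two related tricks that tend to come up: building a lookup dictionary from two lists, and splitting a list of pairs back apart. A short sketch:

```python
letters = ['a', 'b', 'c']
numbers = [1, 2, 3]

# Build a dictionary from the two lists
lookup = dict(zip(letters, numbers))
print(lookup)  # {'a': 1, 'b': 2, 'c': 3}

# Split pairs back into separate sequences with the * operator
pairs = [('a', 1), ('b', 2), ('c', 3)]
unzipped_letters, unzipped_numbers = zip(*pairs)
print(unzipped_letters)  # ('a', 'b', 'c')
print(unzipped_numbers)  # (1, 2, 3)
```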

Quickly finding the total of a collection of numbers

my_list = [1, 2, 3, 4, 5]
print(sum(my_list))  # 15

Quickly finding the maximum of a collection of numbers

my_list = [1, 2, 3, 4, 5]
print(max(my_list))  # 5

Quickly finding the minimum of a collection of numbers

my_list = [1, 2, 3, 4, 5]
print(min(my_list))  # 1
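
Both max and min accept a key function, which helps when "largest" doesn't mean the largest number. The sample data is made up:

```python
words = ['kiwi', 'banana', 'fig']
print(max(words, key=len))  # banana
print(min(words, key=len))  # fig

scores = {'alice': 3, 'bob': 7, 'carol': 5}
print(max(scores, key=scores.get))  # bob
```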

Maintaining a unique set

You might need to make sure that you have a collection of only unique items. This can be done using a set:

s = set()  # set() (note that {} on its own creates an empty dict)
s.add(1)   # {1}
s.add(2)   # {1, 2}
s.add(2)   # {1, 2}
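
You can also build a set straight from a list to remove duplicates. Sets don't keep the original order, so the sketch below also shows dict.fromkeys, which does:

```python
numbers = [3, 1, 3, 2, 1]

unique = set(numbers)
print(len(unique))  # 3

# dict.fromkeys keeps the first occurrence of each item, in order
ordered_unique = list(dict.fromkeys(numbers))
print(ordered_unique)  # [3, 1, 2]
```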

Counters

Sometimes you need to be able to count things really quickly and don't have the time to use a dictionary with a bunch of if statements. We can turn:

my_str = "the quick brown fox jumped over the lazy dog"
counter = {}

for element in my_str.split(' '):
    if element in counter:
        counter[element] += 1
    else:
        counter[element] = 1

# counter = {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'jumped': 1, 'over': 1, 'lazy': 1, 'dog': 1}

into this:

from collections import Counter

my_str = "the quick brown fox jumped over the lazy dog"
counter = Counter(my_str.split(' '))

# counter = Counter({'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'jumped': 1, 'over': 1, 'lazy': 1, 'dog': 1})
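
A Counter also saves time once the counting is done. For example, most_common returns the top items, and looking up a missing key gives 0 instead of raising an error:

```python
from collections import Counter

counter = Counter("the quick brown fox jumped over the lazy dog".split(' '))

print(counter.most_common(1))  # [('the', 2)]
print(counter['cat'])          # 0
```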

Reading files

Some competitions require participants to read test cases from a file. You can do this quickly in Python 3 like so:

with open('your_file.txt', 'r') as f:
    lines = f.readlines()

do_whatever_you_want(lines)  # The 'lines' variable is still in scope when we leave the context manager/outdent
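
Test case files usually need parsing into numbers. The sketch below assumes a made-up format (a count on the first line, then one line of space-separated integers per case) and writes its own sample file so it can be run as-is:

```python
import os
import tempfile

# Create a small sample input so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), 'sample_input.txt')
with open(path, 'w') as f:
    f.write('2\n1 2 3\n4 5 6\n')

with open(path) as f:
    case_count = int(f.readline())  # first line: number of cases
    cases = [[int(token) for token in line.split()] for line in f]

print(case_count)  # 2
print(cases)       # [[1, 2, 3], [4, 5, 6]]
```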

Writing to files

Similarly, you may be required to write files:

content = ['my\n', 'output\n']  # '\n' is important as it is the newline character
with open('your_output.txt', 'w') as f:   # 'w' mode overwrites whatever is in the file, 'a' appends to an existing file
    for line in content:
        f.write(line)
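
If you'd rather not manage newline characters yourself, print can write to a file and adds the newline for you. The "Case #" output format and the file name here are only examples:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'sample_output.txt')
results = [3, 7, 12]

with open(path, 'w') as f:
    for case_number, result in enumerate(results, start=1):
        print(f'Case #{case_number}: {result}', file=f)  # newline added automatically

with open(path) as f:
    written = f.read()

print(written)
```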

More Information

I hope this document has helped show some quick ways you can solve common problems in competitive programming. Trey Hunner has a brilliant article covering some more core Python builtin functions that you may find useful.

Like any skill, programming is improved through practice. There are various sites out there to hone your skills but I personally prefer using https://www.hackerrank.com/ for preparing for competitions/interviews.

Load testing at a glance

2019-05-25T00:00:00Z

For a while now I've been wanting to talk about load testing and why we do it. Recently I was approached with a problem where some users were having a very slow experience and I needed to rectify the issue. The way I went about solving this was to check what kind of response times I could expect for the particular path and actions those users were taking. By investigating the problem in this manner I was able to pinpoint the particular part of our system that was behaving slowly and begin to improve it.

It is problems like these where we can use load testing techniques to diagnose issues and put plans in place to fix them.

What is load testing?

The term "load testing" can be used in a few different ways. The first example is using it to describe a general class of testing methods that seek to uncover how a particular system behaves and reacts to changes in usage or traffic volume. The second example of the term refers to a specific type of test, one in which we target a system with normal amounts of traffic to see if it can handle the expected load.

When we are performing a load test we are simulating traffic to a system using some sort of tool and then monitoring our key metrics to decide whether the system maintains an acceptable state or the metrics become unacceptable. For example, if we have a grocery list app, we might decide that if the response time to add an item to the list exceeds 200 milliseconds it produces an unacceptable experience for the user. In this example our key metric is the response time for adding an item and our threshold of acceptability is 200 milliseconds. It is important when designing and executing a load test that you take time to understand what your key metrics are and what you would deem unacceptable values for those metrics.

Stress Testing

A stress test is a type of load test that seeks to find the point where a system starts to degrade. This might be the volume of traffic at which a system starts to gracefully degrade features or it could be the point at which a system becomes completely unresponsive. One way to execute a stress test is to keep increasing the volume of simulated traffic to the system until the key metrics you are monitoring become unacceptable. Note that all systems will probably have a point where they will degrade or fail. This is important to realize because in our grocery list example, it might be ok for the app to fail if we have 10 thousand users because we don't expect to have that many concurrent users. When designing a stress test you may want to have multiple points of acceptability. In our grocery list example this may look like:

100 users: Adding items must happen in under 200ms
1000 users: Adding items can happen in under 500ms
5000 users: Adding items can happen in under a second
10000 users: Adding items can take any amount of time
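
These tiers can be written down as data so a test run can check itself. This is a minimal sketch that assumes each tier means "up to this many users"; the names `TIERS` and `is_acceptable` are made up:

```python
# (maximum concurrent users, allowed seconds); None means any time is acceptable
TIERS = [(100, 0.2), (1000, 0.5), (5000, 1.0), (10000, None)]


def is_acceptable(users: int, response_s: float) -> bool:
    """Check a measured response time against the tier for this user count."""
    for max_users, limit_s in TIERS:
        if users <= max_users:
            return limit_s is None or response_s < limit_s
    return True  # beyond the last tier we make no promises


print(is_acceptable(100, 0.15))   # True
print(is_acceptable(100, 0.30))   # False
print(is_acceptable(800, 0.30))   # True: the 1000-user tier allows up to 500ms
print(is_acceptable(10000, 30))   # True: any amount of time is acceptable
```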

Of course we want our system to perform well but we should be looking at reasonable amounts of load, in this case we might be expecting maybe a couple hundred users. If you focus on the amounts of traffic that you aren't expecting any time soon you will find yourself over-optimizing.

Soak Testing

Soak testing is another form of load testing where we are not increasing the volume of traffic over the duration of the test but rather maintaining a relatively high load for a longer duration. In our grocery list example we might be expecting 100 users a day but we want to make sure that if it gains popularity we can support up to 500 users a day. To handle this we could run a soak test whereby we simulate 500 users for multiple days to see how our system behaves. This form of testing can help find bottlenecks in your system that require more attention for your next stage of growth, perhaps your simple queues can't handle that amount of load but can handle your current user base fine, these are the types of things you want to know ahead of time so you can fix them before your user base grows.

Spike Testing

Some types of applications have very spiky traffic patterns and others don't. You might find that your application has 100 users using it at all times but every Friday at 5pm traffic skyrockets to 1000 users. Once you know this you could account for it by using bigger servers or more powerful infrastructure, but this also comes at a cost. Wouldn't it be great if we could have some lower amount of capacity most of the week and then still handle the load volumes on Friday evening? This scenario is what spike testing tries to simulate: you run a load test with some baseline amount of traffic and then increase the traffic sharply to see how your system behaves against your key metrics. Spike testing isn't just useful for recurring spiky traffic but also for preparing for unexpected events. For example, we might find that our grocery list app has baseline traffic of 100 users all day every day but then we get featured in Groceries Weekly, the premier grocery publication, and our traffic soars in a couple of hours. Depending on your application it might be important to make sure you can handle this rapid increase in traffic. Imagine trying to get your startup off the ground and having everyone who read your first PR article see a 404 page.
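
A spike test needs a traffic profile to follow. The sketch below generates one as a list of target user counts per minute. The function name and the per-minute granularity are assumptions; your load testing tool will have its own way to express this:

```python
def spike_profile(minutes, baseline, peak, spike_start, spike_length):
    """Return the target number of simulated users for each minute of the test."""
    return [
        peak if spike_start <= minute < spike_start + spike_length else baseline
        for minute in range(minutes)
    ]


profile = spike_profile(minutes=10, baseline=100, peak=1000, spike_start=4, spike_length=2)
print(profile)  # [100, 100, 100, 100, 1000, 1000, 100, 100, 100, 100]
```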

Why do we load test?

We've covered some basic types of load tests, what key metrics and acceptable values are, but why would we want to load test in the first place? To summarize the examples above, we may want to know how our systems will behave under unforeseen amounts of load or rapid changes in traffic. We also might want to see whether or not we are over-provisioning our infrastructure based on the regular usage of a system, or whether our infrastructure can respond to changes in load. These questions don't all need to be about the short term either. We might want to perform a load test now to see if we can handle the number of users we predict in 6 months' time, to make sure we are ready for that slow but steady increase in traffic or usage. Some common times to watch out for when load testing are any seasonal or recurring changes of load in your application's domain (Valentine's Day for flower shops, for example), any upcoming press releases or other traffic generating events, new feature releases (to make sure the new feature can handle the baseline traffic), and any increase in complaints of poor system performance from your users.

How can I load test?

There are various tools that you can use to load test systems, ranging from paid services to free open source tools. Disclaimer: I haven't used any of the paid services and won't recommend any here but you can find them with a quick web search. Of the free tools, I have used Vegeta, Locust, Bees With Machine Guns, and Apache JMeter. Each of these tools has its own pros and cons and it is important to find the one that will work best for your use-case. The other option, if you are working in a specialized environment or doing some particularly tedious user-flows, is to write your own tool, however this can be an expensive task if you're not going to be load testing often.

When performing a load test it is very important to do two things:

Do not run the test off your laptop: This is important because you are unlikely to be able to simulate enough load from one consumer machine. At the very least spin up a beefier machine on a cloud provider and use that, or multiple machines, to run the tests.
Do simulate common traffic patterns: Many load testing tutorials show you that you should test some particular path of a user. They log in, then they hit the dashboard page, then they fill out a form, but this isn't how your users actually behave. You need to look at the patterns your users have and design your test appropriately to make sure you are maximizing its usefulness. If 70% of your traffic is logging in and hitting the dashboard and 30% is creating new accounts, your load test should be simulating that.
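
A weighted traffic mix like the 70/30 split above can be sketched with the standard library. The action names are hypothetical, and a real load testing tool will have its own way to weight scenarios:

```python
import random

# Hypothetical action names; the weights mirror the 70/30 split in the text
ACTIONS = ['login_and_dashboard', 'create_account']
WEIGHTS = [70, 30]


def pick_actions(count, seed=None):
    """Choose which action each simulated user performs, following the weights."""
    rng = random.Random(seed)
    return rng.choices(ACTIONS, weights=WEIGHTS, k=count)


sample = pick_actions(10_000, seed=1)
share = sample.count('login_and_dashboard') / len(sample)
print(f"{share:.0%} of simulated requests log in and hit the dashboard")
```

With enough simulated users the share lands close to 70%, though any single run will vary a little.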

What metrics do I care about?

When we are performing a load test it is important to keep track of your system's metrics. Most of the time the metric you will be monitoring is the response time to the user, since this is usually tied directly to their experience of your system; if it isn't, monitor what is. When you are evaluating your key metrics there are a variety of statistics to keep an eye on. The first is the minimum and maximum. In the case of response time these can give you a good idea of the spread of different experiences users will see, however they aren't the best statistics to monitor because you don't know how many users are seeing each extreme. A much better way to evaluate how your system is performing is to look at your 95th and higher percentiles (P95 = 95th percentile). For example, the 95th percentile tells you that 95% of values are at or below what it reports. This is much better for understanding what the vast majority of your users are experiencing. The additional upside of monitoring these high percentiles is that they tend to exclude outliers from your data, such as a handful of super slow requests caused by network latency. The higher the percentile you can achieve acceptable values for, the better.
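
To see the difference between the maximum and a high percentile, here is a small sketch using the nearest-rank method. This is one of several ways to define a percentile, so your monitoring tool may report slightly different values. The data is made up:

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# 19 quick responses and one slow outlier, in milliseconds
response_ms = [100] * 19 + [2000]

print(max(response_ms))             # 2000
print(percentile(response_ms, 95))  # 100
```

The single slow request dominates the maximum but doesn't move the P95 at all.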

When monitoring your metrics you will probably want to graph them over the duration of your test. This will help you understand how rapidly they change based on the amount of traffic you generate. For example, if you double the traffic and your P95 triples, you know there is something in your system that is not scaling linearly.

When you are gathering metrics you also want to pick the granularity you require. If you are only looking at aggregates across your whole system, it will be harder to act on your findings. I have found that, in the case of web services, monitoring each URL's performance tends to show bottlenecks quickly.
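
Here is a small sketch of why per-URL numbers are easier to act on than a single aggregate. The URLs and timings are invented for illustration, and p95 is a helper written for this example.

```python
import math
from collections import defaultdict


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    return ordered[math.ceil(95 * len(ordered) / 100) - 1]


# Made-up samples of (url, response time in ms) as a load test might record them.
samples = (
    [("/login", 120)] * 50
    + [("/dashboard", 300)] * 45
    + [("/dashboard", 2500)] * 5
    + [("/signup", 200)] * 20
)

by_url = defaultdict(list)
for url, elapsed_ms in samples:
    by_url[url].append(elapsed_ms)

per_url_p95 = {url: p95(times) for url, times in by_url.items()}
overall_p95 = p95([elapsed_ms for _, elapsed_ms in samples])

for url, value in sorted(per_url_p95.items(), key=lambda item: item[1], reverse=True):
    print(f"{url}: P95 = {value} ms")
print(f"all URLs: P95 = {overall_p95} ms")
```

In this data the aggregate P95 is 300 ms and looks healthy, while the per-URL breakdown shows /dashboard at 2500 ms.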

How can I action these metrics?

Hopefully, based on the metrics you gather, you are able to tell what your slowest operations are and decide to focus performance efforts there. When picking what to improve you will usually want to pick the metric with the biggest delta from its acceptability threshold, as it has the most to gain and is often the easiest to make gains on. Once you investigate the area of the system that is performing unacceptably, you should note any expensive operations there, including network calls, database operations and known slow functions. If you have tracing available you might be able to see which operations are using up the majority of the time and focus your efforts on optimizing those areas.
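
One way to turn "biggest delta from its threshold" into a to-do list is sketched below. The measurements, the thresholds and the rank_by_overshoot helper are all assumptions made for this example.

```python
# Made-up P95 measurements and acceptability thresholds, both in milliseconds.
measured_p95_ms = {"/login": 180, "/dashboard": 2500, "/signup": 450}
threshold_ms = {"/login": 200, "/dashboard": 500, "/signup": 400}


def rank_by_overshoot(measured, thresholds):
    """Return (name, overshoot) pairs for metrics over their threshold, worst first."""
    overshoot = {
        name: measured[name] - thresholds[name]
        for name in measured
        if measured[name] > thresholds[name]
    }
    return sorted(overshoot.items(), key=lambda item: item[1], reverse=True)


priorities = rank_by_overshoot(measured_p95_ms, threshold_ms)
for name, delta in priorities:
    print(f"{name}: {delta} ms over threshold")
```

Metrics that are already within their threshold drop out of the list entirely, so the output only contains the areas worth investigating.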

Going further

If you've made it this far and you'd like to learn more, I recommend trying to load test a system you have available. You might even find some scalability or performance issues you didn't know you had. This talk by Rob Harrop is great at explaining what load tests are for, and if you have a Pluralsight subscription, Mick Badran has a good course on load testing using Azure DevOps.

If you have any questions please don't hesitate to contact me or reply in the comments.


Python packages that you may not have heard of

2019-02-02T00:00:00Z

The Python package ecosystem is a vast forest with many great packages available. If you have been using Python professionally you have more than likely stumbled across some of the more popular packages such as requests, django and flask. In this post, however, I seek to bring your attention to a few packages that I have been using over the last 6 months or so and have found to be quite useful.

Glom

Glom is a package that was first released in April 2018, and it has since become my go-to solution for accessing nested data structures in Python. It has a very simple interface that allows you to declare what you are looking for in some structure and define a few other options around that. For example, if we have the following dictionary:

my_dict = {
    'foo': {
        'foofoo': 1,
        'foobar': 2
    },
    'bar': {
        'barfoo': 3,
        'barbar': 4
    }
}

and we wanted to find the value for foobar, we could simply write glom(my_dict, 'foo.foobar'). Furthermore, if we wanted a default instead of an error when a path is missing, we could pass a keyword argument like glom(my_dict, 'foo.foobar', default=None), which would then return None if there was nothing at the foo.foobar path.
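
To make the idea of dotted-path access concrete without depending on glom itself, here is a rough standard-library sketch. get_path is a hypothetical helper written for this post, not part of glom, and it only handles nested dictionaries, whereas glom supports far more.

```python
_MISSING = object()  # sentinel so that None can be used as a real default


def get_path(data, path, default=_MISSING):
    """Follow a dotted path such as 'foo.foobar' through nested dictionaries."""
    current = data
    for key in path.split("."):
        try:
            current = current[key]
        except (KeyError, TypeError):
            if default is _MISSING:
                raise KeyError(f"nothing found at {path!r}") from None
            return default
    return current


my_dict = {
    "foo": {"foofoo": 1, "foobar": 2},
    "bar": {"barfoo": 3, "barbar": 4},
}

print(get_path(my_dict, "foo.foobar"))                 # 2
print(get_path(my_dict, "foo.missing", default=None))  # None
```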

Now glom's not just a data access package, but also a data formatting package. For example, given a dictionary of animals we might want to get a list of all the different colors that are in our data. So if we had this dictionary:

my_dict = {
    'animals': [
        {
            'name': 'dog',
            'num_legs': 4,
            'color': 'brown'
        },
        {
            'name': 'cat',
            'num_legs': 4,
            'color': 'orange'
        },
        {
            'name': 'frog',
            'num_legs': 4,
            'color': 'green'
        }
    ]
}

we could get our list of colors with the following call to glom: glom(my_dict, ('animals', ['color'])), which would result in ['brown', 'orange', 'green'].

Crayons

Crayons is another package from serial-creator Kenneth Reitz. This package excels at making it easy to color your text in command line applications. I know I've certainly found it useful when writing reporting tools or for coloring prompts for dangerous actions. The API itself is well designed: you pass a string to a function named after the color you want the text to be, and it returns the decorated string so that when you print it to the console it's already colored. For example:

import crayons

red = crayons.red("red")
print(f"The end of this string is {red}.")

which prints

The end of this string is red.

Another useful feature that crayons provides is that of turning off the colors when you want to via calling crayons.disable().
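
If you are curious what a "decorated string" looks like, terminals color text when they see ANSI escape codes wrapped around it. The sketch below shows the general idea using only the standard library. The red and disable functions here are hypothetical stand-ins written for this post and are not how crayons is implemented.

```python
RED = "\033[31m"   # ANSI escape code that switches the foreground color to red
RESET = "\033[0m"  # ANSI escape code that resets all styling

_enabled = True


def red(text):
    """Hypothetical helper: wrap text in ANSI codes unless coloring is disabled."""
    if not _enabled:
        return text
    return f"{RED}{text}{RESET}"


def disable():
    """Hypothetical helper: make red() return plain text from now on."""
    global _enabled
    _enabled = False


colored = red("red")
print(f"The end of this string is {colored}.")

disable()
plain = red("red")
print(f"The end of this string is {plain}.")
```

Being able to switch the codes off matters when output is piped to a file or a tool that does not understand them.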

Hypothesis

Hypothesis is a testing framework that will make you think differently about testing your Python code. Rather than writing standard unit tests where you come up with the input for some function yourself or generate random fake input each time, you describe the kind of input the function accepts and Hypothesis generates many examples for you, deliberately including the edge cases you are likely to forget. Furthermore, when a test fails, Hypothesis attempts to report back the minimal test case that will break your code, so you can find and fix bugs faster. As one example of Hypothesis's power, say we are trying to write a function to square a number. Our first attempt might be to produce the square via addition:

def square(x):
    total = 0
    for _ in range(x):
        total += x

    return total

and our normal, naive, test case might look like:

def test_square():
    assert square(4) == 16

Notice that this is a bad test case because we forgot to check edge cases. Well, if we rewrite the test case using Hypothesis, we can find those edge cases. Here's how we would write it:

from hypothesis import given
from hypothesis.strategies import integers

@given(x=integers(max_value=100))
def test_square(x):
    assert square(x) == x * x

In this test case, we are asking Hypothesis to generate us test cases where we are testing an integer up to 100 and checking that it is squared correctly. Upon running this, Hypothesis alerts us:

Falsifying example: test_square(x=-1)

indicating that when x = -1 we fail our test. We can then fix our function and notice that all tests now pass:

def square(x):
    return x * x
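
For some intuition about what a property-based test is doing, here is a hand-rolled sketch using only the standard library. It is not how Hypothesis works internally, and unlike Hypothesis it makes no attempt to shrink the failure down to a minimal example. find_counterexample is a helper invented for this post.

```python
import random


def square_by_addition(x):
    """The first attempt from above: add x to itself x times."""
    total = 0
    for _ in range(x):
        total += x
    return total


def find_counterexample(func, trials=200, seed=0):
    """Try random integers in [-100, 100] and return the first x where func(x) != x * x."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.randint(-100, 100)
        if func(x) != x * x:
            return x
    return None


failing_input = find_counterexample(square_by_addition)
print("counterexample for the addition version:", failing_input)
print("counterexample for x * x:", find_counterexample(lambda x: x * x))
```

The addition-based version fails for any negative input because range(x) is empty when x is negative, so the function returns 0.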

Marshmallow

Marshmallow is a serialization library that I have become fond of due to its similarity to Django REST Framework serializers. It has a declarative style that allows you to very quickly describe the data you are converting between native objects and standard formats like JSON. Furthermore, Marshmallow allows us to validate our data on load by specifying keyword arguments to the different field types on our schema:

from marshmallow import fields, Schema

class AnimalSchema(Schema):
    name = fields.String(required=True)
    num_legs = fields.Integer(required=True)
    color = fields.String(required=True)

Here we have a schema that represents some animal, similar to our glom example. Now we could try to load some JSON into this:

dog = "{\"name\": \"dog\", \"num_legs\": 4}"
animal = AnimalSchema().loads(dog)

However, Marshmallow realises that this data isn't the correct format since it is missing the color field and therefore raises a ValidationError alerting us to this fact. Note that if we excluded the required=True argument from the color field this would not happen. If we fix the JSON, our call will return to us a native python dictionary ready to be used elsewhere:

dog = "{\"name\": \"dog\", \"num_legs\": 4, \"color\": \"brown\"}"
animal = AnimalSchema().loads(dog)

print(animal)  # {'num_legs': 4, 'name': 'dog', 'color': 'brown'}

Pystache

Pystache is one of those libraries that is just so simple and gives back so much value so quickly. I first used Pystache when I was writing an email template for an application. It allowed me to worry more about the presentation of the template than about getting the data into it, due to its extremely well designed API. Simply put, Pystache is a templating library that allows you to embed variables into large and often complex strings. For example, if we wanted to populate an HTML template with a number of variables we could do it like so:

import pystache

template = """
<h1>{{title}}</h1>
<h2>{{subtitle}}</h2>

<p>
{{body}}
</p>
"""

context = dict(
    title="My Post",
    subtitle="An example of Pystache",
    body="Pystache is awesome"
)

print(pystache.render(
    template,
    context
))

and the result would be:

<h1>My Post</h1>
<h2>An example of Pystache</h2>

<p>
Pystache is awesome
</p>


Describing Descriptors - A follow up

2018-09-03T00:00:00Z

Recently I gave a talk at PyConAU 2018 entitled "Describing Descriptors". In this talk I sought to give the audience an introduction to Python's descriptor features and some basic use cases they might have for them. Following the talk there were some questions that dug into areas of descriptors that I had not experimented with. This post seeks to answer those questions and any other interesting questions I find while investigating.

How do you delete the actual descriptor instance from a class?

So if we take a very simple descriptor, Desc, which prints when it is accessed:

from weakref import WeakKeyDictionary

class Desc:
    def __init__(self):
        self.data = WeakKeyDictionary()

    def __get__(self, instance, owner):
        print('Called: __get__')
        if instance is None:
            return self
        return self.data[instance]

    def __set__(self, instance, value):
        print('Called: __set__')
        self.data[instance] = value

and use it on a class, note that we are declaring it as a class attribute:

class Person:
    name = Desc()

    def __init__(self, name):
        self.name = name

We can then create instances of this class

>>> p1 = Person('matt')
Called: __set__

noting that Desc.__set__ was called. We can see that this descriptor exists on the Person class itself:

>>> 'name' in dir(Person)
True

and then delete it from the class directly:

>>> del Person.name

Finally, when we create a new instance, we can see that Desc.__set__ was no longer called and the descriptor doesn't exist on the class anymore:

>>> p2 = Person('sam')
>>> 'name' in dir(Person)
False

Do note, however, that our p2.name was still set. It is now just a regular instance attribute.
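
A quick way to confirm where each value ended up is to look at the instance dictionaries. This is a self-contained sketch that repeats the Desc and Person classes from above, without the print calls:

```python
from weakref import WeakKeyDictionary


class Desc:
    def __init__(self):
        self.data = WeakKeyDictionary()

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.data[instance]

    def __set__(self, instance, value):
        self.data[instance] = value


class Person:
    name = Desc()

    def __init__(self, name):
        self.name = name


p1 = Person('matt')  # stored inside the descriptor's WeakKeyDictionary
del Person.name      # remove the descriptor from the class
p2 = Person('sam')   # stored as a plain instance attribute

print(vars(p1))  # {}
print(vars(p2))  # {'name': 'sam'}

# p1's value lived in the deleted descriptor, so it can no longer be looked up.
try:
    p1.name
except AttributeError:
    p1_lookup_failed = True
else:
    p1_lookup_failed = False
print(p1_lookup_failed)  # True
```

So deleting the descriptor also orphans any values it was holding for existing instances, which is worth keeping in mind before doing this in real code.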

Can you chain descriptors?

After some clarification, this question was better phrased as "Can we combine descriptors for validations?". For this we will reimplement the NonNegativeInteger descriptor from the original talk, but this time we want to implement the integer validation and the non-negative validation as separate descriptors. Note that this is a very simple use case and probably doesn't require descriptors, but I'm using it to demonstrate the concept. We are going to implement a _validate hook that any subclass can implement to add extra validations to the descriptor. The reason we do this is that the superclass calls self.data[instance] = value before the end of its __set__ method, so we would end up with a dirty dictionary if we just called super().__set__ from the subclass and then validated afterwards.

from weakref import WeakKeyDictionary

class IntDesc:
    def __init__(self):
        self.data = WeakKeyDictionary()

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.data[instance]

    def __set__(self, instance, value):
        if not isinstance(value, int):
            raise TypeError('Value is not an int')

        if hasattr(self, '_validate'):
            value = self._validate(instance, value)

        self.data[instance] = value


class NonNegativeInteger(IntDesc):
    def _validate(self, instance, value):
        if value < 0:
            raise ValueError('Value must be non-negative')

        return value


class Person:
    age = NonNegativeInteger()

    def __init__(self, age):
        self.age = age

As you can see, our IntDesc checks whether or not the value being set is an integer, and before it sets the value it calls any extra validations implemented in _validate. We can then add those validations in NonNegativeInteger._validate. This isn't a particularly pretty example, so please let me know if you have a nicer way of explaining descriptor subclassing. The primary issue with this approach is that we needed to add the _validate hook where our ideal solution would use a call to super(). However, that doesn't work with checks that need to run after the superclass's check but before the value is stored.
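
One possible alternative, sketched here as an assumption rather than something from the talk, is to move all of the checks into a validate method and chain that with super(), leaving __set__ to store the value only once every check has passed. ValidatedDesc and validate are names made up for this sketch.

```python
from weakref import WeakKeyDictionary


class ValidatedDesc:
    """Base descriptor: validate first, store only if validation passes."""

    def __init__(self):
        self.data = WeakKeyDictionary()

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.data[instance]

    def __set__(self, instance, value):
        self.validate(value)  # may raise, in which case nothing is stored
        self.data[instance] = value

    def validate(self, value):
        """Subclasses extend this and call super().validate(value) first."""


class IntDesc(ValidatedDesc):
    def validate(self, value):
        super().validate(value)
        if not isinstance(value, int):
            raise TypeError('Value is not an int')


class NonNegativeInteger(IntDesc):
    def validate(self, value):
        super().validate(value)  # run the integer check first
        if value < 0:
            raise ValueError('Value must be non-negative')


class Person:
    age = NonNegativeInteger()

    def __init__(self, age):
        self.age = age


person = Person(30)
print(person.age)  # 30
```

Because storage happens in one place after the whole validate chain has run, a failed validation never leaves a half-set value behind.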

Is there a performance penalty for using descriptors?

I was unsure as to whether there was a performance penalty for using descriptors and so I've tested it. The feature I wanted to test performance on was a simple setting of an attribute. For this test we are going to try 5 different approaches:

Using a descriptor that stores the value on the instance.
Using a descriptor that stores the value in a WeakKeyDictionary.
Storing the value on the instance as normal.
Storing the value using @property.
Storing the value using __setattr__.

The 5 implementations can be found on GitHub

My method was to time the implementations while they set a value 1 million times. I repeated this 100 times and averaged the results to try to smooth out any outliers. The results are as follows:

Descriptor storing the value on the instance: 0.144 seconds
Descriptor storing the value in a WeakKeyDictionary: 0.142 seconds
Storing the value on the instance as normal: 0.139 seconds
Storing the value using @property: 0.382 seconds
Storing the value using __setattr__: 0.678 seconds

As you can see, both implementations of descriptors were only slightly slower than storing the value directly on the instance, whereas @property and __setattr__ were significantly slower on average. All of these benchmarks were run on my 2017 MacBook Pro.
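
If you want to try a measurement like this yourself, here is a minimal sketch using timeit. The classes below are my own reconstruction for illustration, not the exact implementations linked above, and the iteration counts are much smaller so that it runs quickly. Expect the absolute numbers to differ by machine and Python version.

```python
import timeit
from weakref import WeakKeyDictionary


class InstanceDesc:
    """Descriptor that stores the value in the instance's __dict__."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class WeakDesc:
    """Descriptor that stores the value in a WeakKeyDictionary."""

    def __init__(self):
        self.data = WeakKeyDictionary()

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.data[instance]

    def __set__(self, instance, value):
        self.data[instance] = value


class UsesInstanceDesc:
    value = InstanceDesc()


class UsesWeakDesc:
    value = WeakDesc()


class Plain:
    pass


def average_set_time(cls, number=50_000, repeat=3):
    """Average seconds taken to run `obj.value = 1` `number` times, over `repeat` runs."""
    obj = cls()
    timings = timeit.repeat(
        "obj.value = 1", globals={"obj": obj}, number=number, repeat=repeat
    )
    return sum(timings) / len(timings)


results = {
    cls.__name__: average_set_time(cls)
    for cls in (UsesInstanceDesc, UsesWeakDesc, Plain)
}
for name, seconds in results.items():
    print(f"{name}: {seconds:.4f} seconds")
```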

Can you have inter-attribute validation?

You sure can. Building on some points from the talk though, please don't add too much magic to your code, as you will confuse beginners, and keeping the code simple leads to greater maintainability. With that said, we can make two attributes rely on one another by accessing them through the instance argument:

from datetime import datetime, timedelta
from weakref import WeakKeyDictionary

class AgeDesc:
    def __init__(self):
        self.data = WeakKeyDictionary()

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.data[instance]

    def __set__(self, instance, value):
        if not isinstance(value, int):
            raise TypeError('Age must be an integer')

        if value < 0:
            raise ValueError('Age must be a non-negative integer')

        self.data[instance] = value


class DobDesc:
    def __init__(self):
        self.data = WeakKeyDictionary()

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.data[instance]

    def __set__(self, instance, value):
        # Check that this date of birth is valid for the instances age
        if not isinstance(value, datetime):
            raise TypeError('DoB must be a datetime')

        if (datetime.now() - value).days - (365 * instance.age) > 365:
            raise ValueError(f"If the age is {instance.age}, the date of birth cannot be {value}")

        self.data[instance] = value


class Person:
    age = AgeDesc()
    date_of_birth = DobDesc()

Here you can see that we are creating two data descriptors, one for the age and one for the date of birth. The AgeDesc simply checks that the age is an integer and is not negative. The DobDesc initially checks that the value is a datetime, but it also verifies that it is a plausible date by comparing it to the age attribute on the Person in its __set__ method. If it isn't a valid date then it raises an error. For example:

p = Person()
p.age = 10
p.date_of_birth = datetime.now() - timedelta(days=365*10) # Sets the DoB fine
p.date_of_birth = datetime.now() - timedelta(days=365*20) # Raises a ValueError

How do descriptors work with slots?

__slots__ are an interesting feature of Python and allow the programmer to assign fixed memory for their instance attributes rather than having their class use __dict__. For example, we can create a Person class with a name attribute like so:

class Person:
    __slots__ = ('name', )


p = Person()
p.name = 'matt'

In CPython, slots are implemented as descriptors. I tried to assign my own descriptor to a slot by declaring a descriptor as usual, which doesn't work, and I also tried to subclass the slots type to create my own descriptor, but that didn't work either. This Stack Overflow answer suggests that we can add extra validations to members of __slots__ by hiding renamed variables on the class. However, this adds more descriptors to the class, so you now have 2 attributes for your initial 1.
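
The snippet below sketches both observations in CPython: a slot shows up on the class as a descriptor, and a class attribute with the same name as a slot is rejected. It finishes with one reading of the renamed-variable workaround, using a property over a renamed slot. ValidatedPerson is a name invented for this sketch and is not taken from the linked answer.

```python
class Person:
    __slots__ = ('name',)


# Each slot shows up on the class as a descriptor object.
slot = Person.__dict__['name']
print(type(slot).__name__)                                 # member_descriptor
print(hasattr(slot, '__get__'), hasattr(slot, '__set__'))  # True True

# Declaring a class attribute with the same name as a slot is rejected.
conflict_error = None
try:
    class Broken:
        __slots__ = ('name',)
        name = 'conflicts with the slot'
except ValueError as exc:
    conflict_error = str(exc)
print(conflict_error)


# Workaround sketch: keep the storage in a renamed slot and validate in a property.
class ValidatedPerson:
    __slots__ = ('_name',)

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, value):
        if not isinstance(value, str):
            raise TypeError('name must be a str')
        self._name = value


p = ValidatedPerson()
p.name = 'matt'
print(p.name)  # matt
```

The class now carries two descriptors for one logical attribute: the slot _name and the property name.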


Describing Descriptors

2018-08-25T00:00:00Z

Slides available here

Oftentimes beginner programmers go through traditional features when learning a language. For Python beginners this might involve variables, control structures like if-statements, while and for loops, dictionaries, and finally classes. However, if we read the Python documentation we find that another feature that can be used in Python is that of the descriptor protocol. Descriptors allow the programmer to override the storing and retrieving of different class instance variables such that special behaviours can be followed. For example, we might want some variable to follow some special validation. We could do this using __setattr__ on the containing class but perhaps we want to reuse the validation in another class, or we want other validations for other variables and we don't want __setattr__ to become a huge if/elif/else block. In this talk, I will walk attendees through what a descriptor is, what use cases they can use them in, how to implement a descriptor, and common descriptors in the Python ecosystem that users may or may not have identified as descriptors (often just referred to as magic).
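
As a taste of the reuse argument, here is a minimal sketch of one validation written once as a descriptor and shared by two unrelated classes. NonEmptyString, Person and Company are names made up for this example and are not taken from the talk.

```python
class NonEmptyString:
    """Hypothetical reusable validator implemented as a data descriptor."""

    def __set_name__(self, owner, name):
        # Store the value on the instance under a private name, e.g. '_name'.
        self.storage_name = '_' + name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return getattr(instance, self.storage_name)

    def __set__(self, instance, value):
        if not isinstance(value, str) or not value:
            raise ValueError('expected a non-empty string')
        setattr(instance, self.storage_name, value)


class Person:
    name = NonEmptyString()

    def __init__(self, name):
        self.name = name


class Company:
    name = NonEmptyString()
    trading_name = NonEmptyString()

    def __init__(self, name, trading_name):
        self.name = name
        self.trading_name = trading_name


person = Person('matt')
company = Company('Example Pty Ltd', 'Example')
print(person.name, '/', company.trading_name)
```

Neither class needs a __setattr__ with an if/elif/else block. Each attribute simply declares the validation it wants.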


Getting started with Python testing

2018-04-22T00:00:00Z

"People also underestimate the time they spend debugging. They underestimate how much time they can spend chasing a long bug. With testing, I know straight away when I added a bug. That lets me fix the bug immediately, before it can crawl off and hide." -- Martin Fowler

Testing your code is a common practice that I feel requires more attention. During my degree there was a period where unit tests were emphasized as an appropriate thing to do, and other times where they were completely ignored. Similarly, during my high school years I competed in a number of programming competitions, often over a number of weeks, where the solutions I was producing were being marked against some test, but I was only manually testing my code before having it marked. While testing is brought up in high school curriculums as well as at university, I believe it is generally only discussed as an idea and not actually implemented in class. The aim of this post is to give you a general introduction to how we might test a function in Python 3.

Our Plan

There is a process used in the software industry called Test-Driven Development or TDD for short. While it's not always followed, it does provide a nice way of developing software small and large. Simply stated, TDD requires the developer to write their tests first, and then write the actual product code. Today, I will demonstrate this practice in this blog post.

The Problem Space

I have decided to pick a simple problem for this demonstration: writing a function that returns the nth Fibonacci number. This function is not particularly difficult to implement, and I have chosen it so that people of all experience levels can follow along.

The Fibonacci sequence begins as follows:

1, 1, 2, 3, 5, 8, 13, 21, ...

where we start with the numbers 1, 1 and then get the next number in the sequence by adding the two previous numbers together. For example, the next number in our example after 21 would be 13 + 21 = 34.

Initial Requirements

When we approach a problem that requires tests, we need to come up with some requirements that our tests will check are satisfied. Looking at the Fibonacci sequence we have a few requirements, namely:

When we pass the function some number n we expect it to return the number that is in position n in the sequence, with the zeroth number being 1.
Our function should raise a ValueError if we pass in a number less than 0 because it makes no sense to have anything before the zeroth position in the sequence.

Our First Test

I use pytest when writing tests as I find it produces nice output and together with a number of plugins can lead to a very enjoyable testing experience. Using pip you can install pytest from the command line:

pip install pytest

Testing in Python doesn't strictly require pytest but I find it makes testing a lot nicer.

Now opening a new file called test_fibonacci.py we can begin writing our tests. I like to start by importing TestCase from the unittest module included in the Python standard library:

from unittest import TestCase

This class will allow us to write subclasses to test different parts of our code. In this tutorial we will only write one. You can also write individual tests as functions, as shown on the pytest homepage, but by writing your tests as classes you can group your tests in a more pleasing fashion. To begin with, let's create a FibonacciTests class:

from unittest import TestCase

class FibonacciTests(TestCase):
    pass

This snippet has created a new class that gives us all the utilities available to TestCase, which we will cover as we go. Now to write some tests. unittest requires that our test method names begin with test. The first test we will write checks our first requirement, that the fibonacci function returns the correct Fibonacci number for a given position. Our first test looks something like this:

from unittest import TestCase

class FibonacciTests(TestCase):
    def test_returns_correct_fibonacci_number(self):
        correct_sequence = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
        for index in range(len(correct_sequence)):
            response = fibonacci(index)
            assert response == correct_sequence[index]

This test, test_returns_correct_fibonacci_number follows a common testing pattern:

Setup your initial expected data
Run the code that you are testing
Check that the tested code did the right thing (in this case, returned the correct fibonacci number)

Disclaimer: the Fibonacci sequence is an infinite sequence, so we can't test all values, but we can be reasonably sure of the correctness. The general advice is to test any edge cases, which for us means positions less than 0, and also to test the general case as best you can. While we could have tested hundreds or thousands more numbers, tests are often part of a test suite which might contain hundreds or thousands of tests, each of which requires some time to run. The developer, and potentially a QA team, need to choose how quickly the tests should run versus how sure you need to be about the correctness of the code. Your tests may never be complete, even if you have full code coverage. Ultimately it is about reducing the risk that a user is going to come across a bug. That is, we want our test suites to find bugs before we release any software to our actual users.

Back to the test. Following the advice from above, we choose to test 12 different inputs to the function fibonacci (which we are yet to write). We pass each one of the positions to the fibonacci function and then test that the response from that call matches the expected result from our correct_sequence list, using the assert keyword followed by the condition we want to check. In this case, we are asserting that the response from our fibonacci function matches the corresponding value in our expected sequence.

Our Second Test

The second requirement we had for our fibonacci function is that it raises a ValueError if we pass it any position less than 0. Again, we can't possibly test all numbers less than 0 but it is probably fine for us to just test -1 in this case since we have already tested 0 in test_returns_correct_fibonacci_number. Our code now looks like this:

from unittest import TestCase
import pytest

class FibonacciTests(TestCase):
    def test_returns_correct_fibonacci_number(self):
        correct_sequence = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
        for index in range(len(correct_sequence)):
            response = fibonacci(index)
            assert response == correct_sequence[index]

    def test_raise_value_error_on_negative_input(self):
        with pytest.raises(ValueError):
            fibonacci(-1)

To test that fibonacci raises a ValueError on -1 we can use the pytest.raises context manager as shown above. pytest.raises takes our exception type and then passes the test if the code in the scope of the with block raises that type of exception.
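
If you would rather stay within the standard library, the same two tests can be written with unittest's own helpers. This is a sketch of an alternative, not what the rest of this post uses. The fibonacci function here is only a stand-in so that the snippet runs on its own; the real one is developed further down.

```python
import unittest


def fibonacci(position):
    """Stand-in so this snippet runs on its own; the post builds the real one below."""
    if position < 0:
        raise ValueError('position must be non-negative')
    previous, current = 1, 1
    for _ in range(position):
        previous, current = current, previous + current
    return previous


class FibonacciTests(unittest.TestCase):
    def test_returns_correct_fibonacci_number(self):
        correct_sequence = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
        for index, expected in enumerate(correct_sequence):
            # subTest reports every failing position instead of stopping at the first
            with self.subTest(index=index):
                self.assertEqual(fibonacci(index), expected)

    def test_raise_value_error_on_negative_input(self):
        # assertRaises is the unittest counterpart of pytest.raises
        with self.assertRaises(ValueError):
            fibonacci(-1)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(FibonacciTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())
```

pytest can still collect and run this class, so the choice between the two styles is mostly a matter of taste.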

Running Our Tests

Now that we have our initial tests, it's time to run them. To do this, simply run:

pytest

in your command line. By running this, we have stumbled across our first bug:

test_fibonacci.py FF                                                                                                                                                                                 [100%]

================================================================================================= FAILURES =================================================================================================
_________________________________________________________________________ FibonacciTests.test_raise_value_error_on_negative_input __________________________________________________________________________

self = <test_fibonacci.FibonacciTests testMethod=test_raise_value_error_on_negative_input>

    def test_raise_value_error_on_negative_input(self):
        with pytest.raises(ValueError):
>           fibonacci(-1)
E           NameError: name 'fibonacci' is not defined

test_fibonacci.py:15: NameError
___________________________________________________________________________ FibonacciTests.test_returns_correct_fibonacci_number ___________________________________________________________________________

self = <test_fibonacci.FibonacciTests testMethod=test_returns_correct_fibonacci_number>

    def test_returns_correct_fibonacci_number(self):
        correct_sequence = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
        for index in range(len(correct_sequence)):
>           response = fibonacci(index)
E           NameError: name 'fibonacci' is not defined

test_fibonacci.py:10: NameError
========================================================================================= 2 failed in 0.07 seconds =========================================================================================

Looking at the pytest output, we can see the obvious issue: we have no fibonacci function yet. Let's fix that in a file called main.py:

def fibonacci(position):
    if position == 1 or position == 2:
        return 1
    return fibonacci(position - 2) + fibonacci(position - 1)

Before we run our tests we will need to import this function at the top of our test_fibonacci.py file like so:

from main import fibonacci

Debugging `fibonacci`

Running our tests again, we get a RecursionError, which means that our function called itself recursively too many times. Looking at our code, we notice we have accidentally forgotten to 0-index our positions. This can be fixed as follows:

def fibonacci(position):
    if position == 0 or position == 1:
        return 1
    return fibonacci(position - 2) + fibonacci(position - 1)

Now when we run pytest we notice that we have actually passed the first test but failed test_raise_value_error_on_negative_input:

test_fibonacci.py F.                                                                                                                                                                                 [100%]

================================================================================================= FAILURES =================================================================================================
_________________________________________________________________________ FibonacciTests.test_raise_value_error_on_negative_input __________________________________________________________________________

position = -1883

    def fibonacci(position):
>       if position == 0 or position == 1:
E       RecursionError: maximum recursion depth exceeded in comparison

main.py:10: RecursionError
==================================================================================== 1 failed, 1 passed in 1.14 seconds ====================================================================================

While we still have to fix our second test, congratulations on passing your first! It's that pesky RecursionError again. How could that have happened? Well, look at the last line of the function, return fibonacci(position - 2) + fibonacci(position - 1). If we pass -1 as the position it will keep hitting this line and never exit, so there's our issue. We can fix this by adding a check for negative numbers:

def fibonacci(position):
    if position <= 0:
        raise ValueError('position must be non-negative')
    if position == 0 or position == 1:
        return 1
    return fibonacci(position - 2) + fibonacci(position - 1)

We re-run pytest and oops, we broke our first test:

test_fibonacci.py .F                                                                                                                                                                                 [100%]

================================================================================================= FAILURES =================================================================================================
___________________________________________________________________________ FibonacciTests.test_returns_correct_fibonacci_number ___________________________________________________________________________

self = <test_fibonacci.FibonacciTests testMethod=test_returns_correct_fibonacci_number>

    def test_returns_correct_fibonacci_number(self):
        correct_sequence = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
        for index in range(len(correct_sequence)):
>           response = fibonacci(index)

test_fibonacci.py:10:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

position = 0

    def fibonacci(position):
        if position <= 0:
>           raise ValueError('position must be non-negative')
E           ValueError: position must be non-negative

main.py:17: ValueError
==================================================================================== 1 failed, 1 passed in 0.06 seconds ====================================================================================

What a sneaky mistake, we accidentally raised the ValueError when position is 0 as well. A quick fix should get everything passing:

def fibonacci(position):
    if position < 0:
        raise ValueError('position must be non-negative')
    if position == 0 or position == 1:
        return 1
    return fibonacci(position - 2) + fibonacci(position - 1)

and TA-DA!!! We passed all our tests:

test_fibonacci.py ..                                                                                                                                                                                 [100%]

========================================================================================= 2 passed in 0.01 seconds =========================================================================================

Well Done!

Well that brings us to the end of testing our fibonacci function. I hope you have now covered:

The fundamentals of Test-Driven Development
How to use pytest to run simple tests
How to use testing as a means to find bugs and describe requirements

If this were production code, the next thing to do would be to assume that you will get bug reports about it; perhaps we missed a requirement. When a new issue comes in, you would add another test case, run the tests to make sure that it fails, and then make any changes to the function needed to make that test pass. Once you again have reason to believe the code is correct, you can go ahead and re-release it.
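As a sketch of that loop, suppose a (hypothetical) bug report says that fibonacci(2.5) raises a ValueError claiming the position must be non-negative, which is confusing because 2.5 is not negative. The requirement that non-integers raise a TypeError is my assumption for illustration, not one of the original requirements. We would first add the test below, watch it fail against the old function, and then add the isinstance check to make it pass:

```python
import unittest


def fibonacci(position):
    # Hypothetical new requirement: reject non-integer positions up front.
    if not isinstance(position, int):
        raise TypeError('position must be an integer')
    if position < 0:
        raise ValueError('position must be non-negative')
    if position == 0 or position == 1:
        return 1
    return fibonacci(position - 2) + fibonacci(position - 1)


class FibonacciRegressionTests(unittest.TestCase):
    def test_rejects_non_integer_position(self):
        # Written first, from the bug report, so it fails before the fix.
        with self.assertRaises(TypeError):
            fibonacci(2.5)
```

Running pytest again should now report one more passing test alongside the original two.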

You can find all the code related to this post here on GitHub.

Python packages that you may not have heard of — Lesser-known Python packages worth checking out
Describing Descriptors — A talk on Python descriptors and the descriptor protocol
Chasing Python's Recursion Limit — Exploring the limits of recursion in CPython
Racing against time at DiviPay — How we improved payment approval performance at DiviPay

Explained Simply - P vs NP

2018-04-02T00:00:00Z

Recently I was asked to explain a complex topic simply. I had the opportunity to choose any topic of my liking and so I chose the P vs NP problem.

What is it?

The P vs NP problem is one of the seven Millennium Problems put forward by the Clay Mathematics Institute, which will award $1 million to the person or group that solves the problem.

The problem statement is, simply put: does P == NP or does P != NP? To understand this question we must first understand the terms involved.

P? NP?

In computational complexity theory computer scientists and mathematicians try to classify certain problems into classes of similar problems. This is done by proving that certain problems are in fact similar. For example, if I needed to sort fruit into different classes, I might put a red apple with a green apple because they both had the same shape, however I might not include bananas as they are radically different.

P

So the first class of computational problems we are going to talk about is the class P. P (standing for polynomial time) represents the class of problems that can be solved in some reasonable amount of time. For a problem to belong in P, the time it takes to solve must grow predictably (no faster than some polynomial) as the problem gets bigger. For example, if I asked you to sort 5 cards, you might do it in 5 seconds and if I asked you to sort 10 cards, you might do it in 10 seconds. Since this growth is predictable, this problem is said to be of type P.

NP

The second class of problems we need to know about is NP, which stands for Nondeterministic Polynomial. Loosely speaking, NP represents the class of problems for which we don't know whether we can produce solutions in some predictable time. For example, say we need to sort 5 cards and it takes 5 seconds, however sorting 10 cards might take 37 seconds for some reason, and furthermore, 11 cards might take 2 hours. If this were the case for sorting cards, we would call it an NP-type problem. Another important fact about NP-type problems is that we must be able to check that a solution is correct very quickly. For example, it might take you ages to give me the solution to the sorted cards, but once you do I can very easily check that they are actually sorted.
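To make the "slow to solve, quick to check" idea concrete, here is a small sketch. The card check mirrors the example above; the subset sum puzzle (pick some numbers that add up to a target) is my own choice of illustration, as it is a well-known NP problem, and all function names here are made up for this post. Checking a proposed answer takes one pass, while the naive search tries subsets whose count doubles with every extra number:

```python
from itertools import combinations


def is_sorted(cards):
    """Check a proposed ordering in a single pass over the cards."""
    return all(a <= b for a, b in zip(cards, cards[1:]))


def verify_subset_sum(numbers, target, chosen):
    """Quickly check a proposed answer: distinct indices that sum to target."""
    return len(set(chosen)) == len(chosen) and sum(numbers[i] for i in chosen) == target


def brute_force_subset_sum(numbers, target):
    """Naive search: try every non-empty subset until one checks out."""
    for size in range(1, len(numbers) + 1):
        for chosen in combinations(range(len(numbers)), size):
            if verify_subset_sum(numbers, target, chosen):
                return chosen
    return None


numbers = [3, 34, 4, 12, 5, 2]
print(is_sorted([2, 3, 5, 9]))               # → True
print(brute_force_subset_sum(numbers, 9))    # → (2, 4), i.e. 4 + 5
print(verify_subset_sum(numbers, 9, (2, 4))) # → True
```

Nobody knows whether the search can always be made as quick as the check, and that is exactly the P vs NP question.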

So what's with the whole P vs NP thing?

So remember the million dollars we want? Well to get it, we need to prove P == NP or P != NP. To prove the former, we would need to prove that all problems in NP can actually be solved by some method that fits the description of P-type problems (for everyone's sanity, card sorting is actually in P). That is, all hard problems can be transformed into easy problems. To prove the latter, P != NP, we would need to prove that at least one problem in NP cannot be solved by any method that fits the description of P. The natural candidates are the NP-Complete problems (simply put, these are the hardest problems in NP, perhaps I'll explain this in another blog post).

I hope that this short ramble has simplified the problem of P vs NP into something more digestible. For the reader who wants to dive into this stuff further, I recommend checking out Michael Sipser's Introduction to the Theory of Computation.

The Sydney Train Carriage Problem — A mathematical puzzle involving train carriage numbers
Competitive Programming Tips — Practical tips and tricks for competitive programming in Python

Chasing Python's Recursion Limit

2017-08-29T00:00:00Z

I recently saw this post on Facebook, annotated with "Unless you use Python":

Credit Computer Science Memes for Travelling Salesman Teens

This led me to want to contradict the statement, although doing so is probably not recommended. So, let's consider the following Python program:

import sys

def recurse(n):
    print(n)
    recurse(n + 1)

if __name__ == '__main__':
    recurse(1)

It is a simple recursive program which prints out the value of n. If you run it you will find that it raises a RecursionError at some point; for me 997 was the last value printed. Now it is important to understand why this happens. Under the hood, the Python interpreter enforces a maximum recursion depth (1000 by default) that is set sanely so you don't cause the underlying C implementation to produce a stack overflow.
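If you would rather see the limit without a traceback, a small sketch like this one reports both the configured limit and how deep a function actually gets before the RecursionError. The helper name depth_reached is mine, and the exact depth depends on how many frames are already on the stack when you call it, so I haven't quoted a number:

```python
import sys


def depth_reached(n=1):
    """Recurse until the interpreter refuses, then report the last depth."""
    try:
        return depth_reached(n + 1)
    except RecursionError:
        # The call one level deeper was refused, so n is the deepest frame.
        return n


if __name__ == '__main__':
    print('configured limit:', sys.getrecursionlimit())
    print('deepest call reached:', depth_reached())
```

The second number comes out a little below the first because the interpreter counts the frames that were already running before our function was called.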

But sticking with my aim of disproving the original Facebook post, I set out to get around this limitation. What I came up with is this:

import sys

def recurse(n):
    print(n)
    sys.setrecursionlimit(sys.getrecursionlimit() + 1)
    if n == 2000:
        return
    recurse(n + 1)

if __name__ == '__main__':
    recurse(1)

Sure enough, running this naively increments the recursion limit on every call, making sure we never hit it. If we remove the if-statement we will achieve our goal of seemingly-infinite recursion. However, there's a slight problem: if we do remove the safety of the if-statement and run the code, we will get a segmentation fault. For me, this occurred at n = 36142 and looked like this:

.
.
.
36140
36141
36142
[1]    7655 segmentation fault  python3 main.py

Now 36142 seems to be pretty close to 2^15 (32768), so perhaps there is some reason behind this? I attempted to look at the Python interpreter source code but to no avail. If anyone has an answer stemming from this I would very much appreciate you contacting me and explaining it. As for my own analysis of why this is the number we land at, I present the following.

First we need to change the code so it doesn't segfault any more. To do this, we limit the size of n to 36141. By profiling our code using python3 -m cProfile main.py we get the following output:

         144527 function calls (108397 primitive calls) in 0.410 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.410    0.410 main.py:1(<module>)
  36131/1    0.094    0.000    0.410    0.410 main.py:3(recurse)
        1    0.000    0.000    0.410    0.410 {built-in method builtins.exec}
    36131    0.294    0.000    0.294    0.000 {built-in method builtins.print}
    36131    0.009    0.000    0.009    0.000 {built-in method sys.getrecursionlimit}
    36131    0.013    0.000    0.013    0.000 {built-in method sys.setrecursionlimit}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Looking at the output we see that there are 144527 function calls: 36131 of them are calls to our recurse function, and there are 36131 calls to each of the print, sys.getrecursionlimit and sys.setrecursionlimit functions. Now calling sys.getsizeof(recurse) lets us know that our recurse function takes up 136 bytes. Similarly, we find that print, sys.getrecursionlimit and sys.setrecursionlimit are all 72 bytes in size. This means that we are using at least ((36131 * 136) + (36131 * 72 * 3)) / 1024 / 1024 ~= 12 MB during execution. Perhaps this means the limit of the Python call stack is somewhere around this number.
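For anyone who wants to check the arithmetic, here it is spelled out using the figures quoted above (the sizes are the ones I measured; they will differ between Python versions). Keep in mind that sys.getsizeof reports the size of the function object itself rather than of a stack frame, so treat the result as a rough figure at best:

```python
CALLS = 36131        # calls per function, taken from the cProfile output
RECURSE_SIZE = 136   # bytes reported by sys.getsizeof(recurse)
BUILTIN_SIZE = 72    # bytes reported for each of the three built-ins

total_bytes = CALLS * RECURSE_SIZE + CALLS * BUILTIN_SIZE * 3
total_mib = total_bytes / 1024 / 1024

print(f'{total_bytes} bytes ~= {total_mib:.2f} MiB')
# → 12718112 bytes ~= 12.13 MiB
```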

Final Thoughts

Upon exploring Stack Overflow (no pun intended) and various other sites, I have not been able to find a specific reason why I cannot increase the recursion limit, and the stack size, to accommodate the needs of infinite recursion.

Final ideas on achieving this would include making actual code changes to the Python interpreter and/or predicting the number of calls you will make within the recursive function so you can grow the stack accordingly. For example, recurse made 4 function calls in total and hence I imagined I could increase the recursion depth by 4 each time (surprisingly, this did not work, leaving me with the same magic number, 36131, and a segfault).

In conclusion, I suspect it is not possible to have infinite recursion in Python, either because our call stack will grow too fast or because we simply allocate too much memory.

Describing Descriptors — A talk on Python descriptors and the descriptor protocol
Describing Descriptors - A follow up — A deeper dive into Python descriptors with additional examples
Getting started with Python testing — A beginner-friendly guide to writing tests in Python
Competitive Programming Tips — Practical tips and tricks for competitive programming in Python

On Finishing University

2017-07-09T00:00:00Z

As of last week, and pending results, I completed an exam signifying the completion of my bachelor's degree in computer science at UNSW. In this post I will endeavour to ramble, hopefully not too freely, on the changes I have noticed within myself over the past few years.

Now before I started my degree there was a period of time where I needed to make a decision about my future. What do I want to do in the future? What should I do in the future? Should I go to university? Is it worth it? Well, even now I haven't figured out my answers completely, however I do know that I don't regret a single moment of the past 3 or so years. Initially my answer to the first questions, what I want or should want to do, was very vague and hovered around the fields of STEM (Science, Technology, Engineering, Mathematics) due to my background in FIRST robotics and the UNSW high school computer science course, HS1917. This vagueness led me to do a lot of research into the different areas I was interested in and, without boring you with the details, I chose computer science. This conclusion was both preceded and followed by the same question: should I do computer science (CS) or software engineering (SENG)? These days I, and possibly many readers, note that the two subjects are nearly indistinguishable in a career context (although quite distinguishable in the academic context), however my peers, the media and the job market all continued to raise this question within my mind over the course of my studies. Although I could write a whole post on the differences between the two subjects (and perhaps I will someday), it is important for anyone considering the two to realise that in terms of job prospects, I don't believe it matters directly which of the two you have on paper, but rather that you can demonstrate an understanding of both.

So was university worth it? Well, aside from having a whole heap of debt I don't want to look at, I think it really was worth the time. My greatest credits to the university experience are not my times in the lecture theatres or the labs but rather the conversations over lunch or the challenges that students set me when I taught. These experiences are unlike any intellectual pursuit one would find elsewhere as there is such a density of knowledge in the same area, fuelling ideas and growth. That being said, I definitely don't want to discredit the times in class. I know wholeheartedly that if it weren't for being enrolled I would not have learnt all the things I have, not because I'm lazy or unmotivated, but because I would not have known those things existed and hence could not have pursued an understanding of them.

During my degree, one constant that I found throughout all my classes was distractions. Now I'm not talking about distractions in the sense of someone doing skateboard tricks out the window or daydreaming about beaches, but distractions that were associated with the study that I was doing. These kinds of activities are difficult for me to class as distractions since I would not be where I am today without them, yet they probably took a toll on my grades. I'm not saying it is impossible to keep high grades during university but that it is healthy to start applying your new skills to the real world. In my case these distractions included working on numerous side projects and open source work as well as learning new technologies outside of the university-prescribed skills. It is these "distractions" that led me to my first position as a software developer well before the completion of my degree, something that I am very grateful to have begun before leaving university.

So where to next? Well I'm not 100% sure of that answer yet. Perhaps I will study some more in the form of a master's degree around pure computer science, cybersecurity, or business. As a lifelong learner I intend to continue growing my skillset through both personal interest and project requirements, as well as pursue more content creation and education through talks and getting involved in the development community, be it through social networks or conferences and groups.

If there have been any questions I have raised within your own mind, please do not hesitate to contact me in the comments section or on Twitter.

Your Agents Need You to Be a Product Engineer — Why product thinking matters more than ever in the age of AI
Explained Simply - P vs NP — A simplified explanation of the P vs NP problem

The Sydney Train Carriage Problem

2017-02-02T00:00:00Z

Living in Sydney it is inevitable that I travel via the public transport system. After all, the highways are so congested that there is no other option unless you're in a position to pay exorbitant housing prices, but that's another story. On the Sydney trains, each carriage has a unique 4-digit number, usually preceded by some letter, printed up on the interior wall. They look like this:

It has been a recurring game amongst my friends over the past few years that whenever we are on a train we try to make 10 using the carriage number. Furthermore, I believe this game is known amongst many Sydneysiders and hence I decided to write about it. The rules are simple, though there are a couple of variations:

Using common mathematical operations make 10 using the individual digits from left to right.
Using common mathematical operations make 10 using the individual digits in any order you like.

For example, the carriage number might be 5432, ignoring the prepended letter. An appropriate solution might be 5 + 4 + 3 - 2 = 10.

The Hypothesis

My simple hypothesis, and many may concur, is that since I have never met a train carriage number where I haven't found a solution, I propose all carriage numbers can be arranged to equal 10.

The Method

Thinking about the problem (are all train carriage numbers solvable?) I found that writing a proof would require some form of mathematics where I could essentially do algebra on operators rather than the numbers they control. Being unsure about how to do this, I turned to a programmatic approach. Please contact me if you know how to mathematically prove this.

I decided that since there were only 10000 numbers I needed to test, I would just brute force the combinations of operators and see how I went. My code is written in Python 3 and is available on GitHub. Please note the code may or may not be polished when you get there.

I had a weak goal of also trying to get as many numbers covered with the minimal number of mathematical operators, so in running my experiments I added new operators after each previous experiment in the hopes of raising the total coverage of numbers. I refer to coverage as the percentage of numbers that had solutions equal to 10.

The Experiments

My first lot of experiments ran through the numbers testing equations of the form ((w O x) O y) O z = 10, where each O is just a placeholder for any operator from the set being tested and w, x, y, and z are the digits of the carriage number.

Running this with the operator set {+, -, *, /} yielded a 42.09% coverage, a lot lower than I expected. This led me to think about what kinds of tricks my friends and I had deployed on particularly tricky numbers.
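To give a feel for what this first experiment looks like, here is a minimal sketch of the left-to-right brute force with the four basic operators. It is not the code from my repository: the function names are made up for this post, and I use Fraction so that division stays exact, which is an assumption about how results should be compared. Because details like that affect the count, I haven't quoted a coverage figure for it.

```python
from fractions import Fraction
from itertools import product
import operator

# The basic operator set from the first experiment.
OPS = {'+': operator.add, '-': operator.sub,
       '*': operator.mul, '/': operator.truediv}


def solve_left_to_right(number, target=10):
    """Try every ((w O x) O y) O z and return the first operators that hit target."""
    digits = [Fraction(int(ch)) for ch in number]
    for symbols in product(OPS, repeat=len(digits) - 1):
        total = digits[0]
        try:
            for symbol, digit in zip(symbols, digits[1:]):
                total = OPS[symbol](total, digit)
        except ZeroDivisionError:
            continue  # this combination divides by zero, skip it
        if total == target:
            return symbols
    return None


def coverage(target=10):
    """Percentage of the numbers 0000 to 9999 that have a solution."""
    solved = sum(solve_left_to_right(f'{n:04d}', target) is not None
                 for n in range(10000))
    return 100 * solved / 10000


print(solve_left_to_right('5432'))  # → ('+', '+', '-')
print(solve_left_to_right('0000'))  # → None
```

Adding a new operator to the experiment then mostly comes down to extending the operator table, although unary operators like floor and factorial need a little more plumbing than this sketch has.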

In the second test, I added the floor and ceil operators resulting in a hugely positive increase to 67.94%. My hypothesis was looking a tad more likely. What was I missing though?

Another operator that I had been holding off on adding due to the added complexity was the factorial operator. I only allowed factorial to be calculated on the single atomic digits, not the results of other operators. This led to a dramatic performance decrease, even with cached answers, though it raised the coverage to 87.31% which also raised my hopes of a correct hypothesis.
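As a rough sketch of what "factorial on the atomic digits only" means, each digit can either be used as-is or replaced by its factorial before any other operator is applied. The helper below is illustrative rather than lifted from my repository, and the name atomic_variants is made up for this post:

```python
from itertools import product
from math import factorial


def atomic_variants(number):
    """Every way of optionally replacing each digit with its factorial."""
    options = [(int(ch), factorial(int(ch))) for ch in number]
    # A set collapses duplicates such as 1! == 1 and 2! == 2.
    return set(product(*options))


print(len(atomic_variants('5432')))  # → 8
print(len(atomic_variants('0000')))  # → 16
```

Each variant then has to go through the full operator search, so there can be up to 16 times as much work per number, which goes some way to explaining the slowdown.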

In another attempt to raise the coverage, I switched to variation 2 of the problem (allowing rearrangements of the number) which led to an increase to 97.18% coverage. This particular variation, which still included factorials on the atomic digits, was very, very slow. I knew that if I was to keep experimenting I would need to speed it up somehow. My first instinct was to run the experiment concurrently. Though this led to a small speedup, each number was still taking a second or two, which meant that the computation was too slow and heavy for my MacBook Air to handle. I needed a boost, and I found that boost in an AWS EC2 c4.8xlarge.

Only ~3% left uncovered, precisely 282 numbers that I could not find a solution for. I thought it would be worth looking at the numbers that failed and where they fell in the range 0000 to 9999. My suspicion was that a large chunk of them would fall below 1000. Unfortunately, there were only 85 failures below 1000, which meant I still had 197 lurking within the remaining 9000 numbers.

Moving on to my next experiment, my plan was to run through the full 10000 numbers but also keep track of the failures so that I could rerun the tests on a much smaller range moving forward. Why didn't I think of this earlier?

So to begin with I ran the 2nd experiment again (basic ops and floor/ceil), but keeping track of which numbers failed this time around. Since I'm allowing for rearrangements now, we could reduce the set even more since different orderings of the same digits are equivalent, i.e. 0114 is the same as 4011, 4101, etc. This speeds up the total computation time as we know that if we can solve for 0114 we have also solved the other permutations of those digits.
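One way to sketch this reduction is to map every number to a canonical form by sorting its digits, so that all rearrangements share one key. The function name canonical is made up for this post and this is not necessarily how my repository does it:

```python
from itertools import combinations_with_replacement


def canonical(number):
    """All rearrangements of the same digits share this sorted form."""
    return ''.join(sorted(number))


# Every distinct collection of 4 digits, ignoring order.
unique_digit_sets = [''.join(digits) for digits in
                     combinations_with_replacement('0123456789', 4)]

print(canonical('4011'))       # → 0114
print(len(unique_digit_sets))  # → 715
```

So with rearrangements allowed, there are only 715 distinct cases to solve rather than 10000.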

Adding in factorials reduced this set to the 282 numbers I was looking for. In my final experiment I added exponentiation and also allowed the absolute value operator.

The Final Result

After the experiment finished I was left with the following sequences:

These sequences allow for rearrangements, this is how I calculated them:

So, there we have it, 185 numbers cannot (as far as I have calculated) be solved to equate to 10 given the operators: addition, subtraction, multiplication, division, floor, ceil, atomic factorials, exponentiation, and absolute value; allowing rearrangement. As a percentage, there was a 98.15% success rate.

Conclusion and Reflection

Although I was unable to cover all 10000 numbers, I am happy with the results. Furthermore, I suspect that the Sydney train carriage numbers don't actually start from 0000 but from 1000 instead (I may be wrong about this). If I'm right about that, then there were only 121 failures, leading to a success rate of ~98.66%. That list contains these numbers:

In regards to what I have learnt from this problem, I have learnt to focus more on reducing a problem, in how I only needed to test the failures of the previous experiments.

I have yet to find any particular pattern in the failures that would allow me to recognise a number as unsolvable. If any readers spot anything, I would appreciate you passing it on to me. As for the next steps towards 100%, I would like to see factorials added into the calculation on all elements, not just the initial digits.

I hope I have provided an insight into the Sydney Train Carriage Problem and would love to see any further work on the problem, especially a formal proof of whether or not all of the 10000 numbers are solvable. I imagine that if not all 10000 numbers are solvable, then the proof might take the form of a contradiction based around proving that either 0000 or 1000 is unsolvable.

Explained Simply - P vs NP — A simplified explanation of the P vs NP problem
Competitive Programming Tips — Practical tips and tricks for competitive programming in Python
Chasing Python's Recursion Limit — Exploring the limits of recursion in CPython
