Writing secure SQL – library style

When I first started web development professionally, queries were written in-line and DBA’s complained about developer SQL performance. This was at the end of an age. Injection attacks were already hitting some of the large websites, and it became clear that Parameterization was the simplest way to counter it. Answers to the question of how best to check and enforce parameterization became relevant. PDO and MySQLi libraries were extremely relevant for PHP, and most of the large open source, and even paid platforms, had a hard time adjusting to the new approach. These days, the answers are pretty cut and dried, but, what if they are wrong? Here are a few approaches to parameterization – and an introduction of my personal favorite for microservices.

Query Builder

At the end of 2011 I started a new job. I also found out about a new thing. A query builder. It essentially had each operation from MSSQL mapped into a PHP object – and – you could build a query by referencing one of each of the major pieces of a command. Sub-selection was particularly interesting. To go along with the query builder was a massive quite of tests to make sure all variations of the builder worked properly, and, a large instructional document on how exactly to use the builder. The one I worked on was quite proprietary, but, the link above is somewhat close – although not as refined approach.

Pros

The SQL builder was actually pretty decent at parameterizing queries. The queries themselves weren’t too hard to read. After some tweaking – it was also fairly straightforward to dump out the ‘built’ query, including the params. Well, I thought so anyhow…

Cons

The learning curve was steep. A new developer was learning SQL and QueryBuilder at the same time. And even veterans preferred to call a raw string because they couldn’t remember where the document was, or, the were under a time crunch and they just needed it to work. Troubleshooting them was also a pain, having ‘comments’ in queries helped trace things when a DBA asked about the query, but, if the logger callbacks weren’t set quite right, the stack trace would end in the middle of the object builder.

Object-Relational Mapping (ORM)

The ORM is largely regarded by many developers I’ve met as the ultimate and final solution to talking to the database. The ORM model really abstracts away from the query. Now actions are objects, and parameters are obvious and easily manipulated. There’s documentation on how to use, naturally, and there’s a raw option in case something isn’t supported. You never know, pivots are weird! They are the all encompassing framework model building approach.

It’s object oriented! You don’t have to worry about the query syntax! You can even use it to enforce schema!

~Various Developers

Pros

The ORM really does a great job of covering parameterization. In fact, they often have a ‘query builder’ built into them, as it is a necessary piece of understanding what the ORM is doing. They’ve also got a lot of industry backing. Major players in the PHP space have really built them out. On top of this, they are pretty solid and don’t change much. They can also provide developer understanding of the database. Or, they can cloud it. The queries generated will always be consistent. That’s a good thing – until it isn’t.

Cons

ORMs are complicated. The concept is simple enough – even the documentation. That said. The DBA, or DB performance monitor just complained about a query – something that came out of your code base. Good luck identifying it quickly. Figuring out how to get a performant query can also sometimes be annoying. Forget migrating queries. That just doesn’t work.

ORMs are heavy. Building models and defining the entire database in code is a duplication of the DB itself. It’s a lot of putz work, and even if auto-generated, it’s still a significant effort to maintain an ORM.

ORMs still have a raw option. For all their splendor, ORMs still have the same flaw as a straight-up in-line sql query. raw. Any developer can potential get one of these going, and, voila, injection attack vector alert.

Query Validator

I’ve always had a desire for clean, intuitive, simple code. I’ve found that ORM’s and Builders are clean, but they don’t typically fit into simple, or even intuitive code. For that reason – I decided it was time to provide something that answers all three, and still remains true to security. I started off creating a query builder. Then realized it wouldn’t provide what I wanted. I scrapped that. In its place, I create a query validator. It validates a basic query, checking for parameters, and identifying if there should be parameters – it will fail without. I also added an Object Oriented approach for ‘sub’ queries. Recursive validation and what not.

Parameters are cool, in queries they’re sweet.

ORM’s bite – this new library is a treat!

Pros

Legible queries are easily written and read – I don’t need to build a huge empire of code to contact the database. I can just talk to the Database, even using specific query options of different databases. most of all – raw IS almost entirely the string written! With the validation piece too, it mitigates the risk of someone entering SQL strings without proper parameters.

Cons

I do kind of miss the data model in the code base. I also am concerned about query duplication in various code places. That is mitigated by a microservice architecture – but the potential still exists. This approach is also something that a large development team might eschew, as it just isn’t industry standard.

Summary

I think there are appropriate tools for appropriate scenarios. I didn’t really cover in-line queries – because they seem to fit the bill when the others fail. To that end, I’ve prepared something of a table as to where this could translate in real-world applications. Naturally, I am a bit biased.

MonolithSOAMicroserviceLambdaDevSpeedPerformanceSecurityTroubleshootingMaintainabilityPopularity
Query Builder
ORM
Query Validator
In-line

Feel free to comment!

Top 5 Automated Testing mistakes: A tale of Dev Confidence

Confidence Scenarios

Scenario 1: Lacking Automated Testing leads to developer uncertainty

Software Developer Anna is new. It’s her second week and she’s finally gotten everything up an running. HR is no longer making her jump through all of the hoops. Her dev environment is all setup – she’s completed an initial change to the code base – and her first, somewhat, solo change is complete. She sends the Pull Request off for code review – but suddenly notices – she’s gotten an alert about some test failure. Her intro dev comes over to her, and after a quick discussion – she is told how to run the test locally so she can fix everything. Her mentor is encouraging – complimenting her on her quick work to that point. An excellent example of Automated Testing gone right. But not all of us are so lucky.

Meet Dan, he’s pretty much in the same boat as Anna. However, his unwittingly broken code made it through the test suite, if you could call it that. It’s also made it through code review and has just been deployed. Unbeknownst to him, as he sits there feeling accomplished – the website he now develops has just gone down. About 15 minutes pass when he starts to notice a bunch of chatter at the Architect’s desk. Curios, he investigates. As he walks over – the architect exclaims, “Got it, it’s Dan’s check-in!” Dan continues to observe as the architect backs out his change. Anxiety continues to rise as he realizes just how much of the website is down. After pensively refreshing the browser on his phone for what seemed the thousands time, Dan sees a bunch of high fives as the Architect exclaims, “Alright, it should be all up! Dan – let’s go talk about this change of yours.”

Sufficient automated test coverage is key to building developer confidence.

Sadly, the second scenario is all too common.  Instead of spreading confidence around – the development environment awards a lot of it to emergency response heroes. As for everyone else, the only confidence they have – is that their changes are going to break something until they too can become an emergency response hero. Sufficient automated test coverage is key to building developer confidence.

Scenario 2: Contentious Test environment

“Mike and Charles getting into it again?” John asks Elaine.
“Ya, something about testing private methods with reflection”, Elaine replies, quickly adding, “This is why I don’t like writing Unit Tests here.”
“Oh?”
“Every time I try to write one, I get hit up in a code review that I’m doing it wrong.”, she quipped, “The senior devs can’t make up their minds. I know how to write tests for my code – but it’s not how they feel it should be”.
“I’ll keep that in mind – I’m going to talk to our manager about this.” John says as he starts to walk away.

Writing test code is not the same as writing production code. Yet, for those who write unit tests – it’s the same developer mindset between the two. Often developers are highly opinionated and unwillingly to move their opinions. But unit testing doesn’t have to be that hard. A well communicated standard around unit tests is required. But, it needs to be less restrictive then regular code. Causing devs to doubt the addition of a unit test leads to scenario 1. Build confidence by checking that devs are testing the code they wrote – but don’t care so much how they did it. The biggest exception to this is if testing performance suffers.

Scenario 3: Code reuse is not a friend

It’s the end of the day and Ed is adding some new functionality. He’s finished his code work – tested it locally quite thoroughly, and even built an automated test for it. He commits his code and waits for feedback. It doesn’t take long – 30% of the tests fail, but his passed before that happened. Ed start feeling quite discouraged. He did everything right, but now he has to spend the next half day working through the test failures. After this amount of time elapses, Ed realized his override of the base class persisted on a static variable. He cleans up his change and everything passes. Had his test been entirely encapsulated – a lot of the discouraging troubleshooting would have been avoided.

A great developer knows when Worse is Better

A good developer is familiar with DRY and Code Reuse. A great developer knows when Worse is Better. For PHPUnit testing, I’ve seen many practices of creating helper methods – and extension abstraction to create a ‘base’ test class. In all of these instances – the piece built to help cause frail tests. Doing things the right way for development is not the same as doing things the right way for testing. Some examples of ood reuse are mock objects. Some examples of bad code reuse are helper methods that perform actions – and – class extensions of PHPUnit or CodeCeption classes.

Scenario 4: Missing the point

Alex is looking through George’s code review. The functionality of the code looks great – there are even some well built unit tests – they cover every method. Alex can see that the test harness already ran and passed and goes on to approve the code review. Time passes. Jenny is modifying the code that George wrote. She doesn’t know much about the code, but she’s used to trusting the Automated Test suites to keep her safe. She makes some changes, checks them and after they pass and make it through code review – she remarks to herself how glad she is the Tests are in place. Once again, a website goes down and her confidence is shaken. What Alex and George failed to notice, is that even though all methods were built out as ‘tests’, and there was 100% code coverage, the tests were only checking that a return occurred. They didn’t care so much about the content.

This is a simplified scenario, but it is quite frequent. Often, devs will focus on the quantity of the test, instead of quality. Or they will conflate quality with ‘live’ code standards as in Scenario 2. A false sense of security from poorly built tests that always pass is missing the point. The point is to fail fast and fail often. Keep things that aren’t fully understood from going out the door. Prevent unintended consequences. The point of the test is there is an input and an output with something in between. Code coverage is great and all – but outcome coverage is better.

Scenario 5: False Positives

“Hey Lacey, why so down?”
“Oh, hi there Steve – looks like the pipeline is blocking me again.”
“Again? Really? What is it this time, Martians?” Lacey slightly smiled at that – Steve was always trying to cheer people up.
“No”, she replied, “It’s the functional tests. They take forever to run, then they fail for non-issues. I don’t know if I can even trust them, they fail so often on false positives.”
“Yeah, I know what you mean.”

Lacey and Steve are running into the problem of false positive fail* cases. Often these are environmental. They largely stem from ‘catch all’ style testing. Instead of building out tests at the functional, then the integration, then the GUI level tests** – someone has decided an end to end test is sufficient coverage. While End to End tests can be a great catch all for problems. They are expensive to build, more expensive to maintain, and downright frustrating to troubleshoot. They lead once again to Emergency Response heroes.

*(Scenario 4 also covers false positives but in regards to passing when they are supposed to fail)
** (see the Testing Pyramid)

Confidence Unleashed

Testing is an incredibly powerful weapon in a developer’s arsenal. Most recognized for keeping ‘live’ code safe from bugs. It’s hidden Easter egg feature is developer confidence. Building a great unit test suite doesn’t drive business value directly. But, I suspect – given a study of Dev workspaces – those with good testing environments lead to improved developer confidence. This in turns leads to increase productivity. Less bugs, improved productivity? A development team would be wise to implement good testing practices.