3% of 666 Python codebases we checked had silently failing unit tests (and we fixed them all, including PyTorch, Salt, Python.org)

Lets coin a name for a very special type of unit test:

Schrödinger’s unit-test: a unit test that passes but fails to test the thing we want to test.

This test will not fail

This article focuses on our code scanning of 666 Python codebases to detect one such Schrödinger’s unit tests. Specifically, this gotcha (which we found in 20 codebases and later pushed an auto-generated fix for):

Do you see the mistake? It’s a relatively common mistake for a developer to make. assertEqual and assertTrue were muddled up. In fact of the 666 codebases we checked, we found 3% of them mixed the two up. This is a real problem that affects real codebases, even active large ones with a huge community and strong testing and code review practices:

Python.org (ironically enough)
Celery
UK Ministry of Justice
Ansible
Django CMS
Keras
PyTorch
Ray
Salt

See the full lists in this gist. The fact the problem is so widespread suggests a key cause of this mistake is because it’s so easy to miss during code review. A quick glance at the code does not raise any red flags. It “Looks Good To Me!”.

Schrödinger’s unit-test

In the late 19th century Mark Twain apparently said It’s not what you know that gets you. it’s what you think you know but ain’t. What was true in late 19th century literature is also true in early 21st Python unit testing: tests that don’t really test the thing we want to test is worse than no tests at all because a Schrödinger’s test gives unwarranted confidence that the feature is safe.

Have you ever encountered this workflow:

  • Code change committed, and the tests for the feature passes locally.
  • The test ran in CI during the pull request build, and no failures reported.
  • The change was peer reviewed and accepted during code review. (“LGTM!”)
  • The change was then deployed to prod… only for it to fail upon first encounter with the user
  • After reading the bug report proclaim to yourself “but the test passed!”

Upon revisiting the unit test it’s then noted that the unit test that gave such great confidence during the software development life cycle was in fact not testing the feature at all. If you encountered this problem then you encountered this class of unit tests which this specific TestCase.assertTrue is one of.

The TestCase.assertTrue gotcha

While pytest is very popular, 28% of codebases still use the built-in unittest package. The codebases that do use the built-in unittest package are at risk of the assertTrue gotcha:

The docs for assertTrue state that test that expr is True, but this is an incomplete explanation. Really it tests that the expr is truthy, meaning if any of the following values are used in as the first argument the test will pass:

'foo', ‘bar’, 1, [1], {‘a’}, True

And conversely with falsey values the test will fail:

'', 0, [], {}, False,

assertTrue also accepts a second argument, which is the custom error message to show if the first argument is not truthy. This call signature allows the mistake to be made and the test to pass and therefore possibly fail silently.

For example when making 20 PRs fixing the problems we found one of the tests started to fail due to faulty application logic once the test was no longer a Schrödinger’s unit test:

We also found that at least two of the tests failed because the logic in the test was wrong. This is also bad if we consider unit tests to be a form of documentation describing how the product works. From a developer’s point of view, unit tests can be considered more trustworthy that comments and doc strings because comments and doc strings are claims written (perhaps long ago and perhaps now obsolete or incomplete), while a passing unit test shows a logic contract is being upheld…as long as the test is not a Schrödinger’s test!

Perhaps more data is needed, but this could mean 15% (3 in 20) of Schrödinger’s unit tests are hiding broken functionality.

TestCase.assertEqual

When the CodeReview.doctor github bot detects this mistake in a GitHub pull request the following solution is suggested:

You can securely scan your entire codebase online for free at CodeReview.doctor

Hacking Django websites: clickjacking

Part 2 of the “Hacking Django” series: part 1, part 3

Clickjacking is an attack where one of your logged-in user visits a malicious website, and that website tricks the user into interacting with your website via an iframe.

As an example, see the green “create pull request” button which will create a Pull Request on GitHub as the logged in user:

Let’s say the malicious website has some nefarious React:

import React from 'react';


export default function (props) {
  const [position, setPosition] = React.useState({ clientX: 0, clientY: 0 });
  const updatePosition = event => {
    const { pageX, pageY, clientX, clientY } = event;
    setPosition({ clientX, clientY,});
  };

  React.useEffect(() => {
    document.addEventListener("mousemove", updatePosition, false);
    document.addEventListener("mouseenter", updatePosition, false);
    return () => {
      document.removeEventListener("mousemove", updatePosition);
      document.removeEventListener("mouseenter", updatePosition);
    };
  }, []);

  return (
    <>
      <iframe
        style={{zIndex: 100, position: 'absolute', top: position.clientY-200, left: position.clientX-650}}
        src="<path-to-target-website>"
        width="750px"
        height="500px"
      />
      <div>Some website content</div>
    </>
  );
}

When one of the logged in users accesses the malicious website this happens: the iframe follows the mouse so that the button is on the mouse. When the user clicks, they click on the iframe.

Now that is very obvious – the user can see the iframe, but that’s one change away: style={{display: 'none', ...}}. For the sake of the demo I used style={{opacity: '0.1', ...}}(otherwise you would see nothing interesting):

Clickjack prevention middleware

The solution is simple: to set iframe embed policy for your website: adding django.middleware.clickjacking.XFrameOptionsMiddleware to settings MIDDLEWARE and X_FRAME_OPTIONS = 'SAMEORIGIN' will result in X-Frame-Options header with value of SAMEORIGIN, and modern browsers will then prevent your website from being embedded on other websites.

Continue reading: part 1, part 3

Does your website have security vulnerabilities?

Over time it’s easy for security vulnerabilities and tech debt to slip into your codebase. I can check clickjacking vulnerability and many others for for free you at https://django.doctor. I’m a Django code improvement bot:

If you would prefer security holes not make it into your codebase, I also review pull requests:

https://django.doctor

Alexa audio effects: change Alexa’s voice

Ever wished Alexa sounded like she was an old-timey radio? Sports stadium announcer Alexa? Alexa behind a door? Alexa Browser Client has you covered.

alexa

Run the demo then go to http://127.0.0.1:8000/mixer/

Why? Because it’s cool. But also to demonstrate the advantages of running Alexa in the browser: we have full control over the audio received from Alexa Voice Service. Adding audio manipulation takes not too many lines of code.

Old-timey radio Alexa

Sports stadium Alexa

When four equals five

Consider a form that allows users to search for holidays. It is expected behaviour that the form remembers previous searches –  so if a user previously searched for “4 guests”, the form should pre-populate with “4 guests” the next time the page loads.

This is a simple enough idea, but an odd bug was raised for this component. Ten points to Gryffindor for spotting it before knowing the symptoms: click to see the code.

You read the code code, yes? And you spotted the bug? No? Either way a summary:

1) The components are split into ‘container’ and ‘presentational’. Here is why.
2) createContainer returns a component instance that had state from a flux store passed to it as props.
3) If state.enquiryForm.numberOfGuests is falsey then default to 1.
4) We have some tests that confirm logic is as expected (1 for falsey, 5 for otherwise).

The unit tests passed, and we were happy when we manually tested the component in the browser — the select box ‘remembered’ the value. Great. Deploy!

So why was there a bug? The report was “the number of guests select box adds 1!”

After checking the component and the code we concluded ‘nope’ — The component did not add one to the value — when we selected “one guest”, on reload we got “one guest” in the form. When we selected “five guests” we got “five guests” on reload. When we select “four guests” in then we get “five guests” on reload. Wait, what?

Investigating the bug

Yes, the bug was reproducible . Great. Below is a table showing ‘the value we selected in the form’ against ‘the value pre-populated on refresh’:

Screen Shot 2016-05-22 at 22.40.53
Every even number adds one to itself!? Okay…
Go back to the code and see if you can see the bug. No?

Bitwise OR, Logical OR

There’s a big difference between | and ||. Due to a simple typo a bitwise OR was used in lieu of a logical OR. The outcome of the missing pipe was this unexpected behavior.

A bitwise OR performs a logical OR on each bit of in the numeric value — so five will be converted to 0101, and one will be converted to 0001, and each bit will be OR’d — if either of the bits are 1 then use 1 in the return value’s position:

1tqqmzdlgnqqyzq3mgpyfug1

But when we use an even number such as four or one, we get:

1il2o09ry7y7k7k7z7sir4g

and again with six or one:

1d9sttcmofl0y1ys2lc_ssq1

but to labour the point with seven and one:

1sbjyywur3hkh9ux-ka9n6a

Lessons learned

‘Write more unit tests’ is not the answer to this problem. After all, unit tests were written and the bug still got through. We could write tests specifically targeting ‘bitwise OR’ bugs, but this sets precedence against all similar scenarios. This would take an inordinate amount of time. Moreover, unit tests are for confirming logic, not detecting bugs. This type of bug can be more effectively detected using other means.

This bug was missed by three developers during code review. Does that indicate a human problem with the review process? I think not. Sometimes humans see what they expect to see. With this in mind, as put by W Edwards Deming: 100% inspection is only 100% effective if the inspector detects 100% of defects 100% of the time…which is impossible due to human nature. After all, if humans were that precise there wouldn’t be any defects made in the first place.

We can make ourselves more precise though by using static analysis tools like eslint. We turned on no bitwise-operators rule and it shows a helpful warning message. It should not interfere with the day to day tasks  of the project— if an e-commerce site sees itself manually wrangling bits then it’s probably doing something it shouldn’t.

Authenticate using Django Rest Framework and Angular

In a previous post we went through how to authenticated using a DRF endpoint.

This post will expand on how to achieve persistent login with a one page app using Angular. Note as there are 2 frameworks in play there will be some need to workaround integrations problems. It doesn’t get messy, which is a testiment to the quality of the respective frameworks!

For a demo check out this a silly microblog tool I made where you can create an account and log into it: github

Note my folder strucure is like so:

    project/
        urls.py
        settings.py
        static/
            js/
                app.js
                ...[all third party js files]...
        apps/
            core/
                views.py
                urls.py       
            accounts/
                urls.py
                api/
                    views.py
                    serializers.py
                    authentication.py

Create the Angular module

In app.js we create a module that will call the DRF endpoints in order to authenticate:

angular.module('authApp', ['ngResource']).
    config(['$httpProvider', function($httpProvider){
        // django and angular both support csrf tokens. This tells
        // angular which cookie to add to what header.
        $httpProvider.defaults.xsrfCookieName = 'csrftoken';
        $httpProvider.defaults.xsrfHeaderName = 'X-CSRFToken';
    }]).
    factory('api', function($resource){
        function add_auth_header(data, headersGetter){
            // as per HTTP authentication spec [1], credentials must be
            // encoded in base64. Lets use window.btoa [2]
            var headers = headersGetter();
            headers['Authorization'] = ('Basic ' + btoa(data.username +
                                        ':' + data.password));
        }
        // defining the endpoints. Note we escape url trailing dashes: Angular
        // strips unescaped trailing slashes. Problem as Django redirects urls
        // not ending in slashes to url that ends in slash for SEO reasons, unless
        // we tell Django not to [3]. This is a problem as the POST data cannot
        // be sent with the redirect. So we want Angular to not strip the slashes!
        return {
            auth: $resource('/api/auth\\/', {}, {
                login: {method: 'POST', transformRequest: add_auth_header},
                logout: {method: 'DELETE'}
            }),
            users: $resource('/api/users\\/', {}, {
                create: {method: 'POST'}
            })
        };
    }).
    controller('authController', function($scope, api) {
        // Angular does not detect auto-fill or auto-complete. If the browser
        // autofills "username", Angular will be unaware of this and think
        // the $scope.username is blank. To workaround this we use the
        // autofill-event polyfill [4][5]
        $('#id_auth_form input').checkAndTriggerAutoFillEvent();

        $scope.getCredentials = function(){
            return {username: $scope.username, password: $scope.password};
        };

        $scope.login = function(){
            api.auth.login($scope.getCredentials()).
                $promise.
                    then(function(data){
                        // on good username and password
                        $scope.user = data.username;
                    }).
                    catch(function(data){
                        // on incorrect username and password
                        alert(data.data.detail);
                    });
        };

        $scope.logout = function(){
            api.auth.logout(function(){
                $scope.user = undefined;
            });
        };
        $scope.register = function($event){
            // prevent login form from firing
            $event.preventDefault();
            // create user and immediatly login on success
            api.users.create($scope.getCredentials()).
                $promise.
                    then($scope.login).
                    catch(function(data){
                        alert(data.data.username);
                    });
            };
    });

// [1] https://tools.ietf.org/html/rfc2617
// [2] https://developer.mozilla.org/en-US/docs/Web/API/Window.btoa
// [3] https://docs.djangoproject.com/en/dev/ref/settings/#append-slash
// [4] https://github.com/tbosch/autofill-event
// [5] http://remysharp.com/2010/10/08/what-is-a-polyfill/

Create the Angular view

Django and Angular both use curly braces to denote variables. Below we want Django to take care of defining the user (‘{{user.username}}’) based on the value passed in the Context to the template, while Angular should take care of everything else. We use verbatim template tag to prevent Django from stealing Angular’s thunder. In one_page_app.html we do:

<html ng-cloak ng-app>
<head>
	<link rel="stylesheet" href="/static/js/angular/angular-csp.css"/>
	<link rel="stylesheet" href="/static/js/bootstrap/dist/css/bootstrap.min.css"/>
</head>
<body>
<div class="navbar navbar-fixed-bottom" ng-controller="authController"
ng-init="user='{{user.username}}'">
{% verbatim %}
<div ng-show="user" class="navbar-form navbar-right">
<input type="submit" class="btn btn-default" ng-click="logout()"
value="logout {{user}}"/>
</div>
<div ng-hide="user">
<form id="id_auth_form" class="navbar-form navbar-right"
ng-submit="login()">
<div class="form-group">
<input ng-model="username" required name="username"
type="text" placeholder="username" class="form-control">
</div>
<div class="form-group">
<input ng-model="password" required name="password" type="password"
placeholder="password" class="form-control">
</div>
<div class="btn-group">
<input type="submit" class="btn btn-default" value="login">
<input type="submit" class="btn btn-default" value="register"
ng-click="register($event)">
</div>
</form>
</div>
</div>
</body>
<script src="/static/js/jquery/jquery.js"></script>
<script src="/static/js/bootstrap/dist/js/bootstrap.min.js"></script>
<script src="/static/js/angular/angular.min.js"></script>
<script src="/static/js/angular-resource/angular-resource.min.js"></script>
<script src="/static/js/autofill-event/src/autofill-event.js"></script>
<script src="/static/js/app.js"></script>
</html>
{% endverbatim %}

Requirements

Bower is fantastic tool to manage front-end requirements. Lets define bower.json like so:

{
  "dependencies": {
    "angular": "1.3.0-build.2419+sha.fe0e434",
    "angular-resource": "1.3.0-build.2419+sha.fe0e434",
    "autofill-event": "1.0.0",
    "jquery": "1.8.2",
    "bootstrap": "3.0.0",
  }
}

DRF Endpoint

We also need to update the apps.accounts.api.AuthView created in previous post: we need to call django.contrib.auth.login in order to attach the session id the request object, and cause the creation a Set-Cookie header on the respons, which causes the browser to create the defined cookie and so we will achieve persistent login upon reloading the page:

from django.contrib.auth import login, logout

class AuthView(APIView):
    authentication_classes = (QuietBasicAuthentication,)

    def post(self, request, *args, **kwargs):
        login(request, request.user)
        return Response(UserSerializer(request.user).data)

    def delete(self, request, *args, **kwargs):
        logout(request)
        return Response({})

django.contrib.auth.logout stops the session, and sends instructions to browser to make the cookie expire.

Serving the view

We also of course need to serve the one_page_app.html, so in core.views.py we do

from django.views.generic.base import TemplateView

class OnePageAppView(TemplateView):
    template_name = 'one_page_app.html'

And in core.urls we route a url to the Django view:

from django.conf.urls import patterns, include, url

from . import views

urlpatterns = patterns('',
    url(r'^$', views.OnePageAppView.as_view(), name='home'),
)

With this all plugged in we have a one page app that:

– Shows login form if user isn’t authenticated with their Django session cookie.
– Shows logout if user is authenticated by either session cookie or username and password.
– Allows logging in in a RESTful way.
– Allows creating new user and automatically logs after creation.
– Alerts login and registration errors.
– Took only about 150 lines of code to achieve.