10% of the 666 most popular Python GitHub repos have these f-string bugs

tl;dr

We found that 10% of the 666 most popular Python open source GitHub repositories had the following f-string typo bugs:

f prefix was missing: "hello {name}" instead of f"hello {name}" .
f prefix was written inside the string: "fhello {name}" instead of f"hello {name}" . We even saw an example of "hello f{name}" .
Concatenated strings used the f prefix on only the first string: f"hello {name}" + "do you like {food}?" instead of f"hello {name}" + f"do you like {food}? .

We fixed the problems we found in 69 of the most popular open source Python repositories — repositories you probably know and might even use. At the bottom of this page is a list of the GitHub pull requests we created but for the sake of summary some were Tensorflow, Celery, Ansible, DALI, Salt, Matplotlib, Numpy, Plotly, Pandas, Black, readthedocs.org, Scipy, PyTorch, rich, and some repositories from Microsoft, Robinhood, Mozilla, and AWS.

Why not follow me on Twitter to if you like this type of thing.

What is f-string interpolation?

Literal String Interpolation aka f-strings (because of the leading character preceding the string literal) were introduced by PEP 498. String interpolation is meant to be simpler with f-strings. “meant” is foreshadowing.

Prefix a string with the letter “ f ” and it’s now an f-string:

name = "jeff"
print(f"My name is {name}")

The mistakes/bugs/issues

We checked 666 most popular Python open source repositories on GitHub and found 10% of them had three types of f-string bugs, all of which broke the formatting leading to the value of the interpolated variable not being included in the string when evaluated at runtime.

Mistake 1: Missing f prefix

This was the most common case we found, and occurred mostly during logging and raising exceptions. This makes sense because logging and exception raising is a common use case for printing variable values into string messages. Take this example from pymc:

for b in batches:
    if not isinstance(b, Minibatch):
        raise TypeError("{b} is not a Minibatch")

Some developer debugging that TypeError is not going to have a great time as they will not immediately know what b equals due to the missing f prefix.

Mistake 2: accidentally including f inside the string

This is probably the hardest for a code reviewer to spot because if we give the code a cursory look before quickly typing “LGTM!” we can see there is an f at the start of the string. Take this example from Lean, would you spot it the f in the wrong place?

RateOfChangePercent('f{symbol}.DailyROCP({1})', 1)

Or this problem in Keras?

raise ValueError('f{method_name} `inputs` cannot be None or empty.')

Or this one from Buildbot

metrics.Timer("f{self.name}.reconfigServiceWithBuildbotConfig")

Many developers would not notice this during code review because this is a type of error humans are bad at spotting. Humans often see what they expect to see, not what is really there. That’s what makes static analysis checkers so important, especially ones that can also suggest the fix like so:

This highlights that for manual processes like code review to detect 100% of bugs 100% of the time then 100% of humans must perform 100% perfectly during code review 100% of the time. See the cognitive dissonance? We do code review because we expect human error when writing the code, but then expect no human error when reviewing the code. This dissonance results in bugs being committed.

Mistake 3: f-string concatenation

During concatenation only the first string is prefixed with f . In this example from Tensorflow the second string in the concatenation is missing the f prefix:

raise ValueError(
    f"Arguments critical_section_def={critical_section_def} "
    "and shared_name={shared_name} are mutually exclusive. "
    "Please only specify one of them."
)

This type of f-string bug is perhaps the most interesting because a pattern emerges when we look at all of the example of this issue:

The first string in the concatenation is correctly prefixed with f .
The second string is not prefixed with f.

We may be looking too deep into this but it seems like many developers think when string concatenation occurs it’s enough to declare the first string as an f-string and the other strings are turned into f-strings by osmosis. It doesn’t. We’re not suggesting this is the case for all developers that accidentally did this error, but interesting nonetheless.

The impact of f-string bugs

Based on the 666 repositories we checked, broken string interpolation due to typos most commonly occurs during logging, exception raising, and during unit testing. This therefore impacts confidence in the proper functioning of the code, and also impacts developer efficiencies when debugging.

f-string mistakes during logging

Bad string interpolation during logging means the information the code author intended to express will not actually be available to the developer trying to debug the code. For example see this code taken from Celery:

try:
    self.time_to_live_seconds = int(ttl)
except ValueError as e:
    logger.error('TTL must be a number; got "{ttl}"')

The developer will look in the logs and see got "{ttl}" and not see the value of ttl. It’s ironic that a developer will read the logs to debug something, only find another unrelated f-string bug that hinders them from fixing the original bug they were investigating.

Note also that Python docs suggest not using string formatting during logger calls. Instead use the logger’s build-in-mechanism to benefit lazy string interpolation for runtime optimisation.

f-string mistakes during exception raising

The impact is similar to the impact of bad f-string declaration during logging: it will reduce the effectiveness of a developer during debugging. Take for example this exception handling code from Microsoft CNTK:

raise ValueError(
    'invalid permutation element: elements must be from {-len(perm), ..., len(perm)-1}'
)

Or this example from Tensorflow:

raise ValueError(
    "{FLAGS.tfrecord2idx_script} does not lead to valid tfrecord2idx script."
)

Both of those should be f strings but are missing the f prefix.

The good news is crash reporting tools such as Sentry or Crashlyrics should show the variable value during the crash. Still makes debugging harder though, and not all projects use such tools.

f-string mistakes during unit testing

Tests give confidence that our code behaves how we want. However, sometimes it’s the tests themselves that don’t always behave how we want. Perhaps due to typos. In fact perhaps due to f-string typos. Take for example this test found in Home Assistant:

response = await client.get(
    "/api/nest/event_media/{device.id}/invalid-event-id"
)
assert response.status == HTTPStatus.NOT_FOUND

The test expects a 404 to be returned due to “invalid-event-id” being used in the URL but actually the cause of the 404 is for another reason: because the device.id value was not interpolated into the string due to a missing the f prefix. The test is not really testing what the developer intended to test.

There is some poor developer out there maintaining the this part of the codebases that is relying on this test and using it as a quality gate to give them the confidence they have not broken the product but the test is not really working. It’s this kind of thing that can harm a night’s sleep! It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.

Detecting the bugs

Code Review Doctor is a static analysis tool that detects common mistakes and offers the fix. When adding new checkers we stress test them by analysing open source code, and then check the results to minimise false positives.

To test our new “this probably should be an f string” checker we generated a list of the most popular Python repositories on GitHub by using GitHub’s topic search API. We ran the following code (some code is skipped for simplicity):

class Client:
    def get_popular_repositories(self, topic, count, per_page=100):
        urls = []
        q = f"topic:{topic}&order=desc&sort=stars"
        url = f'https://api.github.com/search/repositories?q=q'
        first_resp = self.session.get(
            url=url,
            params={"per_page": per_page}
        )
        for resp in self.paginate(first_resp):
            resp.raise_for_status()
            urls += [i['clone_url'] for i in resp.json()['items']]
            if len(urls) >= count:
                break
        return urls[:count]

    def paginate(self, response):
        while True:
            yield response
            if 'next' not in response.links:
                break
            response = self.session.get(
                url=response.links['next']['url'],
                headers=response.request.headers
            )

As you can see, GitHub uses the Link header for pagination. We then ran that code like:

client = Client()
clone_urls = client.get_popular_repositories(
    topic='python', count=666
)

Here’s the gist of the list of GitHub repository clone URLs that was generated.

Once we got the list of most popular Python repositories we ran them through our distributed static analysis checkers on AWS lambda. This serverless approach allows us to simultaneously test hundreds of repositories with perhaps millions of lines of code without having to worry about scalability of our back-end infrastructure, and we do not require our developers to have beefy development machines, and our servers don’t go down or experience performance degradation during peak load.

Minimising false positives

False positives are annoying for developers, so we attempt to minimise false positives. We were able to do this by writing a “version 1” of the checker and running it against a small cohort of open source repositories and check for false positives, then iterate by fixing the false positives and each time increase the number of repositories in the cohort. Eventually we get to “version n” and were happy the false positives are at an acceptable rate. Version 1 was very simple:

GIVEN a string does not have an f prefix
WHEN the string contains {foo}
AND foo is in scope
THEN it’s probably missing an f prefix

This very naive approach allowed us to run the checkers and see what cases we did not consider. There were a lot of them! Here are some false positives we found:

The string is later used in an str.format(…) call or str.format_map(…)
The string is used in a logger call. Python logger uses it’s own mechanism for string interpolation: the developer can pass in arbitrary keyword arguments that the logger will later interpolate:logger.debug("the value of {name} is {value}. a={a}.", name=name, value=value, a=a)"
The string is used in behave style test @when('{user} accesses {url}.')
The “interpolation” is actually this common pattern for inserting a value into a template some_string_template.replace('{name}', name) .

The false positive that was most difficult to solve was the string later being using with str.format(…) or str.format_map(…) because there is no guarantee that the str.format(…) call occurs in the same scope that the string was defined in: the string may be returned by function to be later used with str.format(…). Our technology currently does not have the capability to follow variables around every scope and know how it’s used. This will be an interesting problem to solve.

With these false positives in mind, our BDD-style spec ends up like:

GIVEN a string does not have an f prefix
AND the string is not used with an str.format call
AND the string is not used with an str.format_map call
AND the string is not using logger’s interpolation mechanism
AND the string is not used in @given @when @then
And the string is not used in .replace(‘{foo}’, bar)
WHEN the string contains {foo}
AND foo is in scope
THEN it’s probably missing an f prefix

That is vast simplification: there are many other ways a string can “look” like it should be an f-string but actually is not.

Reaction from the community

It was interesting to write the f-string checker, it was interesting to detect how common the problem was (10% is a lot higher than we expected!), and it was interesting to auto-create GitHub pull requests to fix the issues we found.

It was also interesting to see the reaction from open source developers to unsolicited pull requests from what looks like a bot (really a bot found the problem and made the PR, but really a human developer at Code Review Doctor did triage the issue before the PR was raised). After creating 69 pull requests the reaction ranged from:

Annoyance that a bot with no context on their codebase was raising pull requests. A few accepted the bugs were simple enough for a bot to fix and merged the pull request, but a few closed the pull requests and issues without fixing. Fair enough. Open source developers are busy people and are not being paid to interact with what they think it just a bot. We’re not entitled to their limited time.
Neutral silence. They just merged the PR.
Gratitude. “thanks” or “good bot”.
A developer even reached out and asked us to auto-fix their use of using == to compare True False and None, so we did. Read more about that here.

For science you can see all the reactions (both positive and negative) here.

Protecting your codebase

You can add Code Review Doctor to GitHub so fixes for problems like these are suggested right inside your PR or scan your entire codebase for free online.

Here’s a list of pull requests we created:

	Pull Request
	https://github.com/aio-libs/aiohttp/pull/6718
	https://github.com/allenai/allennlp/pull/5629
	https://github.com/ansible-community/molecule/pull/3521
	https://github.com/ansible/ansible/pull/77619
	https://github.com/apache/superset/pull/19826
	https://github.com/awsdocs/aws-doc-sdk-examples/pull/3110
	https://github.com/bokeh/bokeh/pull/12097
	https://github.com/buildbot/buildbot/pull/6492
	https://github.com/catboost/catboost/pull/2069
	https://github.com/celery/celery/pull/7481
	https://github.com/coqui-ai/TTS/pull/1532
	https://github.com/ctgk/PRML/pull/38
	https://github.com/cupy/cupy/pull/6674
	https://github.com/DLR-RM/stable-baselines3/pull/882
	https://github.com/donnemartin/gitsome/pull/194
	https://github.com/esphome/esphome/pull/3415
	https://github.com/facebook/pyre-check/pull/608
	https://github.com/feeluown/FeelUOwn/pull/580
	https://github.com/home-assistant/core/pull/70574
	https://github.com/huggingface/transformers/pull/16913
	https://github.com/intel-analytics/BigDL/pull/4478
	https://github.com/isl-org/Open3D/pull/5043
	https://github.com/jupyterhub/jupyterhub/pull/3874
	https://github.com/keon/algorithms/pull/864
	https://github.com/keras-team/keras/pull/16459
	https://github.com/kornia/kornia/pull/1690
	https://github.com/kovidgoyal/kitty/pull/5007
	https://github.com/matplotlib/matplotlib/pull/22883
	https://github.com/microsoft/computervision-recipes/pull/669
	https://github.com/microsoft/qlib/pull/1072
	https://github.com/modin-project/modin/pull/4415
	https://github.com/mozilla/TTS/pull/750
	https://github.com/networkx/networkx/pull/5574
	https://github.com/numpy/numpy/pull/21384
	https://github.com/NVIDIA/DALI/pull/3847
	https://github.com/OpenBB-finance/OpenBBTerminal/pull/1732
	https://github.com/OpenMined/PySyft/pull/6421
	https://github.com/openreplay/openreplay/pull/430
	https://github.com/PaddlePaddle/Paddle/pull/42164
	https://github.com/pandas-dev/pandas/pull/46855
	https://github.com/plotly/dash/pull/2024
	https://github.com/PrefectHQ/prefect/pull/5714
	https://github.com/psf/black/pull/3031
	https://github.com/pymc-devs/pymc/pull/5726
	https://github.com/pytorch/fairseq/pull/4380
	https://github.com/PyTorchLightning/pytorch-lightning/pull/12869
	https://github.com/Qiskit/qiskit-terra/pull/7982
	https://github.com/QuantConnect/Lean/pull/6301
	https://github.com/qutebrowser/qutebrowser/pull/7138
	https://github.com/rapidsai/cudf/pull/10721
	https://github.com/RaRe-Technologies/gensim/pull/3332
	https://github.com/rasbt/mlxtend/pull/917
	https://github.com/ray-project/ray/pull/24154
	https://github.com/readthedocs/readthedocs.org/pull/9136
	https://github.com/robinhood/faust/pull/755
	https://github.com/saltstack/salt/pull/61980
	https://github.com/scikit-learn/scikit-learn/pull/23206
	https://github.com/aaugustin/websockets/commit/9c87d43f1d7bbf6847350087aae74fd35f73a642
	https://github.com/scipy/scipy/pull/16037
	https://github.com/smicallef/spiderfoot/pull/1649
	https://github.com/spyder-ide/spyder/pull/17741
	https://github.com/statsmodels/statsmodels/pull/8245
	https://github.com/tensorflow/tensorflow/pull/55726
	https://github.com/Textualize/rich/pull/2216
	https://github.com/vyperlang/vyper/pull/2826
	https://github.com/yoshiko2/Movie_Data_Capture/pull/779
	https://github.com/yt-dlp/yt-dlp/pull/3539
	https://github.com/zulip/zulip/pull/21906

view raw probably-meant-f-string-pull-requests.csv hosted with ❤ by GitHub

3% of 666 Python codebases we checked had silently failing unit tests (and we fixed them all, including PyTorch, Salt, Python.org)

Lets coin a name for a very special type of unit test:

Schrödinger’s unit-test: a unit test that passes but fails to test the thing we want to test.

This article focuses on our code scanning of 666 Python codebases to detect one such Schrödinger’s unit tests. Specifically, this gotcha (which we found in 20 codebases and later pushed an auto-generated fix for):

Do you see the mistake? It’s a relatively common mistake for a developer to make. assertEqual and assertTrue were muddled up. In fact of the 666 codebases we checked, we found 3% of them mixed the two up. This is a real problem that affects real codebases, even active large ones with a huge community and strong testing and code review practices:

Python.org (ironically enough)
Celery
UK Ministry of Justice
Ansible
Django CMS
Keras
PyTorch
Ray
Salt

See the full lists in this gist. The fact the problem is so widespread suggests a key cause of this mistake is because it’s so easy to miss during code review. A quick glance at the code does not raise any red flags. It “Looks Good To Me!”.

Schrödinger’s unit-test

In the late 19th century Mark Twain apparently said It’s not what you know that gets you. it’s what you think you know but ain’t. What was true in late 19th century literature is also true in early 21st Python unit testing: tests that don’t really test the thing we want to test is worse than no tests at all because a Schrödinger’s test gives unwarranted confidence that the feature is safe.

Have you ever encountered this workflow:

Code change committed, and the tests for the feature passes locally.
The test ran in CI during the pull request build, and no failures reported.
The change was peer reviewed and accepted during code review. (“LGTM!”)
The change was then deployed to prod… only for it to fail upon first encounter with the user
After reading the bug report proclaim to yourself “but the test passed!”

Upon revisiting the unit test it’s then noted that the unit test that gave such great confidence during the software development life cycle was in fact not testing the feature at all. If you encountered this problem then you encountered this class of unit tests which this specific TestCase.assertTrue is one of.

The TestCase.assertTrue gotcha

While pytest is very popular, 28% of codebases still use the built-in unittest package. The codebases that do use the built-in unittest package are at risk of the assertTrue gotcha:

The docs for assertTrue state that test that expr is True, but this is an incomplete explanation. Really it tests that the expr is truthy, meaning if any of the following values are used in as the first argument the test will pass:

'foo', ‘bar’, 1, [1], {‘a’}, True

And conversely with falsey values the test will fail:

'', 0, [], {}, False,

assertTrue also accepts a second argument, which is the custom error message to show if the first argument is not truthy. This call signature allows the mistake to be made and the test to pass and therefore possibly fail silently.

For example when making 20 PRs fixing the problems we found one of the tests started to fail due to faulty application logic once the test was no longer a Schrödinger’s unit test:

We also found that at least two of the tests failed because the logic in the test was wrong. This is also bad if we consider unit tests to be a form of documentation describing how the product works. From a developer’s point of view, unit tests can be considered more trustworthy that comments and doc strings because comments and doc strings are claims written (perhaps long ago and perhaps now obsolete or incomplete), while a passing unit test shows a logic contract is being upheld…as long as the test is not a Schrödinger’s test!

Perhaps more data is needed, but this could mean 15% (3 in 20) of Schrödinger’s unit tests are hiding broken functionality.

TestCase.assertEqual

When the CodeReview.doctor github bot detects this mistake in a GitHub pull request the following solution is suggested:

You can securely scan your entire codebase online for free at CodeReview.doctor

Hacking Django websites: clickjacking

Part 2 of the “Hacking Django” series: part 1, part 3

Clickjacking is an attack where one of your logged-in user visits a malicious website, and that website tricks the user into interacting with your website via an iframe.

As an example, see the green “create pull request” button which will create a Pull Request on GitHub as the logged in user:

Let’s say the malicious website has some nefarious React:

import React from 'react';


export default function (props) {
  const [position, setPosition] = React.useState({ clientX: 0, clientY: 0 });
  const updatePosition = event => {
    const { pageX, pageY, clientX, clientY } = event;
    setPosition({ clientX, clientY,});
  };

  React.useEffect(() => {
    document.addEventListener("mousemove", updatePosition, false);
    document.addEventListener("mouseenter", updatePosition, false);
    return () => {
      document.removeEventListener("mousemove", updatePosition);
      document.removeEventListener("mouseenter", updatePosition);
    };
  }, []);

  return (
    <>
      <iframe
        style={{zIndex: 100, position: 'absolute', top: position.clientY-200, left: position.clientX-650}}
        src="<path-to-target-website>"
        width="750px"
        height="500px"
      />
      <div>Some website content</div>
    </>
  );
}

When one of the logged in users accesses the malicious website this happens: the iframe follows the mouse so that the button is on the mouse. When the user clicks, they click on the iframe.

Now that is very obvious – the user can see the iframe, but that’s one change away: style={{display: 'none', ...}}. For the sake of the demo I used style={{opacity: '0.1', ...}}(otherwise you would see nothing interesting):

Clickjack prevention middleware

The solution is simple: to set iframe embed policy for your website: adding django.middleware.clickjacking.XFrameOptionsMiddleware to settings MIDDLEWARE and X_FRAME_OPTIONS = 'SAMEORIGIN' will result in X-Frame-Options header with value of SAMEORIGIN, and modern browsers will then prevent your website from being embedded on other websites.

Continue reading: part 1, part 3

Does your website have security vulnerabilities?

Over time it’s easy for security vulnerabilities and tech debt to slip into your codebase. I can check clickjacking vulnerability and many others for for free you at https://django.doctor. I’m a Django code improvement bot:

If you would prefer security holes not make it into your codebase, I also review pull requests:

https://django.doctor

Selenium in a hurry

I love tools that save me time. Then I start using them. Then using them turns into a chore. A tool I love is a tool I haven’t used enough yet.

I started using Selenium for end to end testing my web apps, and Selenium went from awesome to chore quite quickly. My gripe was verbosity:

(browser.find_element_by_class_name('contact-menu')
    .find_element_by_class_name('show-all')).click()
browser.find_element_by_id('user-' + str(user_id)).click()
browser.find_element_by_id('write-mail-to-selected').click()
(browser.find_element_by_class_name('user-info-' + str(user_id)
    .find_element_by_class_name('icon-edit')).click()

Writing it took longer than I wanted, and it will take longer than it should to read, and future-me will be negatively affected when maintaining it. Over-verbosity is not my friend – I think a developer is not here to write code: I think I’m here to create features and I prefer doing it with less code where appropriate. Maybe jQuery has fooled me into thinking clicking on a DOM element can be as simple as:

$('#app-footer').click()

Sure, I appreciate the Selenium project attempts to maintain similar syntax across it’s many implementations. Pick up a Java, C, or Python implementation and if you can write in one implementation then you can read it in another. jQuery doesn’t have that issue as its only implemented in javascript. However, I’m not in the business of reading Selenium scripts across multiple languages.

I work with Python in a high pressure agile environment where I’m happy to make my Selenium scripts more Pythonic in order improve my workflow. I dont stand alone in this opinion: the official ElasticSearch library has been implemented across many different languages – and each one focuses on aligning with the language it was implemented on rather than attempting to maintain a similar syntax across several languages. I prefer this approach.

My wrapper makes selenium what I consider more developer-friendly, to give me sugar that allows:

browser.find('.contact-menu').find('.show-all').click()
browser.find('#user-', user_id).click()
browser.find('#write-mail-to-selected').click()
browser.find('#user-info-', user_id).find('.edit').click()

Compared to the vanilla Selenium its 40% less code and I think 40% more awesome. With that defense laid down, below we go through how it was implemented.

from selenium.webdriver.remote.webelement import WebElement
from selenium.webdriver.firefox.webdriver import WebDriver as Firefox
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile


class CustomFirefoxDriver(Firefox):
    """
    web driver that returns CustomWebElement
    when get_element_by_*, etc are called.

    """

    def __init__(self)
        super(CustomFirefoxDriver, self).__init__()
        self.find = ElementFinder(self)

    def create_web_element(self, element_id):
        return CustomWebElement(self, element_id)


class CustomWebElement(WebElement):
    """Attach ElementFinder instance to web elements."""

    def __init__(self, *args, **kwargs):
        self.find = ElementFinder(self)
        super(CustomWebElement, self).__init__(*args, **kwargs)


class ElementFinder(object):
    """
    non-verbosely find elements. routes to find_element_by_id,
    find_element_by_class_name, etc.

    """

    def __init__(self, element):
        self.element = element

    def __call__(self, many=False, *args, **kwargs):
        """
        if many is True then find_elements_by_*, else find_element_by_*

        note args will be concatenated to create the selector

        """

        selector = ''.join(map(str, args))
        prepend = 'find_element{0}_by_'.format('s' if many else '')
        # rough approximate check if selector is complex css selector, if not then
        # select element by id, class, or tag, and fall back to css_selector
        punctuation = """!"#$%&'()*+,./:;<=>?@[\]^`{|}~"""
        if sum(map(selector.count, punctuation)):
            append = 'css_selector'
        elif selector[0] == '#':
            append = 'id'
            selector = selector[1:]
        elif selector[0] == '.':
            append = 'class_name'
            selector = selector[1:]
        else:
            append = 'tag_name'
            selector = selector

        # now we know the method name to call, lets do it
        return getattr(self.element, prepend + append)(selector)

Richard Tier

Menu

Tag Archives: python