Lint tool to find non-i18n strings in a Django template

Django provides out-of-the-box support for internationalization (i18n), i.e. making your site appear in many languages. To internationalize Django templates, you have to enclose all your text in {% trans ... %} or {% blocktrans %}...{% endblocktrans %}.

I had to prepare a Django site for translation, which meant going through all the templates and adding trans/blocktrans tags everywhere. This is time-consuming manual work, and it's easy to make mistakes and miss some text.

So I wrote a Python script that scans an HTML Django template and prints out every piece of text that should be wrapped in a trans/blocktrans tag, along with where it is (line and column number).

Download

Download the Python code; it is on GitHub. Feel free to download, use and fork it. If you have any suggestions, email me or send a GitHub pull request.

Examples of output

Input


Hello there

Output


example1.html:1:1:Hello there

Input


<p>This is a HTML text.</p><p>{% trans 'this is some text' %}</p>

Output


example2.html:1:4:This is a HTML text.

Input


{% blocktrans %}
<a href="/">Our site</a>
{% endblocktrans %}
<p>{{ some_var }}</p>
<img src="logo.png" alt="Our logo" />

Output


example3.html:5:20:alt="Our logo"

Notes

It uses regular expressions to try to parse the HTML/Django template, which you shouldn't do in general. It assumes you have a valid Django template; mismatched HTML or Django tags will probably break it.

It prints out text outside HTML tags, plus the contents of the alt, title and summary attributes and, for <input> tags, the value attribute.
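The whitelist regex later in this post singles out these attributes with negative lookbehinds. Here is a minimal sketch of that idea in isolation. It is not the script's actual code, and SAFE_ATTRIBUTE and translatable_attributes are hypothetical names chosen for illustration.

```python
import re

# Matches name="value" pairs whose name does NOT end in one of the
# translatable attribute names (same lookbehinds as the whitelist regex).
SAFE_ATTRIBUTE = re.compile(
    r'[a-z:-]+?(?<!alt)(?<!value)(?<!title)(?<!summary)="[^"]*?"')


def translatable_attributes(tag):
    """Return the alt/value/title/summary attributes found in one HTML tag."""
    # Remove every attribute that is safe to ignore (src, href, class, ...).
    leftover = SAFE_ATTRIBUTE.sub("", tag)
    # Whatever attribute text is left carries user-visible wording.
    return re.findall(r'(?:alt|value|title|summary)="[^"]*"', leftover)


print(translatable_attributes('<img src="logo.png" alt="Our logo" />'))
```

A tag with only safe attributes, such as <div class="x" id="y">, gives an empty list.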

It prints one line for each piece of bad text, giving the filename, the line number, the character position within that line, and the start of the text. It might print out some false positives, such as numbers ("0.00", etc.).
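To make the report format concrete, here is a small sketch of how a character offset in a template could be turned into the filename:line:column:text form shown in the examples. The helper names line_and_column and format_report are assumptions for illustration, and the script itself may compute positions differently.

```python
def line_and_column(text, offset):
    """Return the 1-based (line, column) of a character offset in text."""
    # The line number is one more than the number of newlines before the offset.
    line = text.count("\n", 0, offset) + 1
    # The column is the distance from the previous newline. rfind returns -1
    # when there is no earlier newline, which makes the first column 1.
    last_newline = text.rfind("\n", 0, offset)
    column = offset - last_newline
    return line, column


def format_report(filename, text, offset, snippet):
    """Build one report line in filename:line:column:text form."""
    line, column = line_and_column(text, offset)
    return "%s:%d:%d:%s" % (filename, line, column, snippet)


print(format_report("example1.html", "Hello there", 0, "Hello there"))
```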

How it works - Whitelisting with Regular Expression

The core of this program is a large regular expression (below) that matches text and substrings that are OK and should be ignored. It matches text that is already translated (i.e. wrapped in blocktrans or trans tags), as well as HTML tags, Django template functions, JavaScript, etc. The script then uses the technique of matching text that doesn't match a regex: whatever is left over between the whitelisted matches is not valid text, so it gets printed out.


import re

GOOD_STRINGS = re.compile(
    r"""
          # django comment
       ( {%\ comment\ %}.*?{%\ endcomment\ %}

         # already translated text
        |{%\ ?blocktrans.*?{%\ ?endblocktrans\ ?%}

         # any django template tag (this also catches {% trans ... %})
        |{%.*?%}

         # JS 
        |<script.*?</script>

         # An alt, title, summary or value attribute that's been translated
        |(?:value|title|summary|alt)="{%\ ?trans.*?%}"

         # An alt, title, summary or value attribute that's just a template var
        |(?:value|title|summary|alt)="{{.*?}}"

         # An <option> value tag
        |<option[^<>]+?value="[^"]*?"

         # Any html attribute that's not alt, value, title or summary
        |[a-z:-]+?(?<!alt)(?<!value)(?<!title)(?<!summary)="[^"]*?"

         # HTML opening tag
        |<[\w:]+

         # End of a html opening tag
        |>
        |/>

         # closing html tag
        |</.*?>

         # any django template variable
        |{{.*?}}

         # HTML doctype
        |<!DOCTYPE.*?>

         # IE specific HTML
        |<!--\[if.*?<!\[endif\]-->

         # HTML comment
        |<!--.*?-->

         # HTML entities
        |&[a-z]{1,10};

         # CSS style
        |<style.*?</style>

         # another common template comment
        |{\#.*?\#}
        )""",

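    # DOTALL makes . match newlines, so a pattern can span several lines.
    # VERBOSE allows the whitespace and comments inside the pattern.
    # MULTILINE only changes how ^ and $ behave.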
    re.MULTILINE|re.DOTALL|re.VERBOSE)
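
The "text that doesn't match the regex" step can be sketched as follows: walk over the whitelist matches in order and report any non-blank gap between them. This is an illustrative sketch only, not the script's own code. SIMPLE_GOOD is a deliberately tiny stand-in for GOOD_STRINGS that treats a whole HTML tag as OK, so it does not check attributes. The names find_untranslated and _gap are hypothetical.

```python
import re

# A much smaller stand-in for GOOD_STRINGS, for illustration only.
SIMPLE_GOOD = re.compile(
    r"""
      ( {%\ ?blocktrans.*?{%\ ?endblocktrans\ ?%}   # already translated block
      | {%.*?%}                                      # any template tag
      | {{.*?}}                                      # template variable
      | </?[^<>]*>                                   # a whole HTML tag
      )""",
    re.DOTALL | re.VERBOSE)


def _gap(text, start, end):
    """Yield (offset, snippet) if text[start:end] holds non-blank text."""
    chunk = text[start:end]
    stripped = chunk.strip()
    if stripped:
        # Skip leading whitespace so the offset points at the first real character.
        offset = start + (len(chunk) - len(chunk.lstrip()))
        yield offset, stripped


def find_untranslated(text):
    """Yield (offset, snippet) for each gap between whitelist matches."""
    position = 0
    for match in SIMPLE_GOOD.finditer(text):
        yield from _gap(text, position, match.start())
        position = match.end()
    # Anything after the last match is also unaccounted for.
    yield from _gap(text, position, len(text))


template = "<p>This is a HTML text.</p><p>{% trans 'this is some text' %}</p>"
print(list(find_untranslated(template)))
```

Each offset is a 0-based position in the template, which can then be converted to a line and column number for the report.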

Mentions

Post on the django-i18n Google group, and another copy of the post.
Django-users post, and another copy.
Martin De Wulf's blog post about Django i18n.
Reddit discussions: on programming reddit, on python reddit, on django reddit.