Template:Str len/core/doc

< Template:Str len‎ | core
Revision as of 11:57, 21 March 2013 by Unknown user (talk)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

Template:Mbox

This is the {{str len/core}} sub-template.

Do not use this template directly, use {{str len}} instead.

This template is called from {{str len}}, see user documentation there.

The rest of the sections here are only for those that want to understand the internal workings of this template.

Technical details

MediaWiki has no parser function or magic word to measure string lengths. And measuring string length using template code is very heavy on the servers. Thus this template is as optimised as possible.

Based on code for determining whether the length of a string is greater than a given value, the template searches for the actual string length. This sub-template holds most of the search function for {{str len}}. This sub-template can search from 000 to 499. The reason for the length limitation is that the magic word {{padleft:|}} only can pad up to 500 characters.

The basic comparison function used here is this:

{{#ifeq: xABCD | x{{padleft:ABCD| 9 }}
| Equal or longer.
| Shorter.
}}

Which means: "Is the string ABCD >= 9 characters long?"

Switch-cases and ifeq are too smart, they understand numbers. So they think that "0" and "00" are equal. That is why this code adds an "x" in front of the parameters, to make them into strings like "x0123" so the string comparisons work properly.

About ((str len))

Some of these operations are whitespace sensitive, so we must strip away any whitespace around the string parameter. That's why the string parameter is surrounded by {{#if:x| {{{1|}}} }} in some places.

{{str len}} first checks if the string is 500 characters or longer, since the search function in /core would otherwise return 499 for any string of 500 characters or more. (The older version of /core returned 999.) This also means that checking the really long strings is fairly efficient, it only takes one padleft operation and no parsing of the /core sub-template.

The three calls to /core is actually a loop, or recursion if you will.

The parameters to /core are whitespace sensitive. That is why we indent the parameters to /core so strangely. See more about this in the next section. If you change any of the indentation in {{str len}} you are likely to break this template. We could of course add a lot of whitespace stripping in /core, but that would cause more code in /core and would be inefficient.

About ((str len/core))

The code in this template is heavily optimised in several ways:

  • If we only go after what the NewPP limit reports tell, then it might seem we could optimise further by splitting this /core template up into three different /core templates, one for each search round. Since then we don't need the switch-case, and that gives lower values in the NewPP limit reports. However we know it is very costly for the Wikipedia servers to fetch more sub-templates, so we are pretty sure that would be much more costly.
  • MediaWiki developers have told us that calling padleft is costly, thus this template uses optimised search trees so it makes as few calls to padleft as possible. See next section for more on the search trees.

Note: The x's used in the explanation from here on just mean "any value from 0 to 9".

This template first searches 0xx to 4xx and then returns 0 to 4, as in 000 to 400. Then it searches x0x to x9x and returns x0 to x9, as in x00 to x90. Then it searches xx0 to xx9 and returns xx0 to xx9.

This template takes three unnamed (numbered) parameters:

1: The string that we want to know the length of.

2: The number returned by the previous rounds of searching. Thus at first search this parameter should be empty, at second search this parameter should hold a value between 0 and 4, and at third search it should hold a value between 00 and 49.

3: A word that tells which search should be done. For the first search this parameter should be "hundreds" so it searches from 0xx to 4xx. At second search it should be "tens", so it searches x00 to x90. And at third search it should be "ones", so it searches xx0 to xx9.

The parameters to this template are whitespace sensitive: Parameter 1 must not have any whitespace before it, and parameter 2 must not have any whitespace after it.

Search trees used

The search trees used in the loops are optimised based on the assumption that shorter strings are much more common. Note that finding 0 requires to do the "string >= 1" comparison, thus takes at least one operation. And finding for instance 1 means also checking the "string >= 2" comparison, thus takes at least two operations.

The numbers shown in the trees below are not the value searched for, but the value compared with. Thus --4-- means the comparison "string >= 4".

0xx-4xx, using linear search since most strings will probably be less than 100 bytes:

1--2--3--4

0 1 2 3 4 = Values to find.
1+2+3+4+4 = 14 comparisons to find all values once.

x0x-x9x, using linear search for 0x-3x, binary search for 4x-9x, since most strings will probably be less than 40 bytes:

1--2--3--4--6--8--9
            |  |
            5  7

0 1 2 3 4 5 6 7 8 9 = Values to find.
1+2+3+4+6+6+7+7+7+7 = 50 comparisons to find all values once.

xx0-xx9, using binary search:

4-----6--8--9
|     |  |
2--3  5  7
|
1

0 1 2 3 4 5 6 7 8 9 = Values to find.
3+3+3+3+3+3+4+4+4+4 = 34 comparisons to find all values once.

For comparison, here is the data for a full linear search:

1--2--3--4--5--6--7--8--9

0 1 2 3 4 5 6 7 8 9 = Values to find.
1+2+3+4+5+6+7+8+9+9 = 54 comparisons to find all values once.

See also