Updated 2014-03-20 02:14:54 by AMG
dict for {keyVar valueVar} dictionaryValue body

This command (part of dict) takes three arguments, the first a two-element list of variable names (for the key and value respectively of each mapping in the dictionary), the second the dictionary value to iterate across, and the third a script to be evaluated for each mapping with the key and value variables set appropriately (in the manner of foreach.) The result of the command is an empty string. If any evaluation of the body generates a TCL_BREAK result, no further pairs from the dictionary will be iterated over and the dict for command will terminate successfully immediately. If any evaluation of the body generates a TCL_CONTINUE result, this shall be treated exactly like a normal TCL_OK result. The order of iteration is the order in which the keys were inserted into the dictionary.
# Data for one employee
dict set employeeInfo 12345-A forenames "Joe"
dict set employeeInfo 12345-A surname   "Schmoe"
dict set employeeInfo 12345-A street "147 Short Street"
dict set employeeInfo 12345-A city   "Springfield"
dict set employeeInfo 12345-A phone  "555-1234"
# Data for another employee
dict set employeeInfo 98372-J forenames "Anne"
dict set employeeInfo 98372-J surname   "Other"
dict set employeeInfo 98372-J street "32995 Oakdale Way"
dict set employeeInfo 98372-J city   "Springfield"
dict set employeeInfo 98372-J phone  "555-8765"
# The above data probably ought to come from a database...

# Print out some employee info
set i 0
puts "There are [dict size $employeeInfo] employees"
dict for {id info} $employeeInfo {
    puts "Employee #[incr i]: $id"
    dict with info {
        puts "   Name: $forenames $surname"
        puts "   Address: $street, $city"
        puts "   Telephone: $phone"
    }
}
# Another way to iterate and pick out names...
foreach id [dict keys $employeeInfo] {
    puts "Hello, [dict get $employeeInfo $id forenames]!"
}

AMG: [dict for] and [foreach] behave a little differently when applied to a key/value list with duplicate keys:
% set mydata {a 1 a 2 b 3 b 4}
a 1 a 2 b 3 b 4
% foreach {k v} $mydata {puts $k=$v}
a=1
a=2
b=3
b=4
% dict for {k v} $mydata {puts $k=$v}
a=2
b=4

[dict for] converts the list to a dict. In the process it discards all but the last instance of each key. This doesn't mean that data is gone for good; it's still present in the string representation, so it'll reappear when the data is later used as a list or string. Amazingly, this is the case even for pure lists (lists with no string representation): the conversion to dict forces the generation of a string representation when there are duplicate keys.
DKF: That's the only way to make the semantics actually function as documented. When there are no duplicate keys, lists can go directly to dicts without losing information, but when there are duplicates, the only way to stop the Tcl value (conceptually) from mutating is to generate the string representation.

PYK: This difference did not exist in 8.5.7 but does exist in 8.6.1. Which release introduced it? I've made use of the foreach behaviour and wondered if dict for shouldn't also behave the same way. If a dict forall existed, would it be more performant than foreach in this scenario, or is foreach already sufficiently optimized for this?

AMG: [dict for] and [foreach] also disagree in their handling of odd-length lists. [dict for] will give the error "missing value to go with key", whereas [foreach] will behave as if the list had an empty string element appended to it.

RLE: (2014-03-19) Given that both are behaving as documented, and the definition of a "dict" (no repeating keys), requires the dict for result, I fail to see the problem. Foreach is defined for lists, dict for is defined for dict's. The "mydata" var above is both a valid list and an "almost" valid dict (I say "almost" because of the repeating keys, which will be squashed to only the last key when converting to a dict) and so the results differ because the meaning of "mydata" interpreted as a list is indeed different from "mydata" interpreted as a dict.

But converting dict for to act like foreach when fed a list of pairs would be counter-intuitive. One would get duplicate keys out when list's with dupes. are fed in, but no duplicates out when the dict version of the same list was fed in. That seems wrong.

Think of "dict for {key value} $dict {}" as operating like this when fed a list:
  dict for {key value} [ dict create {*}$mydata ] { }

Which, behind the scenes, given Tcl's auto type conversions, is probably exactly what happens when dict for is fed a list Tcl_Obj instead of an actual dict Tcl_Obj.

PYK 2014-03-19: The current behaviour of foreach is an improvement. In 8.5.7, it behaved exactly as dict for, which was unexpected. The interesting thing now is that a dict behaves like a specialized list: Its values are ordered and it doesn't lose redundant key-value pairs, making the conversion in either direction lossless, even betwen pure values. Given these semantics, one begins to wonder how much of a performance penalty there is when lappend or foreach cause a dict to shimmer to a list. I'm guessing not much.

AMG: The intrep of a dict value is a hash table mapping from keys to values. The mapped-to structure of the hash table contains not only Tcl_Obj pointers but also doubly-linked list pointers to the previous and next entries in the dict. This is done to maintain ordering when shimmering to list. Consequently, constructing a list from a pure, canonical dict isn't very expensive.

Non-pure and non-canonical dict Tcl_Objs contain string representations from which the list is constructed. This is what makes it possible to have repeated keys which are visible in the string and list representations but not the dict representation.

Making a dict from a list involves constructing the complete hash table, plus threading together all the objects into a linked list. There is some cost there. As is typical of the Tcl hash table, many memory allocations are required.

The most expensive thing is repeated conversion between list and dict. Remember that this conversion happens even when merely reading the value as a list or as a dict. Since neither of the types in question is string, no attempt to modify is necessary to change the internal type.

To mitigate the performance penalty described above, [foreach] contains an optimization to read the dict's linked list pointers rather than force conversion to list. However, this won't work if the dict isn't pure (has a string representation) because it might have duplicate keys which would not be present in the hash table.

A number of years ago I proposed unifying the list and dict intreps to avoid shimmering. However, the generic Tcl hash table facility isn't up to the job, and it can't be amended because its internal structure is present in tcl.h, so the required changes break binary compatibility. The solution would be to make a new, more flexible hash table system with the existing one either coexisting or becoming a compatibility interface. Unfortunately, my attempt to discuss this on the Tcl mailing list instantly devolved into a debate on the relative merits of hashing algorithms, and my proposal was forgotten despite the reference code I had produced. I discuss this unification again in my Brush paper, if you would like a fresh look at it. But the algorithms presented in the paper have the drawback of linear-time dict deletes. Since then I have come up with better algorithms, but I haven't published them anywhere, and I'm afraid I may have forgotten some key points.