Recently I wrote about my attempt to learn how to look up Chinese characters. The crux of the process is not terribly complicated, but there are lots and lots and lots of exceptions. I posted a list of “questions for experts” that I couldn’t figure out. It occurred to me later that I personally know a sinologist, namely David Branner. (Branner is also a computer programmer; he describes himself as a “computational lexicographer of Chinese”.) I reached out to him and he graciously answered all my questions. Here are his remarks, verbatim and in full.

  1. Is every character component a radical? In other words, are there character components that are not radicals?

    A radical is a tool in a dictionary for looking up characters. The term is Western, not Chinese; the Chinese term is 部首, “section head.” The name radical embodies the misconception that “section heads” are the etymological roots of characters. In fact, they are often just present for semantic disambiguation of older loangraph usage. (More on this point later, under question 4.)

    Different dictionaries may use steeply diverging sets of radicals. The 214-element system is featured in the Kāngxī zìdiǎn of 1716. The Shuōwén dictionary, dating to about the year 100, uses a system of 540 that is semantically much more sensible — that is, if you already know hundreds and hundreds of characters. Anyone who knows that system well is probably very interesting to talk to. The Shuōwén system actually is a system, with a sort of philosophical order to it.

    A “component” is simply a salient element of a character. The name “element” is fine, too. A component may or may not serve as a radical in some dictionary or other. It may or may not have significance in the historical evolution of the graph. It may just be a component.

    A key point is this: yes, you can analyze a character into a bunch of different recognizable components that are also characters; but, no, that doesn’t mean those components are what the character is actually accreted from, etymologically. Sometimes custom, error, and the convenience of the pen have been involved.

  2. Is there a method, or even a set of rules of thumb, for identifying a character’s primary radical?

    There are heuristics, but it’s best to learn these as you go. Wanting to see the whole system at one go seems reasonable when you’re starting, but it’s hard to digest and gives a mistaken impression of philosophical consistency. [Here is] a link to a guide I once wrote to the system.

    Note: This guide basically answers most of my questions. In fact, I might not have written my initial post if I had found it first.

  3. I made up the expression primary radical. What is the real name for it? What about the remainder?

    There’s no need for “primary” because by definition a character has only one radical: the “section-head” under which it’s placed in a particular dictionary. “Remainder,” meaning the parts of the character whose strokes have to be counted to find the whole character within the radical section of the dictionary, is fine, and people do use that expression.
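    The lookup procedure described here (find the radical's section, then count the strokes in the remainder) can be modeled as an index keyed first by radical and then by remainder stroke count. What follows is a minimal sketch under that assumption. The few entries use characters discussed in this post. The names `build_index` and `look_up` are hypothetical and do not come from any real dictionary software.

```python
from collections import defaultdict

# (character, radical it is filed under, strokes in the remainder).
# Illustrative sample only; a real dictionary has tens of thousands of entries.
ENTRIES = [
    ("秋", "禾", 4),  # remainder 火 has 4 strokes
    ("相", "目", 4),  # remainder 木 has 4 strokes
    ("垂", "土", 5),
    ("馗", "首", 2),  # remainder 九 has 2 strokes
    ("道", "辶", 9),  # remainder 首 has 9 strokes
]


def build_index(entries):
    """Group characters by radical section, then by remainder stroke count."""
    index = defaultdict(lambda: defaultdict(list))
    for char, radical, remainder in entries:
        index[radical][remainder].append(char)
    return index


def look_up(index, radical, remainder):
    """Return the characters under `radical` with `remainder` extra strokes."""
    # .get() avoids creating empty sections as a side effect of a failed lookup.
    return index.get(radical, {}).get(remainder, [])
```

    For example, `look_up(build_index(ENTRIES), "禾", 4)` returns `["秋"]`, and looking under 木 instead finds nothing, because this sample files 相 under 目.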

  4. Given two primary radicals with the same number of strokes, is one of them “before” or “after” the other? In other words, is there an ordering between primary radicals with the same number of strokes?

    You generally learn this through practice. It’s not a coherent philosophical system. But there are some etymological principles in play. You learn these as you go. Some elements turn out to be much more common as radicals than others. Some are just a load of BS.

    The graph 秋 for qiū “autumn” appears to be made up of two possible radicals, 禾 and 火. Which is it placed under in the dictionary? Under 禾 “rice”, because 秋 etymologically seems to mean “harvest time.”

    The element 火 in this graph is a simplification — a pen-convenience. Historically, the full form of this character was 龝. The element on the right is guī 龜 “(land) tortoise”. 龜 is also a radical in the Kāngxī zìdiǎn, but a pretty rare one.

    Now, what do tortoises have to do with autumn? There may be an etymological connection (remember that much etymology is speculative). But probably it’s there for sound. That is, 龜 and 秋 were once close homophones, something not possible to know if you only know Mandarin.

    It’s even possible that at one time 龜, sounding very much like 秋, was actually used alone to write the word for “autumn”, leading to the likelihood of confusion with the separate word for “tortoise”. At some point, 禾 would have been added to “disambiguate” 龜 in its meaning “autumn” from 龜 in its meaning “tortoise”. So this 禾 in this disambiguating function is known as a “semantic component” or “semantophore,” and for that reason it is more likely to be used as the radical in a radical-based dictionary.

    Another example is 相, commonly read xiāng “mutually, one to another” and also read xiàng “to examine, to observe”, etc. etc. Both components, 木 and 目, are common radicals. Which is the one I should use to look up 相? Both are pronounced mù in Mandarin and something like muk in the earlier language; neither of those sounds much like xiang. So it’s hard to tell which element in 相 is phonetic and which, by elimination, is semantic.

    But here the key is semantics. 木 means tree or wood and 目 means eye. The meaning “eye” is closer to “examine”, I think you’d agree, so as a likely semantophore it’s a better candidate to be the classifier. And that leaves 木 as phonetic, then? Probably 木 is an orthographic variant of qiáng 爿, which would be a plausible phonetic. Here’s some discussion of this case.
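    The rule of thumb in these two examples (prefer the semantic component as the radical) can be written down as a tiny heuristic. This is only a sketch: the role tags below are an interpretation of the explanations above, not data from any real dictionary, and `guess_radical` is a hypothetical name. As the answer stresses, real placements are not fully systematic.

```python
# Component roles as explained above; the tags are an illustrative
# interpretation, not data taken from any dictionary.
ANALYSES = {
    "秋": [("禾", "semantic"), ("火", "phonetic")],  # 火 is a reduced form of phonetic 龜
    "相": [("木", "phonetic"), ("目", "semantic")],
}


def guess_radical(components):
    """Prefer the semantic component (semantophore) as the likely radical.

    Returns None when no component is tagged semantic; in such cases a
    dictionary falls back on convention rather than on this heuristic.
    """
    for component, role in components:
        if role == "semantic":
            return component
    return None
```

    Note that the order of components does not matter here: 目 is chosen for 相 even though 木 comes first.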

    An example of a radical that is a load of BS is 土 as the radical under which 垂 is placed. There’s no real relationship between the two, but 垂 has to go somewhere, and 土 sort of works. There are other cases like this shown toward the end of my radical guide.

  5. What is the relationship between 辶 and 辵? In general, what is the relationship between “alternate forms” of the same radical?

    In general, you’re stuck. Welcome to the Chinese writing system! It is one of the most rewarding things in the world to study.

    But, in particular, 辶 is a cursive form of 辵, which is also written 彳+龰(=止) in some graphs, such as 徒 (in terms of its historical structure, 辵+土) and 徙 (辵+止). 辵 will cost you seven strokes and 辶 somewhere between two and four, depending. If you were a lazy scribe, which would you rather write?

    Another example of a cursive form in standard kǎishū 楷書 is 之, whose more exact historical form is 㞢. 㞢 appears in some throwback graphs such as 旹 (日+㞢) for shí 時.

    There have been times when people really loved using perversely archaic forms like 旹. Because of the intense emotional relationship many Chinese speakers have with the writing system, some of these are readily available in computer input methods. One of the characters of my name, for instance, 德, is also written 恴/悳/惪/㥁/㤫; all are available on the Chromebook I’m writing this on. The Taiwan Ministry of Education maintains a website of alternate character forms at https://dict.variants.moe.edu.tw/variants/rbt/query_by_standard_tiles.rbt?command=clear. Take a look at some of those for 徒 or 達, for instance.
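    A variant table of this kind can be thought of as a mapping from each alternate form to a standard one. The sketch below assumes a tiny hand-made table built only from forms mentioned in this post. It makes no claim about how the Ministry of Education site stores its data, and real variant relationships are not always one-to-one. The names `VARIANT_TO_STANDARD` and `normalize` are hypothetical.

```python
# Hypothetical variant table: each nonstandard form maps to the form treated
# as standard. Entries are only the examples given in this post.
VARIANT_TO_STANDARD = {
    "恴": "德", "悳": "德", "惪": "德", "㥁": "德", "㤫": "德",
    "旹": "時",
    "龝": "秋",
}


def normalize(text):
    """Replace every known variant in `text` with its standard form."""
    # All of these forms are single code points, so per-character mapping works.
    return "".join(VARIANT_TO_STANDARD.get(ch, ch) for ch in text)
```

    Characters not in the table pass through unchanged, so `normalize("德")` is still `"德"`.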

  6. What is the relationship between 馗 and 道? Why is 首 the primary radical in the former but not the latter?

    There is no relationship; they just happen to look similar.

    首 is probably phonetic in 道, so 辶, as apparent semantophore, is a better candidate for classifier. But possibly 九 is phonetic in 馗 (also written 逵), with 首 semantic (馗 means “head” among other things).

  7. What is the relationship between the different “fonts” of radicals and characters? (For example, sometimes 辶 has two strokes instead of one.)

    Now you’re talking! Welcome to the Chinese writing system!

    It can easily take you the rest of your life to study. It is by far the most elaborate writing system the human species has developed, and immensely fascinating to lose yourself in. It is not systematic. The pleasure of learning it is learning all of its variety and diversity, over your lifetime. If you’d prefer a much simpler orthography, I’m told Spanish and Russian offer that.

    Note: Learning most alphabets really isn’t all that hard. Even including things like upper- and lower-cases, various connecting forms, and regional or historical alternative forms, learning an alphabet generally requires learning on the order of 26 characters, and perhaps as many as 28. With Chinese, on the other hand, there are so many exceptions that there doesn’t even appear to be a decision procedure for identifying whether a purported character is real or not.

  8. How can dictionaries using different fonts be reconciled, since different fonts might imply different stroke-counts?

    Stubbornness and experience. I don’t know. Is that a trick question? Welcome to the Chinese writing system!
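    One narrow piece of the problem can at least be made concrete: if two dictionaries agree on the remainder but count the same radical differently (as with 辶, which answer 5 puts anywhere from two to four strokes), their total stroke counts differ by a fixed offset. The sketch below assumes two hypothetical conventions that count 辶 as 3 and 4 strokes respectively. It does not handle the harder case where the remainder itself is counted differently.

```python
# Assumed stroke counts for the same radicals under two hypothetical
# dictionary conventions; only 辶 differs between them here.
RADICAL_STROKES = {
    "convention_a": {"辶": 3, "禾": 5},
    "convention_b": {"辶": 4, "禾": 5},
}


def total_strokes(radical, remainder, convention):
    """Total stroke count = strokes in the radical's form + remainder strokes."""
    return RADICAL_STROKES[convention][radical] + remainder


def convert_total(total, radical, source, target):
    """Translate a total stroke count from one convention to another."""
    delta = RADICAL_STROKES[target][radical] - RADICAL_STROKES[source][radical]
    return total + delta
```

    With a remainder of 9, as for 道, the total is 12 under the first convention and 13 under the second.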

  9. How many undecomposable orthographic elements are there?

    I think I once counted and found on the order of 1200. Welcome to the Chinese writing system!

  10. How many of the 214 primary radicals are decomposable?

    This question is left as a fun exercise for the student. Welcome to the Chinese writing system!
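    For anyone attempting the exercise, the answer depends entirely on what one accepts as a component. The sketch below assumes a hypothetical table of surface decompositions (visible shape only, not etymology) for a small sample of Kangxi radicals. Anything missing from the table is treated as undecomposable, which is itself a judgment call.

```python
# Hypothetical surface decompositions of a few Kangxi radicals, by visible
# shape only (not etymology). Radicals absent from the table are treated
# as undecomposable.
SURFACE_PARTS = {
    "香": ("禾", "日"),
    "音": ("立", "日"),
    "鼻": ("自", "畀"),
}

SAMPLE_RADICALS = ["木", "目", "火", "土", "香", "音", "鼻"]


def decomposable(radicals, parts):
    """Return the radicals that split into two or more listed components."""
    return [r for r in radicals if len(parts.get(r, ())) >= 2]
```

    On this sample, three of the seven radicals count as decomposable; extending the table to all 214 is the actual exercise.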

  11. How deep can radicals be nested?

    This question is left as a fun exercise for the student. Welcome to the Chinese writing system!
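    Nesting depth is easy to compute once a decomposition is fixed; choosing the decomposition is the hard part. The sketch below uses the decomposition of 燵 discussed under question 13. It compares a surface reading (達 = 辶 + 土 + 羊) with the tentative historical reading in which the element above 羊 is 大, as in 羍. Both tables are hypothetical illustrations.

```python
# Two hypothetical decomposition tables for 燵: a surface reading and a
# tentative historical reading in which 達 contains 羍 (大 + 羊).
SURFACE = {"燵": ("火", "達"), "達": ("辶", "土", "羊")}
HISTORICAL = {"燵": ("火", "達"), "達": ("辶", "羍"), "羍": ("大", "羊")}


def nesting_depth(char, table):
    """Levels of nesting below `char`; a component with no entry has depth 0."""
    parts = table.get(char)
    if not parts:
        return 0
    return 1 + max(nesting_depth(p, table) for p in parts)
```

    The same character comes out at depth 2 under the surface table and depth 3 under the historical one, which illustrates why the question has no single answer.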

  12. How do Chinese dictionaries deal with Chinese-style characters from other countries? Do they?

    Some do. They place them under the most plausible classifier plus additional strokes. Most of the time, for Japanese and Korean examples, the assignments are uncontroversial. For Vietnamese, we could be in trouble, but I have no information.

    In the case of your graph 燵, by the way, I don’t recommend calling it kokuji 囯字, since the written form of the name means “national (i.e., Chinese) character” in Chinese. A better name is waji 和字, unambiguously “native Japanese graph.” This name (in written form) is acceptable in both Chinese and Japanese.

    Note: 囯 means “national”, but “national” only means “Chinese” in China, just like “I” means “Nick Drozd” when I say it, but not when anyone else says it. Unsurprisingly, “national” is understood to mean “Japanese” in Japan, “Korean” in Korea, etc. As far as I can tell (and let me reiterate that I am far from an expert), 囯字 kokuji is a standard expression in Japan. 燵 is apparently a Japanese character, so the Japanese expression seems appropriate.

    The Japanese Wikipedia article on kanjis doesn’t use the phrase 和字, but it does refer to 和製漢字, literally “Japanese-made Chinese characters”. There is a Japanese Wikipedia article on that topic, and also a separate Japanese Wikipedia article about “national characters”. The latter has the following section headers:

    • 日本の国字, national 国 characters 字 of の Japan (literally, the sun’s 日 origin 本)
    • 朝鮮半島の国字, national characters of the Korean 朝鮮 peninsula 半島 (literally, half 半 island 島). “Korean peninsula” sounds like a standard expression.
    • ベトナムの国字, national characters of Vietnam. Note that “Vietnam” is written in katakana letters rather than kanjis.

    The equivalent Korean expression is 國字, 국자, gukja. 國 and 囯 are alternate forms (the former is “traditional” and the latter is “simplified”, I think), and kokuji and gukja are fairly similar pronunciations. Korean Wikipedia also uses the phrase 朝鮮漢字, “Korean Chinese characters”.

  13. Are the leaf nodes in the proposed decomposition of 燵 primitive, or can they be decomposed further?

    For one thing, 達 has a rare variant 逹, so the 羊 element isn’t carved in stone. For another, it isn’t clear to me (I’m in a hospital room and don’t have most of my reference material to hand) that the 土 element is historical, although the 羊 element may be. I believe the element above 羊 is historically not 土 but 大, as in 羍.