Unicode (as UTF-8) is a very popular format for encoding filenames on disk, but there are some subtly incompatible variants around. In particular, different operating systems have different ideas about how accents should be handled.
Mac applies something called NFD (Normalization Form Canonical Decomposition) to filenames before they are stored on disk. This means that a character like the “ū” in the word “Jingū” is stored as two Unicode characters: a plain old LATIN SMALL LETTER U character, followed by a COMBINING MACRON character:
ls | grep 2009_11_03 | od -c -tx1
0000000    2   0   0   9   _   1   1   _   0   3       M   e   i   j   i
          32  30  30  39  5f  31  31  5f  30  33  20  4d  65  69  6a  69
0000020        J   i   n   g   u   ̄  **  \n
          20  4a  69  6e  67  75  cc  84  0a
Windows does the exact opposite (NFC), combining the “u” and the macron together to produce a single LATIN SMALL LETTER U WITH MACRON character:
0000000    2   0   0   9   _   1   1   _   0   3       M   e   i   j   i
          32  30  30  39  5f  31  31  5f  30  33  20  4d  65  69  6a  69
0000020        J   i   n   g   ū  **  \n
          20  4a  69  6e  67  c5  ab  0a
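The two byte sequences above can be reproduced without touching a filesystem. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `utf8_hex` is made up for this illustration.

```python
import unicodedata

def utf8_hex(text, form):
    """Normalize text to the given form ("NFC" or "NFD") and return its UTF-8 bytes as hex."""
    normalized = unicodedata.normalize(form, text)
    return normalized.encode("utf-8").hex(" ")

# "\u016b" is LATIN SMALL LETTER U WITH MACRON, written as an escape so the
# source file itself cannot be silently re-normalized by an editor.
name = "Jing\u016b"

if __name__ == "__main__":
    for form in ("NFD", "NFC"):
        print(form, utf8_hex(name, form))
```

The NFD line ends in `75 cc 84` (a plain “u” followed by the combining macron) and the NFC line ends in `c5 ab`, matching the `od` dumps.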
On Linux, I’m not sure if there is a standard, but it’s typical to find filenames encoded with NFC. If anything, the standard in Linux is that a filename is just a series of bytes and the OS shouldn’t try to mangle them by converting the characters with NFC or NFD normalization forms.
I recently set up a second computer as a FreeNAS 9.2.0 storage device, and I wanted to migrate the files from my MacBook to it. The most straightforward way to do this is to enable the Apple “AFP” network filesharing service on FreeNAS, then on the Mac, copy the files to that network share however you like. This automatically takes care of any character conversion for you.
However, I tried this and only achieved about 2MB/s transfer speeds. I would have died of old age before I would have been able to copy my terabytes of data to FreeNAS.
So, instead I used rsync to send my files directly to the ZFS storage on FreeNAS, bypassing any of its network file system protocols. This achieved a steady ~110MB/s, which is wonderful. The issue came when I went to read those same files over AFP: I could briefly see folders that contained accents in the Finder, but they would disappear after several seconds, reappear 10 seconds later, then disappear again!
The issue was that rsync, being Linux-oriented, preserves the filename encoding when sending files, so the filenames on FreeNAS ended up still in Mac’s NFD format (with accents encoded as separate characters). This is a problem because netatalk, the AFP server on FreeNAS, expects filenames on FreeNAS volumes to be encoded in the “vol charset” which is defined in its configuration file:
http://netatalk.sourceforge.net/3.0/htmldocs/configuration.html#charsets
FreeNAS doesn’t set this explicitly, so it defaults to “UTF8”. Although the netatalk manual doesn’t say it, “UTF8” actually implies “UTF8 in NFC form”, so netatalk cannot correctly serve the NFD-encoded filenames that originated on my Mac.
What happens is this: a Mac OS X client lists a directory over AFP, so netatalk converts the filenames on disk (which it thinks are NFC) to NFD (a no-op, since we’ve actually put NFD filenames there to start with). The accented characters therefore show up properly on the Mac and the listing looks okay. But then the Finder asks for more information about a file specifically by name. Netatalk converts the NFD filename that the Mac provides to the vol charset (NFC form) and tries to look it up on the filesystem. No file with that NFC name exists on disk, so the lookup fails, and the file disappears from the Finder again.
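The list-then-lookup mismatch can be modelled in a few lines. This is only a toy model of the behaviour described above, not netatalk's actual code; the class `ToyAfpServer` and its methods are invented for the illustration.

```python
import unicodedata

def nfc(text):
    return unicodedata.normalize("NFC", text)

def nfd(text):
    return unicodedata.normalize("NFD", text)

class ToyAfpServer:
    """Toy file server that assumes every on-disk name is already NFC."""

    def __init__(self, names_on_disk):
        self.disk = set(names_on_disk)

    def list_directory(self):
        # Volume charset (assumed NFC) -> client charset (NFD).
        # If the names are already NFD, this conversion changes nothing.
        return [nfd(name) for name in self.disk]

    def lookup(self, client_name):
        # Client charset (NFD) -> volume charset (NFC), then an exact match,
        # because the filesystem compares names byte for byte.
        return nfc(client_name) in self.disk

if __name__ == "__main__":
    # Names copied by a plain rsync from a Mac arrive in NFD form.
    server = ToyAfpServer([nfd("Meiji Jing\u016b")])
    listed = server.list_directory()
    print("listed:", listed)                         # the folder shows up
    print("found: ", server.lookup(listed[0]))       # but the lookup fails
```

With NFC names on disk, the same lookup succeeds, which is what Options 2 and 3 rely on. Option 1 instead changes the server's assumption about what is on disk.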
Here are three ways of solving the problem, in order from worst to best:
Option 1: Change vol charset to UTF8-MAC
Changing the vol charset to “UTF8-MAC” will fix the issue by letting netatalk know that the filenames on disk are in NFD form.
The “vol charset” setting is found in the afp.conf file at /usr/local/etc/afp.conf. But you shouldn’t edit this file, as it is automatically regenerated at various times by the script /usr/local/libexec/nas/generate_afpd_conf.py. Instead, edit that script:
# Remount root as writable so we can edit the script:
mount -uw /
nano /usr/local/libexec/nas/generate_afpd_conf.py
Find this section:
cf_contents.append("\tmax connections = %s\n" % afp.afp_srv_connections_limit)
cf_contents.append("\tmimic model = RackMac\n")
cf_contents.append("\tvol dbnest = yes\n")
cf_contents.append("\n")
Before the append("\n") line, add:
cf_contents.append("\tvol charset = UTF8-MAC\n")
Save and exit, then:
# Remount root as readonly and commit our changes to disk (takes ages on a USB flash drive, so be patient)
mount -ur /
Now on FreeNAS’s Services/Control Services page, turn AFP off and back on in order to regenerate its configuration file. You should see the new line added when you cat /usr/local/etc/afp.conf. Your Mac-encoded filenames will now serve correctly through AFP!
Option 2: Change the encoding of the filenames on FreeNAS’ disk to NFC
Alternatively, you could leave the AFP configuration alone and change the filename encoding on disk instead. This will make the files available to Mac, but with the caveat that the filenames will no longer be the same as your source files due to the encoding difference, and rsync that runs directly against ZFS will no longer consider them to be the same files.
I did this by creating a new plugin jail, then adding my ZFS volume as additional storage to that jail. From that jail’s console button, I used “pkg install convmv” to install the convmv package, which can change filename encodings. I changed the encoding to NFC like so:
convmv -f utf-8 -t utf-8 --nfc -r --no-test /mnt/my-files
(You should run without --no-test first so convmv can tell you exactly what it plans to do, before you accidentally mangle your filenames more!)
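To see what a dry run of this kind amounts to, here is a minimal Python sketch that only reports which names under a directory are not in NFC form and what they would become. It is not how convmv is implemented, it does not rename anything, and the function names `nfc_target`, `plan_nfc_renames`, and `print_plan` are made up for this example.

```python
import os
import unicodedata

def nfc_target(name):
    """Return the NFC form of name, or None if the name is already NFC."""
    nfc = unicodedata.normalize("NFC", name)
    return nfc if nfc != name else None

def plan_nfc_renames(top):
    """Return (old_path, new_path) pairs for every non-NFC name under top."""
    plan = []
    # topdown=False yields a directory's contents before the directory itself,
    # so applying the plan in order never invalidates a later path.
    for root, dirs, files in os.walk(top, topdown=False):
        for name in dirs + files:
            target = nfc_target(name)
            if target is not None:
                plan.append((os.path.join(root, name),
                             os.path.join(root, target)))
    return plan

def print_plan(top):
    """Dry run: print the planned renames without touching the disk."""
    for old, new in plan_nfc_renames(top):
        print("would rename %r -> %r" % (old, new))
```

Calling `print_plan("/mnt/my-files")` would list the affected paths. An empty listing suggests the tree is already in NFC form.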
Option 3: Go back in time and copy the filenames correctly in the first place
You can avoid this whole issue in the first place by having rsync convert the filenames to NFC as they are copied to ZFS:
rsync -a --iconv=utf-8-mac,utf-8 my-files/ root@freenas.local:/mnt/my-files/
In fact, if you fix the filenames using Option 2, then do all your future rsyncs with the character conversion specified here, everything will be hunky-dory!
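One thing worth checking before relying on this: the --iconv option was only introduced in rsync 3.0.0. Below is a minimal sketch, assuming rsync prints its usual “rsync  version X.Y.Z” banner for `rsync --version`. The helper names are hypothetical, and the command it builds is simply the one shown above.

```python
import re

def parse_rsync_version(banner):
    """Extract (major, minor, patch) from the text printed by `rsync --version`."""
    match = re.search(r"version\s+(\d+)\.(\d+)(?:\.(\d+))?", banner)
    if match is None:
        raise ValueError("no rsync version found in banner")
    major, minor, patch = match.groups(default="0")
    return int(major), int(minor), int(patch)

def supports_iconv(banner):
    # --iconv first appeared in rsync 3.0.0.
    return parse_rsync_version(banner) >= (3, 0, 0)

def build_rsync_command(source, destination):
    # The sender's names are utf-8-mac (NFD); the receiver gets utf-8 (NFC).
    return ["rsync", "-a", "--iconv=utf-8-mac,utf-8", source, destination]
```

The banner can be obtained with `subprocess.run(["rsync", "--version"], capture_output=True, text=True).stdout`, and the list returned by `build_rsync_command` can be passed to `subprocess.run` once `supports_iconv` returns True.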
Thank you so much for this explanation and the solutions!
I was finally able to clean my box from all the nice little irritations I created by playing around with different share options in combination with rsync …
Superb explanation, thanks.
Thanks! Giant help here. A tip for other visitors who arrive here:
If you are running rsync on your Mac (e.g. to rsync your music dir to a backup volume) you need to update rsync (Mavericks appears to ship with 2.6.9).
First install brew if you haven’t already (go to brew.sh)
Then, ‘brew tap homebrew/dupes’ and then ‘brew install homebrew/dupes/rsync’
After that, rsync --iconv=utf-8-mac,utf-8 works like a charm with FreeNAS.
Dear Nicholas,
Thank you for writing this excellent article and proposing good solutions. Even though I’m a developer, this one had me stumped for a while, trying to work out where the problem lay.
For anyone who has unfortunately used a mixture of rsync/afp/smb over time to transfer files back and forth (and thus has a mix of NFC/NFD formatted filenames, possibly doubly encoded): the python script below can help to identify the files/directories most likely to be affected (i.e. non-ascii characters). Replace ‘.’ with whichever directory you wish to scan recursively.
[Also, if you’re in that situation, you should hopefully be able to fix this using the convmv tool with the --fixdouble and --nfc options on your backup server]
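import os
import chardet  # third-party package: pip install chardet

# Walk the tree as bytes so chardet sees the raw on-disk filename encoding.
for root, directories, filenames in os.walk(b'.'):
    for n in filenames:
        guess = chardet.detect(n)
        if guess['encoding'] != 'ascii':
            print('%s:\n%s => %s (%s)' % (os.fsdecode(root), os.fsdecode(n), guess['encoding'], guess['confidence']))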
Can you clarify whether I can circumvent the whole problem by using rsync to sync from my Mac to FreeNAS through a CIFS share on FreeNAS? If I understand this correctly, doing it that way will sync my files as they are and not do any conversion.
Yup, that avoids any issues. The network file systems (even AFP) take care of converting filenames for you; it’s only when you try to bypass CIFS or AFP that you have to take care of the conversion yourself.
Hello Nicholas!
Having a similar issue, you might be able to help…
Just created a new dataset and shared it through NFS. Successfully mounted it using the “nfc” option, and can create files and folders with special characters in their names without problems. But every time I click a folder with accents or special characters, the content blinks and I go back to the previous folder. It only works if I set “noac” in the mount options, to disable caching… Any clues why?
I’m using FreeNAS-11.0-U4 and OS X 10.2.6
Hm, that’s a tricky one, it sounds like it should be working already if the “nfc” mount option really does translate filenames in both directions. On the server, I would pipe ls through `od -c -tx1` to check how the filenames on disk are actually encoded. They should be in NFC form with that setup (so you’ll see accented characters in the output of `od` rather than an unaccented letter followed by a combining accent character), but if something is going wrong then they might be in NFD form.
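If reading `od` output by eye is tedious, the same check can be sketched in Python. This is only an illustration with made-up function names (`normalization_form`, `report`); it classifies each name by comparing it against its own NFC and NFD forms.

```python
import os
import unicodedata

def normalization_form(name):
    """Classify a filename as 'ascii', 'NFC', 'NFD', 'unaffected' or 'mixed'."""
    if name.isascii():
        return "ascii"
    is_nfc = unicodedata.normalize("NFC", name) == name
    is_nfd = unicodedata.normalize("NFD", name) == name
    if is_nfc and is_nfd:
        # Non-ASCII, but has no decomposable characters (e.g. kanji),
        # so both forms are identical.
        return "unaffected"
    if is_nfc:
        return "NFC"
    if is_nfd:
        return "NFD"
    return "mixed"

def report(directory):
    """Print the normalization form of every name directly inside directory."""
    for name in sorted(os.listdir(directory)):
        print("%-10s %r" % (normalization_form(name), name))
```

Run `report` on the server against the shared directory. Names reported as “NFD” or “mixed” are the ones that a server expecting NFC is likely to trip over.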
Thanks for this excellent explanation! I had the opposite problem. I used FreeNAS with ZFS and changed the OS to OS X with OpenZFS. Then I couldn’t access the files with special characters.
I used the convmv command with the --nfd option, and now everything is readable again! Great!