Use Hyphenopoly to insert better hyphenations in your Markdown content
Automagically insert soft hyphens into your Markdown content using Frank Liang’s LaTeX algorithm.
Update 2024-01-04: The hyphenate-limit-chars property has ~70% global support. The easiest solution now would be to add that property to your CSS alongside hyphens: auto.
Adding on to my previous post, Remove runts from your Astro Markdown files with ~15 lines of code, I wanted to share another quick tip for improving the layout of your Markdown prose.
We’re going to write a custom Remark plugin to suggest line breaks in our Markdown content. Doing this allows us to use the hyphens: manual CSS property while avoiding awkward breaks in places they shouldn’t be.
Wait, can’t CSS fix this?
CSS does have a property for breaking words and inserting hyphens (hyphens: auto), but it’s not super granular about the words it breaks. For example, words with only 5 letters will get broken just as easily as words with 20 letters. Yuck!
There is another CSS property to control the length of words that get broken (hyphenate-limit-chars), but it’s not yet widely supported.
This custom plugin I wrote sprinkles line break opportunities into your markup by inserting the &shy; entity. The &shy; entity is a soft hyphen: a hint that the browser may break the word at that point, showing a hyphen only if it actually does. Browsers honor these hints when the hyphens: manual CSS property is used.
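To make the soft hyphen a little more concrete, here is a minimal sketch in plain JavaScript. The helper name withSoftHyphens and the syllable split are made up for illustration; the only real ingredient is the character U+00AD, which is what the &shy; entity stands for.

```javascript
// U+00AD is the soft hyphen, the character behind the &shy; entity.
const SOFT_HYPHEN = "\u00AD";

// Hypothetical helper: join hand-picked syllables with soft hyphens.
function withSoftHyphens(syllables) {
  return syllables.join(SOFT_HYPHEN);
}

const word = withSoftHyphens(["hy", "phen", "ation"]);

// The soft hyphens stay invisible unless the browser breaks the line at one
// of them, but they are real characters in the string.
console.log(word.length); // 13: 11 letters plus 2 soft hyphens
console.log(word.split(SOFT_HYPHEN).join("")); // "hyphenation"
```

One practical consequence: string comparisons and searches on hyphenated text see those extra characters, so it is usually safer to hyphenate as the very last step.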
Hyphenopoly to the rescue
Under the hood, I’m using the package hyphenopoly to insert these soft hyphens everywhere. According to the package’s documentation, hyphenopoly uses Franklin M. Liang’s hyphenation algorithm developed for TeX. If you’re curious how that algorithm works, I suggest you check out the project’s README.
Getting started
Note: If you need help setting up your Astro blog for Markdown (.md or .mdx), I recommend following Astro’s “Build a Blog” tutorial.
This time, we’ll need two npm packages: hyphenopoly and unist-util-visit.
Writing our plugin
Now, we’ll create the JS file that will actually do the transformation. I like to store my custom plugins in a folder called /remark-plugins at the root of my project, but you’re welcome to put them wherever.
This Remark plugin has a bit more going on than my last one. That’s because hyphenopoly must have a pattern specified to apply the soft hyphens. The package ships with a ton of built-in language support (71 by my count!), so you have to tell it which .wasm file to load up.
In my case, I’m using the en-us pattern. You can find the full list of patterns here.
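As a rough sketch of the hyphenopoly setup, the configuration could look something like the block below. Several things here are assumptions rather than the original code: the file name, the idea that the patterns sit in node_modules/hyphenopoly/patterns relative to where the build runs, and the way “Houston” is written in the exceptions list. Check the option names against the hyphenopoly version you have installed.

```javascript
// remark-plugins/hyphenator.mjs (hypothetical file name)
import { readFileSync } from "node:fs";
import path from "node:path";
import hyphenopoly from "hyphenopoly";

// Assumption: the build runs from the project root and hyphenopoly was
// installed with npm, so the pattern files live in this directory.
const PATTERN_DIR = path.resolve("node_modules", "hyphenopoly", "patterns");

// Synchronous loader: hyphenopoly asks for a file such as "en-us.wasm"
// and expects the raw bytes back.
function loaderSync(file) {
  return readFileSync(path.join(PATTERN_DIR, file));
}

const result = hyphenopoly.config({
  sync: true,
  loaderSync,
  require: ["en-us"],
  // Assumption: a word listed without hyphens is left unbroken.
  exceptions: { "en-us": "Houston" },
  // minWordLength: 6, // six is the default
});

// Depending on how many languages are required, config() may hand back the
// hyphenator itself or a Map keyed by language, so handle both cases.
export const hyphenateText =
  typeof result === "function" ? result : result.get("en-us");
```

With this in place, hyphenateText("some text") should return the same text with soft hyphens inserted at the break points the en-us patterns allow.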
Warning: Boring code explanation time!
- The loaderSync function is a bit of a hack: it loads the en-us.wasm file from the hyphenopoly package directory. I’m not sure if there’s a better way to do this, but it works for now.
- hyphenator configures hyphenopoly:
  - We pass in our custom loader with the .wasm pattern, and specify which language(s) we want to use.
  - You can specify exceptions to the hyphenation rules, which I’ve done for the word “Houston.”
  - You can specify a minimum word length for hyphenation, but I’ve left that commented out for now. Six is the default, and that was good enough for me.
  - I’ve also commented out the optional dontHyphenateClass setting, which allows you to specify a CSS class that will prevent hyphenation from happening.
- We’re using unist-util-visit to traverse the Markdown AST and replace the content of each text node (node.value) with our hyphenated text.
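The traversal step described in the last bullet can be sketched without any dependencies, which makes it easy to try in isolation. Everything named here is a stand-in: visitNodes plays the role of visit from unist-util-visit, fakeHyphenateText plays the role of the hyphenopoly hyphenator, and makeHyphenatePlugin is a hypothetical wrapper, not the original plugin.

```javascript
// In the real plugin, unist-util-visit does the walking, roughly:
//   visit(tree, "text", (node) => { node.value = hyphenateText(node.value); });

const SOFT_HYPHEN = "\u00AD";

// Stand-in for the hyphenopoly hyphenator: it only knows a single word.
function fakeHyphenateText(text) {
  return text.replace(/hyphenation/g, ["hy", "phen", "ation"].join(SOFT_HYPHEN));
}

// Recursively call `visitor` on every node of the given type.
function visitNodes(node, type, visitor) {
  if (node.type === type) visitor(node);
  for (const child of node.children || []) visitNodes(child, type, visitor);
}

// Remark-style plugin: a function that returns a tree transformer.
function makeHyphenatePlugin(hyphenateText) {
  return function remarkHyphenate() {
    return (tree) => {
      visitNodes(tree, "text", (node) => {
        node.value = hyphenateText(node.value);
      });
    };
  };
}

// A tiny mdast-like tree: a paragraph holding plain text and inline code.
const tree = {
  type: "root",
  children: [
    {
      type: "paragraph",
      children: [
        { type: "text", value: "Better hyphenation for " },
        { type: "inlineCode", value: "hyphenation" },
      ],
    },
  ],
};

const transform = makeHyphenatePlugin(fakeHyphenateText)();
transform(tree);

// Only the "text" node is rewritten; the inline code keeps its exact value.
console.log(JSON.stringify(tree.children[0].children[0].value));
console.log(JSON.stringify(tree.children[0].children[1].value));
```

Matching on the node type rather than on the presence of a value is the useful part of this design: code spans and code blocks also carry a value, and soft hyphens inside them would corrupt copy-pasted code.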
Enabling the plugin in Astro
The last thing we have to do is tell Astro to use our new plugin. We can do this by adding it to the remarkPlugins array, under the markdown key, in our astro.config.mjs file.
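A minimal sketch of that configuration, assuming the plugin is the default export of a file called remark-hyphenate.mjs inside the /remark-plugins folder (both the file name and the export style are assumptions):

```javascript
// astro.config.mjs
import { defineConfig } from "astro/config";
// Hypothetical path and name; point this at wherever your plugin lives.
import remarkHyphenate from "./remark-plugins/remark-hyphenate.mjs";

export default defineConfig({
  markdown: {
    // Remark plugins run on the Markdown AST before it is turned into HTML.
    remarkPlugins: [remarkHyphenate],
  },
});
```

If the project already lists other Remark plugins, the new one simply joins the same array.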
Enable word breaks in CSS
To reap the full benefit of our custom plugin, we need to add at least one CSS property to our prose content: hyphens: manual.
I like the way justified text looks, but I’ve found it can look a bit wonky across browsers and screen sizes. Your mileage may vary.
Feel free to tinker with the hyphenopoly settings to your heart’s content. You can load up additional languages, or even create your own custom patterns.
Further improvements
I’m still not 100% happy with the way this plugin works. I’d like a better method to load in the .wasm pattern file. My loader function seems rather brittle.
Out of the box, hyphenopoly provides orphan control (they actually mean “runts”), which I’d like to test. If it does as good a job as my Remark “deruntify” plugin, then that’s redundant code I can happily rip out.
Feel free to drop me a line or tweet at me. I am always open to improving this tutorial or my code examples.
Further reading
- Hyphenopoly’s automatic hyphenation algorithm from the Hyphenopoly README
- Word Hy-phen-a-tion by Com-put-er by Frank Liang