Commit Graph

3 Commits

Author SHA1 Message Date
Lalo
8e0d2bece2
feat: Add Spanish hyphenation support (#558)
## Summary

* **What is the goal of this PR?** Add Spanish language hyphenation
support to improve text rendering for Spanish books.
* **What changes are included?**
- Added Spanish hyphenation trie (`hyph-es.trie.h`) generated from
Typst's hypher patterns
- Registered `spanishHyphenator` in `LanguageRegistry.cpp` for language
tag `es`
  - Added Spanish to the hyphenation evaluation test suite
  - Added Spanish test data file with 5000 test cases

## Additional Context

* **Test Results:** Spanish hyphenation achieves 99.02% F1 Score (97.72%
perfect matches out of 5000 test cases)
* **Compatibility:** Works automatically for EPUBs with
`<dc:language>es</dc:language>` (or es-ES, es-MX, etc.)
<img width="115" height="189" alt="imagen"
src="https://github.com/user-attachments/assets/9b92e7fc-b98d-48af-8d53-dfdc2e68abee"
/>


| Metric | Value |
|--------|-------|
| Perfect matches | 97.72% |
| Overall Precision | 99.33% |
| Overall Recall | 99.42% |
| Overall F1 Score | 99.38% |

---

### AI Usage

Did you use AI tools to help write this code? _**PARTIALLY**_

AI assisted with:
- Guiding and compile
- Preparing the PR description
2026-01-28 01:17:48 +11:00
Luke Stein
7a53342f9d
fix: Allow line break after ellipsis and underscore (#425)
## Summary

* Add additional punctuation marks to the list of characters that can be
immediately followed by a line break even where there is no explicit
space

## Additional Context

* Huge appreciation to @osteotek for his amazing work on hyphenation.
Reading on the device is so much better now.
* I am getting bad line breaks when ellipses (…) are between words and
book file does not explicitly include some kind of breaking space.
* Per
[discussion](https://github.com/crosspoint-reader/crosspoint-reader/pull/305#issuecomment-3765411406),
several new characters are added in this PR to the `isExplicitHyphen`
list to allow line breaks immediately after them:

Character | Unicode | Usage | Why include it?
-- | -- | -- | --
Solidus (Slash) | U+002F | / | Essential for breaking URLs and "and/or"
constructs.
Backslash | U+005C | \ | Critical for technical text, file paths, and
coding documentation.
Underscore | U+005F | _ | Prevents "runaway" line lengths in usernames
or code snippets.
Middle Dot | U+00B7 | · | Acts as a semantic separator in dictionaries
or stylistic lists.
Ellipsis | U+2026 | … | Prevents justification failure when dialogue
lacks following spaces.
Midline Horizontal Ellipsis | U+22EF | ⋯ | Useful for mathematical
sequences and technical notation.


### Example:

This shows an example of what line breaking looks like *with* this PR.
Note the line break after "matter…" (which would not previously have
been allowed). It's particularly important here because the book
includes non-breaking spaces in "Mr. Aldrich" and "Mr. Rockefeller."


![IMG_2917](https://github.com/user-attachments/assets/8fa610a9-91dd-407f-8526-0019a8a7195f)

---

### AI Usage

While CrossPoint doesn't have restrictions on AI tools in contributing,
please be transparent about their usage as it
helps set the right context for reviewers.

Did you use AI tools to help write this code? **PARTIALLY**
2026-01-27 20:18:09 +11:00
Arthur Tazhitdinov
8824c87490
feat: dict based Hyphenation (#305)
## Summary

* Adds (optional) Hyphenation for English, French, German, Russian
languages

## Additional Context

* Included hyphenation dictionaries add approximately 280kb to the flash
usage (German alone takes 200kb)
* Trie encoded dictionaries are adopted from hypher project
(https://github.com/typst/hypher)
* Soft hyphens (and other explicit hyphens) take precedence over
dict-based hyphenation. Overall, the hyphenation rules are quite
aggressive, as I believe it makes more sense on our smaller screen.

---------

Co-authored-by: Dave Allie <dave@daveallie.com>
2026-01-19 12:56:26 +00:00