
6. tidyped Class Structure and Extension Notes
Source: vignettes/tidyped-structure.Rmd
This document describes the structural contract of the
tidyped class in visPedigree 1.8.0. It is intended for
maintenance and extension work.
1. Class identity
tidyped is an S3 class layered on top of
data.table.
Expected class vector:
c("tidyped", "data.table", "data.frame")
The class is created through new_tidyped() (internal
constructor) and checked with is_tidyped().
2. Core design goals
tidyped is designed to be:
- safe for C++: integer pedigree indices (IndNum, SireNum, DamNum) are always aligned with row order, so C++ routines can index directly without translation;
- fast for large pedigrees: the fast path skips redundant validation when the input is already a tidyped;
- compatible with data.table: in-place modification via := and set() preserves class and metadata without copying;
- explicit about structural degradation: row subsets that break pedigree completeness are downgraded to plain data.table with a warning.
3. The head invariant: IndNum == row index
The single most important structural rule in visPedigree:
IndNum[i] must equal i for every row.
This means SireNum and DamNum are direct
row pointers: the sire of individual i lives at row
SireNum[i], and 0L encodes a missing
parent.
Every C++ function in visPedigree — inbreeding coefficients, relationship matrices, BFS tracing, topological sorting — relies on this invariant. If it breaks, C++ will read wrong parents.
This invariant is enforced at three levels:
- tidyped(): builds indices from scratch during construction.
- [.tidyped: rebuilds indices in-place after valid row subsets.
- ensure_tidyped() / ensure_complete_tidyped(): detect and repair stale indices when class was accidentally dropped.
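The row-pointer reading of the invariant can be illustrated in plain base R. This is a minimal sketch on a toy data.frame, not the package's implementation; the helper parent_id() is a hypothetical name introduced only for this example.

```r
# Toy pedigree in plain base R; column names follow this document.
ped <- data.frame(
  Ind     = c("A", "B", "C", "D"),
  IndNum  = 1:4,
  SireNum = c(0L, 0L, 1L, 1L),
  DamNum  = c(0L, 0L, 2L, 3L),
  stringsAsFactors = FALSE
)

# Head invariant: IndNum[i] must equal i for every row.
head_invariant_ok <- identical(ped$IndNum, seq_len(nrow(ped)))

# Hypothetical helper: because of the invariant, a parent lookup is
# plain row indexing. 0L encodes a missing parent and is mapped to NA.
parent_id <- function(num, ids) {
  out <- rep(NA_character_, length(num))
  known <- num > 0L
  out[known] <- ids[num[known]]
  out
}

sire_of <- parent_id(ped$SireNum, ped$Ind)
dam_of  <- parent_id(ped$DamNum, ped$Ind)
```

If the rows were reordered without rebuilding the three integer columns, the same lookups would silently return the wrong individuals, which is the failure mode the three enforcement levels are meant to prevent.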
4. Column contract
4.1 Minimal structural columns
These four columns define a valid tidyped:
| Column | Type | Description |
|---|---|---|
| Ind | character | Unique individual ID |
| Sire | character | Sire ID, NA for unknown |
| Dam | character | Dam ID, NA for unknown |
| Sex | character | "male", "female", or "unknown" |
Checked by validate_tidyped().
4.2 Integer pedigree columns
| Column | Type | Description |
|---|---|---|
| IndNum | integer | Row index (== row number, see §3) |
| SireNum | integer | Row index of sire, 0L for missing |
| DamNum | integer | Row index of dam, 0L for missing |
These exist whenever tidyped() is called with
addnum = TRUE (default). They are the interface between R
and C++.
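One way to derive these columns from the character IDs is a match() lookup against Ind. The sketch below is an illustration in base R under the assumption that the pedigree is already complete and sorted; it is not taken from the package source.

```r
# Character IDs of a small, already complete pedigree.
ind  <- c("A", "B", "C", "D")
sire <- c(NA, NA, "A", "A")
dam  <- c(NA, NA, "B", "C")

# IndNum is simply the row number.
ind_num <- seq_along(ind)

# match() returns the row of each parent; nomatch = 0L encodes
# unknown parents (NA) as 0L.
sire_num <- match(sire, ind, nomatch = 0L)
dam_num  <- match(dam, ind, nomatch = 0L)
```

Note that nomatch = 0L would also turn a non-NA parent that is absent from Ind into 0L, so a lookup like this is only safe after completeness has been checked.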
4.3 Other common columns
| Column | Description |
|---|---|
| Gen | Generation number |
| Family | Family group code |
| FamilySize | Number of offspring in the family |
| Cand | TRUE for candidate individuals |
| f | Inbreeding coefficient (added by inbreed()) |
5. Metadata layer
Pedigree-level metadata is stored in a single attribute:
attr(x, "ped_meta")
Built by build_ped_meta(), accessed by pedmeta().
| Field | Type | Description |
|---|---|---|
| selfing | logical | Whether self-fertilization mode was used |
| bisexual_parents | character | IDs appearing as both sire and dam |
| genmethod | character | "top" or "bottom" generation numbering |
No other pedigree-level attributes should be added outside
ped_meta.
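The single-attribute pattern can be sketched in base R as follows. build_meta_sketch() and get_meta_sketch() are hypothetical stand-ins written for this example; the real build_ped_meta() and pedmeta() may take different arguments.

```r
# Hypothetical stand-in for build_ped_meta(): one named list holding
# the three documented fields.
build_meta_sketch <- function(selfing = FALSE,
                              bisexual_parents = character(0),
                              genmethod = c("top", "bottom")) {
  list(
    selfing = selfing,
    bisexual_parents = bisexual_parents,
    genmethod = match.arg(genmethod)
  )
}

# Hypothetical stand-in for pedmeta(): read the single attribute.
get_meta_sketch <- function(x) attr(x, "ped_meta", exact = TRUE)

ped <- data.frame(Ind = c("A", "B"), stringsAsFactors = FALSE)
attr(ped, "ped_meta") <- build_meta_sketch(genmethod = "bottom")
meta <- get_meta_sketch(ped)
```

Keeping every field inside one list means that code which copies or restores metadata only has to handle one attribute name.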
6. Structural invariants
The following invariants must hold for a valid
tidyped:
1. IndNum == row index (see §3).
2. Ind is unique: no duplicate individual IDs.
3. Completeness: every non-NA Sire and Dam appears in Ind.
4. Acyclicity: no individual is its own ancestor.
5. SireNum / DamNum consistency: 0L for missing parents, valid row indices otherwise.
6. ped_meta is the sole metadata container: no scattered attributes.
Invariants 1–5 are established by tidyped() and guarded
by [.tidyped. Invariant 6 is a development convention.
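Invariants 1, 2, 3 and 5 can be checked with a few vectorized comparisons. The function below is a hypothetical base R sketch for illustration (check_invariants_sketch() is not a package function); acyclicity is left out because it needs a graph traversal.

```r
# Hypothetical checker for invariants 1, 2, 3 and 5 on a data.frame
# that carries the columns described in section 4.
check_invariants_sketch <- function(ped) {
  n <- nrow(ped)
  parents <- c(ped$Sire, ped$Dam)
  parents <- parents[!is.na(parents)]
  nums <- c(ped$SireNum, ped$DamNum)
  c(
    head_invariant = identical(as.integer(ped$IndNum), seq_len(n)),
    unique_ids     = anyDuplicated(ped$Ind) == 0L,
    complete       = all(parents %in% ped$Ind),
    # Pointers must be in range and agree with the character IDs.
    valid_pointers = all(!is.na(nums) & nums >= 0L & nums <= n) &&
      identical(match(ped$Sire, ped$Ind, nomatch = 0L),
                as.integer(ped$SireNum)) &&
      identical(match(ped$Dam, ped$Ind, nomatch = 0L),
                as.integer(ped$DamNum))
  )
}

good <- data.frame(
  Ind = c("A", "B", "C"),
  Sire = c(NA, NA, "A"),
  Dam = c(NA, NA, "B"),
  IndNum = 1:3,
  SireNum = c(0L, 0L, 1L),
  DamNum = c(0L, 0L, 2L),
  stringsAsFactors = FALSE
)

# Dropping row 2 removes the dam of C and leaves stale indices behind.
bad <- good[c(1, 3), ]

res_good <- check_invariants_sketch(good)
res_bad  <- check_invariants_sketch(bad)
```

In this toy case the subset fails three checks at once (head invariant, completeness, pointer consistency), which illustrates why an unguarded row subset is unsafe to pass on.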
7. Constructor pipeline
tidyped() currently has two distinct tracing paths:
- Raw-input path (data.frame / data.table): uses igraph for loop detection, candidate tracing, and topological sorting before integer indices are finalized.
- Fast path (tidyped + cand): skips graph rebuilding and uses C++ for candidate tracing and topological sorting on existing integer pedigree indices.
7.1 Full path: tidyped(raw_input)
When the input is a raw data.frame or
data.table:
1. validate_and_prepare_ped(): normalize IDs, detect duplicates and bisexual parents, inject missing founders.
2. Loop detection: igraph builds a directed graph and checks is_dag(); which_loop() and shortest_paths() are used only on the error path to report informative loop diagnostics.
3. Candidate tracing: if cand is supplied, igraph neighborhood search is used on the raw-input path.
4. Topological sort: igraph topo_sort() on the raw-input path.
5. Generation assignment: C++ (cpp_assign_generations_top / cpp_assign_generations_bottom) using the pedigree implied by the sorted rows.
6. Sex inference: resolve unknowns from parental roles.
7. Build integer indices: IndNum, SireNum, DamNum.
8. new_tidyped() + attach ped_meta.
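Loop detection and topological sorting are closely related: a parents-before-offspring order exists only if the pedigree has no loop. The sketch below uses Kahn's algorithm in base R to show the idea. It is a swapped-in illustration, not the igraph calls used by the raw-input path, and topo_order_sketch() is a hypothetical name.

```r
# Kahn's algorithm on a pedigree: emit an individual once all of its
# known parents have been emitted. Individuals left over at the end
# sit on a loop.
topo_order_sketch <- function(ind, sire, dam) {
  n <- length(ind)
  sire_num <- match(sire, ind, nomatch = 0L)
  dam_num  <- match(dam, ind, nomatch = 0L)
  # Number of known parents that have not been emitted yet.
  pending <- (sire_num > 0L) + (dam_num > 0L)
  queue <- which(pending == 0L)
  ord <- integer(0)
  while (length(queue) > 0L) {
    cur <- queue[1L]
    queue <- queue[-1L]
    ord <- c(ord, cur)
    kids <- which(sire_num == cur | dam_num == cur)
    for (k in kids) {
      pending[k] <- pending[k] - (sire_num[k] == cur) - (dam_num[k] == cur)
      if (pending[k] == 0L) queue <- c(queue, k)
    }
  }
  if (length(ord) < n) stop("pedigree contains a loop")
  ind[ord]
}

# Unsorted input: offspring listed before their parents.
sorted_ids <- topo_order_sketch(
  ind  = c("C", "A", "D", "B"),
  sire = c("A", NA, "A", NA),
  dam  = c("B", NA, "C", NA)
)

# X and Y are each other's sire, so no valid order exists.
loop_msg <- tryCatch(
  topo_order_sketch(c("X", "Y"), c("Y", "X"), c(NA, NA)),
  error = function(e) conditionMessage(e)
)
```

Growing the queue with c() is quadratic and only acceptable for a toy example; it is meant to show the logic, not to suggest a performant implementation.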
7.2 Fast path: tidyped(tp, cand = ids)
When the input is already a tidyped and
cand is supplied:
- Skipped: ID validation, loop detection, sex inference, founder injection.
- Executed: C++ BFS tracing → C++ topo sort → C++ generation assignment → rebuild indices → new_tidyped() + ped_meta.
The fast path is the preferred workflow for repeated local tracing from a previously validated master pedigree (see §14).
7.3 new_tidyped() — internal constructor
new_tidyped() attaches the "tidyped" class
via setattr() (no copy) and clears data.table’s invisible
flag via x[]. It does not attach
ped_meta — that is the caller’s responsibility. It should
only be called when the caller has already ensured structural
validity.
8. Three-tier guard system
Analysis functions must guard their inputs. visPedigree provides three guard levels, chosen based on what each function needs.
8.1 validate_tidyped() — visualization guard
- Attempts silent class recovery via ensure_tidyped().
- Checks only that Ind, Sire, Dam, Sex exist.
- Does not require pedigree completeness.
- Used by: visped(), plot.tidyped(), summary.tidyped().
8.2 ensure_tidyped() — structure-light guard
- If already tidyped: returns as-is.
- If class was dropped but 8 core columns (Ind, Sire, Dam, Sex, Gen, IndNum, SireNum, DamNum) are present: rebuilds IndNum if stale, restores class, emits a message.
- Does not check pedigree completeness.
- Used by: pedsubpop(), splitped(), pedne(method = "demographic"), pedstats(ecg = FALSE, genint = FALSE), pedfclass() (when f column already exists).
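The recovery logic can be sketched in base R on a data.frame. ensure_sketch() is a hypothetical function for illustration. Two details are assumptions not stated above: it rebuilds SireNum and DamNum together with IndNum, and it stops with an error when core columns are missing.

```r
core_cols <- c("Ind", "Sire", "Dam", "Sex", "Gen",
               "IndNum", "SireNum", "DamNum")

# Hypothetical structure-light guard, loosely modelled on the
# behaviour described for ensure_tidyped().
ensure_sketch <- function(x) {
  if (inherits(x, "tidyped")) return(x)
  missing_cols <- setdiff(core_cols, names(x))
  if (length(missing_cols) > 0L) {
    # Assumption: missing core columns are treated as an error.
    stop("missing core columns: ", paste(missing_cols, collapse = ", "))
  }
  if (!identical(as.integer(x$IndNum), seq_len(nrow(x)))) {
    # Stale indices: rebuild all three pointer columns (assumption).
    x$IndNum  <- seq_len(nrow(x))
    x$SireNum <- match(x$Sire, x$Ind, nomatch = 0L)
    x$DamNum  <- match(x$Dam, x$Ind, nomatch = 0L)
  }
  class(x) <- c("tidyped", class(x))
  message("restored tidyped class")
  x
}

# A table whose class was dropped and whose rows were reordered.
stale <- data.frame(
  Ind = c("B", "A", "C"),
  Sire = c(NA, NA, "A"),
  Dam = c(NA, NA, "B"),
  Sex = c("female", "male", "unknown"),
  Gen = c(1L, 1L, 2L),
  IndNum = c(2L, 1L, 3L),
  SireNum = c(0L, 0L, 1L),
  DamNum = c(0L, 0L, 2L),
  stringsAsFactors = FALSE
)

fixed <- suppressMessages(ensure_sketch(stale))
```

Like the guard it imitates, the sketch does not test completeness, so it is only suitable for functions that read existing columns.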
8.3 ensure_complete_tidyped() — complete-pedigree guard
- Everything ensure_tidyped() does, plus:
- Calls require_complete_pedigree(): verifies that every non-NA Sire/Dam is present in Ind. Stops with an error if not.
- Required by any function that recurses through pedigree structure in C++.
- Used by: inbreed(), pedecg(), pedgenint(), pedrel(), pedne(method = "inbreeding" | "coancestry"), pedcontrib(), pedancestry(), pedfclass() (when f must be computed), pedpartial(), pediv(), pedmat(), pedhalflife().
8.4 Choosing the right guard
| Guard | Recovers class? | Requires completeness? | When to use |
|---|---|---|---|
| validate_tidyped() | yes | no | Visualization |
| ensure_tidyped() | yes | no | Summaries on existing columns |
| ensure_complete_tidyped() | yes | yes | Pedigree recursion in C++ |
Some functions are conditionally guarded: they use
ensure_tidyped() by default but escalate to
ensure_complete_tidyped() when a parameter triggers
pedigree recursion (for example pedstats(ecg = TRUE),
pedne(method = "coancestry")).
9. Safe subsetting contract
[.tidyped is the key protection layer.
9.1 := operations
Modify-by-reference is passed through safely. Class and metadata are
preserved via setattr(). No copy occurs.
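The by-reference behaviour can be observed on an ordinary data.table. The sketch below assumes the data.table package is installed and does not load visPedigree, so the "tidyped" class is attached by hand and the call dispatches to data.table's own [ method rather than to [.tidyped.

```r
library(data.table)

dt <- data.table(Ind = c("A", "B", "C"), IndNum = 1:3)

# Attach a class and a metadata attribute by reference (no copy).
setattr(dt, "class", c("tidyped", class(dt)))
setattr(dt, "ped_meta", list(genmethod = "top"))

addr_before <- address(dt)

# Add analysis columns in place with := and set().
dt[, phenotype := c(1.2, 0.8, 1.5)]
set(dt, j = "scaled", value = dt$phenotype / max(dt$phenotype))

# The object was modified in place: same address, same attributes.
same_object <- identical(addr_before, address(dt))
meta_after  <- attr(dt, "ped_meta")
```

An assignment such as dt$phenotype <- ... would instead go through R's copy semantics, which is why the in-place operators are the recommended way to add analysis columns.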
9.2 Column-only selections
If the selection removes core pedigree columns, the result is
returned as a plain data.table without warning.
9.3 Row subsets
After row subsetting, [.tidyped checks pedigree
completeness:
- Complete subset (all referenced parents still present): IndNum, SireNum, DamNum are rebuilt in-place, class and ped_meta are preserved.
- Incomplete subset (parent records missing): result is downgraded to plain data.table with a warning guiding the user to tidyped(tp, cand = ids, trace = "up").
This downgrade is deliberate. It prevents stale integer indices from reaching C++ routines.
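The two outcomes of a row subset can be sketched in base R. subset_rows_sketch() is a hypothetical function that works on a data.frame and mimics the decision only; the real [.tidyped method operates on data.table and also carries ped_meta.

```r
# Hypothetical row-subset guard: keep the class and rebuild indices
# when the subset is complete, otherwise downgrade with a warning.
subset_rows_sketch <- function(ped, rows) {
  out <- ped[rows, , drop = FALSE]
  rownames(out) <- NULL
  parents <- c(out$Sire, out$Dam)
  parents <- parents[!is.na(parents)]
  if (all(parents %in% out$Ind)) {
    out$IndNum  <- seq_len(nrow(out))
    out$SireNum <- match(out$Sire, out$Ind, nomatch = 0L)
    out$DamNum  <- match(out$Dam, out$Ind, nomatch = 0L)
    class(out) <- c("tidyped", "data.frame")
  } else {
    warning("incomplete subset: returning a plain table")
    class(out) <- "data.frame"
  }
  out
}

ped <- data.frame(
  Ind = c("X", "A", "B", "C"),
  Sire = c(NA, NA, NA, "A"),
  Dam = c(NA, NA, NA, "B"),
  IndNum = 1:4,
  SireNum = c(0L, 0L, 0L, 2L),
  DamNum = c(0L, 0L, 0L, 3L),
  stringsAsFactors = FALSE
)

# Dropping the unrelated founder X keeps the pedigree complete,
# but every row pointer shifts by one and must be rebuilt.
ok <- subset_rows_sketch(ped, 2:4)

# Keeping C without its parents is incomplete and is downgraded.
bad <- suppressWarnings(subset_rows_sketch(ped, c(1, 4)))
```

Even the harmless-looking removal of an unrelated founder changes every pointer, so rebuilding after each valid subset is not optional.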
10. Computational boundaries: C++ vs igraph
visPedigree delegates heavy pedigree recursion to C++ and uses igraph where a graph object is still the simplest representation.
10.1 C++ — core computation path
| Task | C++ function |
|---|---|
| Ancestry / descendant tracing | cpp_trace_ancestors, cpp_trace_descendants |
| Topological sorting | cpp_topo_order |
| Generation assignment | cpp_assign_generations_top, cpp_assign_generations_bottom |
| Inbreeding coefficients | cpp_calculate_inbreeding (Meuwissen & Luo) |
| Relationship matrices | cpp_addmat, cpp_dommat, cpp_aamat, cpp_ainv |
All C++ functions consume SireNum / DamNum
integer vectors and assume the head invariant (§3).
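To show what consuming the integer vectors directly looks like, here is a breadth-first ancestor trace written in base R. It is an illustrative sketch of the general technique with a hypothetical function name, not a port of cpp_trace_ancestors.

```r
# Breadth-first ancestor tracing on SireNum / DamNum. Returns the row
# indices of the start individuals and all of their ancestors.
trace_ancestors_sketch <- function(start, sire_num, dam_num) {
  seen <- logical(length(sire_num))
  frontier <- unique(start)
  seen[frontier] <- TRUE
  while (length(frontier) > 0L) {
    # Parents of the current frontier; 0L means missing parent.
    parents <- c(sire_num[frontier], dam_num[frontier])
    parents <- unique(parents[parents > 0L])
    frontier <- parents[!seen[parents]]
    seen[frontier] <- TRUE
  }
  which(seen)
}

# Rows: 1 = A, 2 = B, 3 = C (A x B), 4 = D (A x unknown),
#       5 = E (C x D), 6 = F (unrelated founder).
sire_num <- c(0L, 0L, 1L, 1L, 3L, 0L)
dam_num  <- c(0L, 0L, 2L, 0L, 4L, 0L)

anc_of_e <- trace_ancestors_sketch(5L, sire_num, dam_num)
anc_of_d <- trace_ancestors_sketch(4L, sire_num, dam_num)
```

No ID lookup happens inside the loop: every step is integer indexing, which is only correct while the head invariant holds.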
10.2 igraph — graph-specific tasks
| Task | Where | igraph functions |
|---|---|---|
| Pedigree visualization | visped() pipeline | graph_from_data_frame, layout_with_sugiyama, plot.igraph |
| Connected components | splitped() | graph_from_edgelist, components |
| Loop detection | tidyped() raw-input path | graph_from_edgelist, is_dag |
| Loop diagnosis | tidyped() error path | which_loop, shortest_paths, neighbors, components |
| Candidate tracing | tidyped() raw-input path | neighborhood |
| Topological sorting | tidyped() raw-input path | topo_sort |
igraph is not used in the core numerical pedigree analysis routines
such as inbreed(), pedmat(),
pedecg(), or pedrel(), but it is still part of
the preprocessing and visualization stack.
11. Extension rules
When extending the class, follow these rules.
11.1 Do not add new pedigree-level attributes
Prefer adding fields to ped_meta instead of scattering
new standalone attributes.
11.2 Keep computed state derivable
If a column can be rebuilt from pedigree structure, prefer derivation over storing opaque cached state.
11.3 Preserve data.table semantics
Use :=, set(), and setattr()
carefully. Avoid patterns that trigger full copies unless
unavoidable.
12. User-facing inspection helpers
| Function | Returns |
|---|---|
| is_tidyped(x) | TRUE if class is present |
| is_complete_pedigree(x) | TRUE if all Sire/Dam are in Ind |
| pedmeta(x) | The ped_meta named list |
| has_inbreeding(x) | TRUE if f column exists |
| has_candidates(x) | TRUE if Cand column exists |
Future extensions should prefer helper functions over direct attribute access.
13. Maintenance checklist
Before merging a structural change to tidyped,
check:
- Does class identity remain c("tidyped", "data.table", "data.frame")?
- Is the head invariant IndNum == row index preserved after every code path?
- Are ped_meta fields preserved correctly?
- Does [.tidyped still handle := without copy issues?
- Do incomplete row subsets still downgrade with warning?
- Are integer pedigree columns rebuilt whenever a subset remains valid?
- Does tidyped(tp_master, cand = ...) match the full path result?
- After setorder() or merge(), are indices rebuilt before reaching C++?
- Do package tests and vignettes build cleanly?
14. Recommended workflow
For large pedigrees, the intended usage pattern is:
# build one validated master pedigree
tp_master <- tidyped(raw_ped)
# reuse it for repeated local tracing (fast path)
tp_local <- tidyped(tp_master, cand = ids, trace = "up", tracegen = 3)
# modify analysis columns in place
tp_master[, phenotype := pheno]
# split only when disconnected components matter
parts <- splitped(tp_master)

This keeps workflows explicit, fast, and safe.