itsnick.co / context
research · v.2026.04
kind: research · methodology
scope: self-reflexive only
state: live · iterating
repo: private
a research project / nick friesen / 2026

personalContext

A behavioral psychometrics pipeline. Years of cross-platform messages, an LLM-as-judge, and triangulation across many relationships — used to build a self-model that's harder to lie to than a survey.

4 data sources
10 dimensions
N relationships
1 subject · self only

Surveys are self-report. People aren't.

Conventional psychometrics — Big Five, MBTI, every workplace assessment — asks you about you. "Are you organized?" The answer reflects how you want to be seen, what mood you're in, what you think the assessor will reward. The instrument and the subject are the same person, and the data inherits the bias.

The opposite of self-report is behavioral: what you actually did, written down, in real life, without an examiner watching. Most of us already produce that data continuously — in messages, emails, threads — for years on end. The hard part isn't collection. It's the rubric.

personalContext is the rubric and the pipeline that runs it.

Five stages.

01 / ingest
Cross-platform
Pull years of messages from iMessage, Slack, Gmail, Google Chat. Normalize to a common schema (sender, time, text, thread).
read-only · via MCP
02 / batch
Per-relationship
Group all messages by counterpart. A relationship batch is a multi-year document of how you actually behave with one specific person.
k batches · k = many
03 / score
LLM-as-judge
Run each batch through a fixed prompt that scores the subject on a 10-dimension psychometric framework. The judge is required to anchor each score to message-level evidence quotes.
fixed rubric · evidence-anchored
04 / triangulate
Cross-relational
Combine the per-relationship scores. A trait that appears across many counterparts is high-confidence signal; a trait that appears in one is a context-specific mask.
consensus & variance
05 / instrument
Reproduce it
Synthesize a yes / no self-report instrument that, if administered to the subject cold, should reproduce the same behavioral profile. Closes the loop and exposes any noise the analysis quietly carried.
forward validation
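
The first two stages can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual code: the Message fields follow the common schema named above (sender, time, text, thread), while the thread_counterpart lookup, the function name, and the sample messages are assumptions made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Message:
    """Common schema from the ingest stage."""
    sender: str
    time: datetime
    text: str
    thread: str


def batch_by_counterpart(messages, thread_counterpart):
    """Group messages into one chronologically sorted batch per counterpart.

    thread_counterpart is a hypothetical {thread_id: counterpart} lookup;
    threads missing from it (for example group chats) are skipped.
    """
    batches = defaultdict(list)
    for msg in messages:
        who = thread_counterpart.get(msg.thread)
        if who is not None:
            batches[who].append(msg)
    return {who: sorted(msgs, key=lambda m: m.time) for who, msgs in batches.items()}


# Made-up sample data, for illustration only.
messages = [
    Message("me", datetime(2024, 3, 2, 9, 0), "Running late, sorry.", "slack-1"),
    Message("alex", datetime(2024, 3, 1, 18, 30), "Dinner tomorrow?", "imsg-7"),
    Message("me", datetime(2024, 3, 1, 18, 45), "Yes, booked for 7.", "imsg-7"),
]
batches = batch_by_counterpart(messages, {"imsg-7": "alex", "slack-1": "sam"})
print({who: len(msgs) for who, msgs in batches.items()})
```

Keeping the counterpart lookup outside the message schema is one possible design: the schema stays platform-neutral, and resolving who a thread belongs to remains a separate, auditable step.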

Ten dimensions, evidence-anchored.

Big Five is the field's strongest survey instrument. It captures a lot of variance but loses the texture that actually predicts how someone shows up — conflict style, authority response, temporal orientation. The framework here extends Big Five with five additional dimensions chosen because they are readable in messaging behavior, not just in self-report.

idx | dimension | what reads in messages
D-01 | Openness to experience | Topical breadth, curiosity markers, willingness to entertain unfamiliar framings.
D-02 | Conscientiousness | Follow-through on commitments, planning artifacts, response latency consistency.
D-03 | Extraversion / energy | Initiation rate, message volume per session, social-orientation language.
D-04 | Agreeableness | Conflict-softening, repair after disagreement, perspective-taking phrasing.
D-05 | Emotional regulation | Reactivity under stress, recovery time, vocabulary breadth for emotions.
D-06 | Risk & novelty tolerance | Comfort with ambiguous plans, willingness to commit before confirmation.
D-07 | Authority response | Behavior shift when speaking up vs. down a hierarchy; deference patterns.
D-08 | Conflict style | Avoid · accommodate · compromise · compete · collaborate; recovery rituals.
D-09 | Temporal orientation | Past · present · future weight in language; planning horizon; nostalgia tone.
D-10 | Identity coherence | Stability of stated values, beliefs, and self-descriptions across counterparts.
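
One way to enforce the evidence-anchoring requirement is to check the judge's output mechanically before accepting it. The sketch below is an assumption about how that could look, not the project's implementation: the DimensionScore type, the validate_scores function, the 0 to 1 score scale, and the sample texts are all hypothetical.

```python
from dataclasses import dataclass

DIMENSIONS = tuple(f"D-{i:02d}" for i in range(1, 11))  # D-01 .. D-10


@dataclass(frozen=True)
class DimensionScore:
    dimension: str
    score: float      # assumed 0..1 scale; the page does not state the real scale
    evidence: tuple   # verbatim quotes the judge cites from the batch


def validate_scores(scores, batch_texts):
    """Return a list of problems; an empty list means the output passes."""
    problems = []
    seen = {s.dimension for s in scores}
    for dim in DIMENSIONS:
        if dim not in seen:
            problems.append(f"{dim}: missing score")
    for s in scores:
        if s.dimension not in DIMENSIONS:
            problems.append(f"{s.dimension}: unknown dimension")
        if not 0.0 <= s.score <= 1.0:
            problems.append(f"{s.dimension}: score {s.score} out of range")
        if not s.evidence:
            problems.append(f"{s.dimension}: no evidence quotes")
        for quote in s.evidence:
            # A quote only counts if it appears verbatim in some message.
            if not any(quote in text for text in batch_texts):
                problems.append(f"{s.dimension}: quote not found in batch: {quote!r}")
    return problems


# Made-up batch and judge output, for illustration only.
batch_texts = ["I'll send the draft by Friday.", "Sent, as promised."]
good = [DimensionScore(d, 0.5, ("Sent, as promised.",)) for d in DIMENSIONS]
bad = good[:-1] + [DimensionScore("D-10", 1.4, ("never said this",))]
print(validate_scores(good, batch_texts))
print(validate_scores(bad, batch_texts))
```

A verbatim substring check is deliberately strict: a paraphrased or invented quote fails, which is the failure mode evidence anchoring is meant to catch.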

Cross-relational triangulation.

This is the methodological move that makes the whole thing useful. A single-relationship behavioral analysis is just one mask: you behave differently with your mother, your business partner, a romantic partner, and a friend you've known since you were eight. A reading from any one of those is only a reading of how you show up in that context.

Run the same rubric across many counterparts and the structure emerges: traits that appear across many relationship types are higher-confidence reads of the underlying person. Traits that appear in only one are context-specific behaviors — masks, and themselves useful information about who that mask gets worn for.

single-channel · brittle

One relationship's scores look like a confident reading. They aren't — they're the subject's behavior in that specific dynamic.

D-01 0.62
D-02 0.81
D-04 0.74
D-08 0.35
illustrative shape · not real values

multi-channel · triangulated

Many relationships, one rubric. Consensus across batches is the underlying trait. Variance is the mask, and the masks themselves cluster meaningfully.

D-01 μ = 0.71 ± 0.05
D-02 μ = 0.78 ± 0.04
D-04 μ = 0.69 ± 0.07
D-08 μ = 0.46 ± 0.22
D-08 has high variance · likely mask, not trait
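
The consensus-and-variance step can be sketched as a per-dimension mean and spread across relationship batches. This is an illustration under stated assumptions: the page does not say how the ± figure is computed, so sample standard deviation is used here, and the triangulate function, the 0.15 mask threshold, and the input scores are all made up for the example.

```python
from statistics import mean, stdev


def triangulate(per_relationship, mask_threshold=0.15):
    """Summarize {counterpart: {dimension: score}} into per-dimension stats.

    mask_threshold is a hypothetical cutoff: a spread above it is labeled
    a context-specific "mask", otherwise a cross-relational "trait".
    """
    by_dim = {}
    for scores in per_relationship.values():
        for dim, value in scores.items():
            by_dim.setdefault(dim, []).append(value)

    summary = {}
    for dim, values in by_dim.items():
        if len(values) < 2:
            # One batch cannot separate trait from mask.
            summary[dim] = {"mean": values[0], "sd": None, "n": 1, "kind": "insufficient"}
            continue
        spread = stdev(values)
        kind = "mask" if spread > mask_threshold else "trait"
        summary[dim] = {"mean": mean(values), "sd": spread, "n": len(values), "kind": kind}
    return summary


# Made-up scores for three counterparts, for illustration only.
per_relationship = {
    "alex": {"D-02": 0.80, "D-08": 0.20},
    "sam": {"D-02": 0.76, "D-08": 0.46},
    "jo": {"D-02": 0.78, "D-08": 0.72, "D-05": 0.60},
}
for dim, stats in sorted(triangulate(per_relationship).items()):
    print(dim, stats)
```

Returning an explicit "insufficient" label for dimensions scored in only one batch avoids treating a single-channel reading as a confident trait, which is the failure the triangulation step exists to prevent.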

From profile to instrument.

The pipeline ends where conventional psychometrics begins: a self-report instrument. The behavioral profile is used as a target, and the system synthesizes yes / no items — concrete, scenario-anchored, intentionally hard to game — that should reproduce the same dimensional readings if the subject answers them honestly, cold.

This is forward-validation, and it's the most important diagnostic in the whole pipeline. If the synthesized instrument does not reproduce the behavioral profile, the analysis was carrying noise or bias the rubric didn't catch. If it does, you have a portable, transferable read of the same construct — and the underlying behavioral data backs it up with literal evidence quotes.
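
A forward-validation check could be as simple as scoring the yes / no answers per dimension and comparing them with the behavioral profile. The sketch below is hypothetical throughout: scoring a dimension as the share of answers that match each item's keyed direction, the 0.10 tolerance, the function names, and the sample items and scores are all assumptions, not the method used in the pipeline.

```python
def score_instrument(items, answers):
    """Score yes/no answers per dimension.

    items is a list of (dimension, keyed_yes) pairs; answers is a list of
    booleans in the same order. A dimension's score is the share of its
    answers that match the keyed direction (an assumed scoring rule).
    """
    if len(items) != len(answers):
        raise ValueError("one answer per item is required")
    hits, totals = {}, {}
    for (dimension, keyed_yes), answer in zip(items, answers):
        totals[dimension] = totals.get(dimension, 0) + 1
        hits[dimension] = hits.get(dimension, 0) + (answer == keyed_yes)
    return {dim: hits[dim] / totals[dim] for dim in totals}


def forward_validate(behavioral, instrument, tolerance=0.10):
    """Return the dimensions whose absolute gap exceeds the tolerance."""
    shared = sorted(set(behavioral) & set(instrument))
    if not shared:
        raise ValueError("no shared dimensions to compare")
    gaps = {dim: abs(behavioral[dim] - instrument[dim]) for dim in shared}
    return {dim: gap for dim, gap in gaps.items() if gap > tolerance}


# Made-up items, answers, and profile, for illustration only.
items = [("D-02", True), ("D-02", True), ("D-02", False), ("D-02", True),
         ("D-08", True), ("D-08", False)]
answers = [True, True, False, False, True, False]
behavioral = {"D-02": 0.78, "D-08": 0.46}

instrument = score_instrument(items, answers)
failing = forward_validate(behavioral, instrument)
print(instrument)
print(failing or "instrument reproduces the profile within tolerance")
```

An empty result means the instrument reproduced the profile within tolerance; any dimension that appears in the result is a candidate for noise or bias the rubric did not catch.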

Self-reflexive only.

The same pipeline could be pointed outward — toward employees, partners, candidates — and it absolutely should not be. Behavioral psychometrics is asymmetric: the tool that lets you see yourself more honestly is the same tool that, used on someone else, would be surveillance.

One subject, one data owner, one purpose. personalContext is run on my own message history, by me, to model myself. It is not a product to be administered to other people. The rubric is publishable; the data and weights are not.

Open questions, prior art, what's next.

open questions

How many distinct relationship batches are needed before triangulation stabilizes? Does the LLM-as-judge drift with model versions? Are dimension definitions invariant across the lifespan, or do early-life batches need recalibration?

prior art

Big Five (Costa & McCrae) is the conceptual base. LIWC and other lexical-inventory approaches read messages but lose context. LLM-as-judge work in alignment evals is the closest methodological neighbor — this just turns it onto behavioral instead of factual material.

what's next

Tighter inter-rater agreement testing across model families. A formal split between the "trait" signal and the "mask" signal as separate outputs. Quietly running the synthesized instrument on the subject and comparing its readings against the behavioral profile.

Methodology talk, welcome.

Happy to discuss the rubric, the pipeline, or the validation approach with researchers, psychometricians, or anyone working on behavioral-data analysis. The instrument and downstream profile content stay private.