User Embedding Explorer
What am I looking at?

Each dot is one of ~2,100 estimated users with 5 or more fiction-related conversations in the WildChat dataset. Users whose fiction prompts are similar appear close together.

Each user's position is the mean embedding of all their fiction prompts (all-MiniLM-L6-v2, 384 dims), reduced to 50 dims with PCA, then to 2D with UMAP (n_neighbors=30, min_dist=0.1, cosine metric).
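The first two steps of that pipeline (per-user mean pooling, then PCA to 50 dims) can be sketched roughly as follows. This is a minimal illustration, not the explorer's actual code: random vectors stand in for the 384-dim all-MiniLM-L6-v2 prompt embeddings, and the names `user_mean_embeddings` and `pca_reduce` are hypothetical.

```python
import numpy as np

def user_mean_embeddings(prompt_vecs, user_ids):
    """Average the prompt embeddings of each user.

    Returns (sorted unique user ids, matrix with one mean vector per user).
    """
    user_ids = np.asarray(user_ids)
    users = np.unique(user_ids)
    means = np.stack([prompt_vecs[user_ids == u].mean(axis=0) for u in users])
    return users, means

def pca_reduce(x, n_components=50):
    """Project x onto its top principal components using an SVD."""
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Stand-in data (assumption): 120 users with 5 prompts each, 384-dim vectors.
rng = np.random.default_rng(0)
prompt_vecs = rng.normal(size=(600, 384))
user_ids = np.arange(600) % 120

users, means = user_mean_embeddings(prompt_vecs, user_ids)
reduced = pca_reduce(means, n_components=50)
print(means.shape, reduced.shape)
```

In a real run, the prompt vectors would presumably come from encoding the prompts with the all-MiniLM-L6-v2 model, and the 50-dim output would then go to a UMAP implementation (for example the umap-learn package) configured with the parameters listed above to produce the 2D layout.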

Topic labels come from HDBSCAN clustering on the user embeddings, with each cluster labeled by GPT-4o. Coarse labels appear at moderate zoom; fine labels appear when zoomed in further. Enable Topics in the legend to color users by their topic cluster.

Dot size reflects each user's conversation count. Color shows the user's majority category, the one that accounts for more than 50% of their conversations. Click a category to toggle it; double-click to isolate it. Click a topic to isolate it.
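The majority rule (more than 50% of a user's conversations) could be computed along these lines. This is a sketch under assumptions: the function name `majority_category` and the example category names are hypothetical, and returning `None` for users with no majority is an illustrative choice, since the page does not say how such users are colored.

```python
from collections import Counter

def majority_category(categories):
    """Return the category covering more than 50% of the list, else None."""
    if not categories:
        return None
    label, count = Counter(categories).most_common(1)[0]
    # Strict majority: exactly 50% does not qualify.
    return label if count * 2 > len(categories) else None

# Hypothetical per-conversation category labels for two users.
user_a = majority_category(["fanfiction", "fanfiction", "roleplay"])
user_b = majority_category(["fanfiction", "roleplay"])
print(user_a, user_b)
```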

