Working theory: High-energy bibliometrics

In this longer post, we will document our current theory of high-energy bibliometrics, which we are using to guide our experiments. Part 1, below, considers the pre-quantum high-energy bibliometric regime, which is reached through macroscopic manipulation of research papers with little concern for their microstructure. In the forthcoming Part 2, we will consider quantum bibliodynamics.

Bibliokinetics intrinsics

A research paper, hereinafter denoted as $p$, is a complex system with a large number of particles and therefore many degrees of freedom. We consider its macroscopic state, described using the following parameters. Note that the model applies to all types of research papers, including manuscripts, preprints, conference proceedings, and journal articles.

Intrinsic semantic mass $m_p$: The amount of theoretical content contained in the paper. It can be wildly different from the paper’s physical mass, and is a property mainly of its theory, derivations, data, plots, and citations. For a paper occupying the semantic space $\mathcal{S}_p$ with local density $\rho_p(\cdot)$ and citation interlink $c_p(\cdot)$, the intrinsic semantic mass is defined as $$m_p=\int_{\mathcal{S}_p} \rho_p(\vec{s}) d\vec{s} + \int_{\mathcal{N}(p)} c_p(p-q) dq$$ where $\mathcal{N}(p)$ is the neighborhood of $p$ in the citation network. The infinitesimal semantic element, $d\vec{s}$, is taken to be the unit of semantic content, and has the dimension of entropy. The density $\rho_p(\cdot)$ therefore has the dimension of mass per unit entropy, nominally measured in kilograms per bit ($\mathrm{kg/b}$). This is a rather large unit, and we often use $\mathrm{mg/MiB}$ for printed media or $\mathrm{kg/GiB}$ for digital media. (Note the annoying difference between the binary and decimal prefixes.)
Semantic drift velocity $\vec{v}_p$: The rate of change of the paper’s semantic state over time. As the paper is read, cited, and discussed, its semantic state evolves, for the meaning is only realized through readers and not fixed at the intent of the author(s). The semantic drift velocity is therefore defined as $$\vec{v}_p = \frac{d\vec{s}_p}{dt}$$ where $\vec{s}_p$ is the semantic state vector of the paper in the semantic space $\mathcal{S}_p$, and $t$ is time. The nominal unit is bits per second, but we often use $\mathrm{MiB/min}$ for practical purposes. This is a simplification of the paper’s semantic drift, akin to the center of mass velocity in classical mechanics. In reality, the paper’s semantic state may have internal degrees of freedom.
Citation field $\mathcal{C}_p(\cdot)$: The global citation field is the vector field generated by all papers in the citation graph. Each paper’s $\mathcal{C}_p(\cdot)$ is a vector field in the semantic space that represents the spread of its influence through semantic diffusion. The global citation field is thus $$\mathcal{C}(\vec{t}) = \int_{\mathcal{P}} \mathcal{C}_p(\vec{t}-\vec{s}_p) dp.$$ This differs from the citation interlink $c_p(\cdot)$, which is a scalar field defined as the degree to which the paper responds to incoming citations.
Total semantic energy $E_p$: The total semantic energy of the paper, defined as $$E_p = \frac{1}{2} m_p |\vec{v}_p|^2 + \int_{\mathcal{S}_p} \|{\nabla \mathcal{C}_p(\vec{s})}\|^2 d\vec{s},$$ similar to the sum of kinetic and potential energy in classical mechanics.
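As a rough numerical illustration of the total semantic energy defined above, the sketch below assumes a one-dimensional semantic space and a citation field sampled on a uniform grid. The function name `total_semantic_energy`, the grid, and the forward-difference gradient are illustrative assumptions, not part of the theory.

```python
from typing import Sequence


def total_semantic_energy(
    m_p: float,
    v_p: Sequence[float],
    field: Sequence[float],
    ds: float,
) -> float:
    """Approximate E_p = 1/2 m_p |v_p|^2 + integral of |grad C_p|^2 ds.

    Assumptions (hypothetical, for illustration only):
    - the semantic space is one-dimensional, so the gradient is dC/ds;
    - `field` holds samples of C_p on a uniform grid with spacing `ds`.
    """
    if ds <= 0:
        raise ValueError("grid spacing ds must be positive")
    if len(field) < 2:
        raise ValueError("need at least two field samples")

    # Kinetic term: 1/2 m_p |v_p|^2.
    kinetic = 0.5 * m_p * sum(v * v for v in v_p)

    # Potential term: forward differences, then a Riemann sum over the grid.
    grads = [(field[i + 1] - field[i]) / ds for i in range(len(field) - 1)]
    potential = sum(g * g for g in grads) * ds

    return kinetic + potential


# Usage: a linear field C_p(s) = 2s on [0, 1], semantic mass 2, drift (3, 4).
ds = 0.1
field = [2.0 * i * ds for i in range(11)]
energy = total_semantic_energy(2.0, (3.0, 4.0), field, ds)
print(f"E_p ~ {energy:.3f}")
```

For this linear field the gradient is constant, so the potential term is just the squared slope times the length of the interval; any realistic field would need a finer grid or a better quadrature rule.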

Semantic interaction model

Publications that drift in semantic space carry more influence. However, the classical-mechanics analogy used so far fails to capture how the physics changes as the paper’s semantic drift velocity approaches the speed of light in the semantic space, $c$, which is the maximum possible speed of semantic twist. In the spirit of the Lorentz transformation, we define the Academic Lorentz factor as

$$\gamma=\frac{1}{\sqrt{1-\frac{|\vec{v}_p|^2}{c^2}}}.$$

As a manuscript accelerates, its effective semantic mass increases by a factor of $\gamma$:

$$\hat{m}_p = \gamma m_p.$$
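The two relations above can be sketched in a few lines of Python. The helper names and the value of `C_SEMANTIC` are hypothetical; no numeric value for $c$ is given here, so it is set to 1 and speeds are expressed as fractions of $c$.

```python
import math

# Hypothetical value: speeds are measured as fractions of c.
C_SEMANTIC = 1.0


def lorentz_factor(speed: float, c: float = C_SEMANTIC) -> float:
    """Academic Lorentz factor: gamma = 1 / sqrt(1 - |v_p|^2 / c^2)."""
    if not 0.0 <= speed < c:
        raise ValueError("speed must satisfy 0 <= speed < c")
    return 1.0 / math.sqrt(1.0 - (speed / c) ** 2)


def effective_mass(m_p: float, speed: float, c: float = C_SEMANTIC) -> float:
    """Effective semantic mass: m_hat = gamma * m_p."""
    return lorentz_factor(speed, c) * m_p


# Usage: how a paper of unit semantic mass "heavyens" as it accelerates.
for fraction in (0.0, 0.6, 0.8, 0.99):
    m_hat = effective_mass(1.0, fraction * C_SEMANTIC)
    print(f"|v_p| = {fraction:.2f} c  ->  m_hat = {m_hat:.4f}")
```

At rest the factor is exactly 1, and at $|\vec{v}_p| = 0.6c$ it is $1.25$; it grows without bound as the drift velocity approaches $c$, which is why the helper rejects speeds at or above $c$.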

This can be captured as the uniform semantic energy, by assuming semantic mass is equivalent to energy through the relation $E=mc^2$. The equivalent semantic energy of a paper is then

$$(\hat{E}_p)^2 = \left(m_p c^2\right)^2 + \left(\int_{\mathcal{S}_p} \left\|{\frac{d}{dt}\nabla \mathcal{C}_p(\vec{s})}\right\|^2 d\vec{s}\right)^2c^2.$$

This behavior of semantic “heavyening” can be validated by observing the paper’s influence, and is thus experimentally testable. However, tracing a moving semantic state and measuring its influence as it passes a measurement point at near-light speed is highly complicated, so we will use a different approach to validate the theory: collisions.

Collisions under acceleration

The core methodology of 3t.al. Labs involves the physical collision of two papers, framed as effectively a mutual, synchronous, objective co-peer-review. In the pre-quantum regime, the collision results in the emission of a discrete set of semantic particles, each with a continuous state (“distinuous emission”). The distribution of the emitted particles follows a power law:

$$\operatorname{Pr}(E_j \geq x) \propto x^{-\alpha},$$

where $E_j$ is the semantic energy of particle $j$, and the exponent $\alpha$ is given by the following ratio of the papers’ effective semantic masses:

$$\alpha = \frac{\hat{m}_1^2+\hat{m}_2^2}{(\hat{m}_1+\hat{m}_2)^2}.$$
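To get a feel for the exponent formula above, here is a minimal sketch that evaluates it for a few mass pairs. The function name `emission_exponent` and the sample masses are hypothetical.

```python
def emission_exponent(m1_hat: float, m2_hat: float) -> float:
    """Power-law exponent: alpha = (m1^2 + m2^2) / (m1 + m2)^2.

    Both arguments are effective semantic masses and must be positive.
    """
    if m1_hat <= 0 or m2_hat <= 0:
        raise ValueError("effective semantic masses must be positive")
    return (m1_hat**2 + m2_hat**2) / (m1_hat + m2_hat) ** 2


# Usage: equal papers versus increasingly lopsided collisions.
for m1, m2 in ((1.0, 1.0), (3.0, 1.0), (100.0, 1.0)):
    print(f"m1_hat={m1}, m2_hat={m2}  ->  alpha={emission_exponent(m1, m2):.4f}")
```

For positive masses the formula stays between $1/2$, reached when the two papers have equal effective mass, and values just under $1$, approached when one paper is much heavier than the other. The expression is symmetric in the two masses, so it does not matter which paper is labeled 1.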

The emission of semantic particles is a stochastic process, constrained by the conservation of semantic energy:

$$\sum_{j=1}^N E_j = \hat{E}_1 + \hat{E}_2.$$

The distribution of the emitted particles can be measured by sampling the semantic space around the collision point, and tracing the trajectories of the particles using standard bibliometric tools such as citation analysis. This is, empirically, much more straightforward to validate in an experimental setting.
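A toy simulation of the emission model might look like the sketch below. It is only one way to combine the power law with the energy constraint, and it rests on stated assumptions: raw energies are drawn from a Pareto distribution with minimum value 1 by inverse-transform sampling, then rescaled so that they sum to $\hat{E}_1 + \hat{E}_2$. Rescaling changes only the minimum energy, not the tail exponent. The function name, the particle count, and the seed are hypothetical.

```python
import random
from typing import List, Optional


def sample_emission(
    total_energy: float,
    alpha: float,
    n_particles: int,
    seed: Optional[int] = None,
) -> List[float]:
    """Draw particle energies with a power-law tail that sum to total_energy.

    Assumption: the power law and the energy constraint are reconciled by
    sampling first and rescaling afterwards; the theory does not say how.
    """
    if total_energy <= 0 or alpha <= 0 or n_particles < 1:
        raise ValueError("total_energy, alpha, n_particles must be positive")

    rng = random.Random(seed)

    # Inverse-transform sampling: if U is uniform on (0, 1], then
    # U ** (-1 / alpha) satisfies Pr(X >= x) = x ** (-alpha) for x >= 1.
    raw = [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n_particles)]

    # Rescale so the emitted energies sum to E_hat_1 + E_hat_2.
    scale = total_energy / sum(raw)
    return [scale * x for x in raw]


# Usage: two equal-mass papers (alpha = 0.5) releasing 10 units of energy.
energies = sample_emission(total_energy=10.0, alpha=0.5, n_particles=1000, seed=0)
print(f"particles: {len(energies)}, total energy: {sum(energies):.6f}")
print(f"largest particle carries {max(energies) / sum(energies):.1%} of the energy")
```

Because $\alpha$ is below 1 here, the tail is very heavy, and a handful of particles typically carries most of the energy in any given run.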