Patrick Harrington, Ph.D., Co-Founder & Chief Data Scientist, Paysa
A few weeks ago, Lydia Dishman wrote a fantastic piece characterizing the top tech companies by the “quality” of their talent, using the Paysa CompanyRank algorithm, and showing how these companies change in rank over time. CompanyRank is an Expectation-Maximization algorithm, applied over space and time, that quantifies the network-wide flux of tech workers into, out of, and retained at each tech company in each month. It is a distant cousin of Google’s PageRank algorithm, and another example of emergent structure learned from the seemingly independent chaos of a large, sparsely connected complex system.
Figure 1. Paysa CompanyRank Illustrating Rise and Fall of Various Tech Companies
Figure 1 depicts the Paysa CompanyRank time series of Uber, Facebook, Google and Zynga. The top slot, i.e., the most talent-dense company, gets rank #1, and ranks descend from there to an arbitrarily low rank (a high number): 1 is better than 2, 2 is better than 3, and so forth. Note the rise and rapid collapse of Zynga after its IPO, from the 3rd most talent-dense company to 590th at the time of writing.
Before getting into the CompanyRank algorithm, recall that Google PageRank is motivated by the following thought experiment: “If I were on a website, randomly selected one of its links to another website, went there, and kept repeating this, what would the probability distribution over visited publishers be as time goes to infinity?” Mathematically, this amounts to repeatedly applying a Markov chain update until the state maps into itself, i.e., the stationary point of the system (think of the event horizon).
Figure 2. Markov Chain update of state vector p and transition matrix H. k here is discrete time.
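The image for Figure 2 is not reproduced here. Written out, a standard form of the update it describes is the following, assuming H is column-stochastic, i.e., H_ij is the probability of moving from site j to site i (the original figure may use the equivalent row-vector convention):

```latex
p_{k+1} = H\,p_k, \qquad \sum_i H_{ij} = 1 \ \text{for all } j, \qquad \sum_i (p_k)_i = 1
```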
At this stationary point at infinity, one more “transition” leaves the distribution unchanged, p = Hp (the time index is dropped because at infinity it is meaningless).
I am not going to get into the nuances of the Perron-Frobenius theorem on eigenvalue bounds here (it is pretty slick if you’re interested in reading more). Solving the equation in Figure 3 is an eigenvalue problem: p is the eigenvector of H with eigenvalue 1. That vector p is a probability distribution over websites, and the higher a website’s probability, the more “important” that website is.
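To make the eigenvector computation concrete, here is a minimal power-iteration sketch in Python with NumPy. It assumes H is column-stochastic (column j holds the probabilities of following a link out of site j) and that the chain is irreducible and aperiodic, so the iteration converges. The function name stationary_distribution and the three-site matrix are hypothetical, and the damping factor used in production PageRank is omitted to match the pure random-surfer thought experiment.

```python
import numpy as np


def stationary_distribution(H: np.ndarray, tol: float = 1e-12, max_iter: int = 10_000) -> np.ndarray:
    """Power iteration for p = H p, where H is column-stochastic (each column sums to 1)."""
    n = H.shape[0]
    # Start from the uniform distribution over the n sites.
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        p_next = H @ p  # one Markov chain step: p_{k+1} = H p_k
        # Stop once another transition barely changes p (the stationary point).
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Hypothetical 3-site web: site 0 links to 1 and 2, site 1 links to 0 and 2, site 2 links to 0.
H = np.array([
    [0.0, 0.5, 1.0],
    [0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0],
])
p = stationary_distribution(H)
print(p)  # approximately [4/9, 2/9, 1/3]
```

Site 0 ends up most “important” because every other site links to it.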
For CompanyRank, the “links” are talent moving into and out of companies from other companies at different points in time. The philosophical anchor of CompanyRank is “quality”, which is tricky to define and quantify (Zen and the Art of Motorcycle Maintenance spends a few chapters on the topic).
CompanyRank aims to (1) estimate a vector p for each month of each year and (2) estimate a flux matrix H that captures the dynamics of this company-to-company flow of talent. H is estimated from 5.75M engineers (and related professionals, e.g., data scientists and product managers) in the tech industry from 2000 onward. It captures the monthly and long-term battle for talent between company i and company j, where H_ij is the element in row i, column j of H.
Figure 4. Pseudo-Code of CompanyRank Algorithm
Algorithm 1, shown in Figure 4, presents the pseudo-code of CompanyRank. The maximum likelihood estimate of H is not a simple transition matrix but what we call the Paysa Flux Matrix. In a nutshell, it is the net movement between two companies normalized by their collective size, e.g., number of employees.
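Since the pseudo-code in Figure 4 is not reproduced here, the following is only a sketch of the one-sentence description above: net movement between two companies divided by their combined headcount. The function flux_matrix, its moves/sizes inputs, and the sign convention (a positive F[i, j] means company i gained talent from company j on net) are assumptions for illustration, not Paysa’s actual definition.

```python
import numpy as np


def flux_matrix(moves: np.ndarray, sizes: np.ndarray) -> np.ndarray:
    """Hypothetical flux matrix for one month.

    moves[i, j] = number of people who left company i for company j.
    sizes[i]    = headcount of company i.
    Returns F with F[i, j] = (moves[j, i] - moves[i, j]) / (sizes[i] + sizes[j]),
    i.e. net talent flow from j into i, normalized by the pair's combined size.
    """
    net_inflow = moves.T - moves                 # net_inflow[i, j] = moves[j, i] - moves[i, j]
    combined = sizes[:, None] + sizes[None, :]   # combined[i, j] = sizes[i] + sizes[j]
    return net_inflow / combined


# Made-up month with three companies A, B, C.
moves = np.array([
    [0, 10, 2],   # A lost 10 people to B and 2 to C
    [4, 0, 1],    # B lost 4 to A and 1 to C
    [6, 3, 0],    # C lost 6 to A and 3 to B
], dtype=float)
sizes = np.array([100, 50, 50], dtype=float)
F = flux_matrix(moves, sizes)
print(F[0, 1])  # (4 - 10) / 150 = -0.04: A lost talent to B on net
```

Because net movement is antisymmetric, F[i, j] = -F[j, i]; how the actual estimator turns such fluxes into the monthly vector p is not shown here.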
This algorithm does what cannot be done by looking at an employee in isolation: it quantifies quality through the movement of employees who were previously at high-quality companies and have since joined new ones. If a company continues to hire from high-quality companies, its score will increase. If a company loses employees who came from top companies, or begins hiring from lower-quality companies, its score (and relative ranking) will decrease.
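The intuition that hiring from high-quality companies raises a company’s score can be illustrated with a PageRank-style propagation over a hiring graph. This sketch swaps in standard damped PageRank (damping 0.85) in place of Paysa’s Expectation-Maximization estimator, which is not described in enough detail to reproduce; the function hire_weighted_rank and the four-company data are hypothetical.

```python
import numpy as np


def hire_weighted_rank(hires: np.ndarray, damping: float = 0.85,
                       tol: float = 1e-12, max_iter: int = 1_000) -> np.ndarray:
    """Damped PageRank on a hiring graph (an illustration, not Paysa's estimator).

    hires[i, j] = number of people who moved from company j to company i.
    Assumes every company has at least one departure (no all-zero columns).
    """
    n = hires.shape[0]
    # T[i, j] = share of company j's departures that went to company i (columns sum to 1).
    T = hires / hires.sum(axis=0, keepdims=True)
    p = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # A company's score is fed by the scores of the companies it hires from.
        p_next = (1.0 - damping) / n + damping * (T @ p)
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p


# Hypothetical companies A, B, C, D (rows hire from columns).
hires = np.array([
    [0, 10, 10, 10],   # A hires from B, C and D
    [10, 0, 0, 0],     # B hires 10 people, all from A
    [0, 0, 0, 10],     # C hires 10 people, all from D
    [0, 0, 0, 0],      # D hires nobody
], dtype=float)
scores = hire_weighted_rank(hires)
```

B and C each hire 10 people, but B hires all of them from A, the highest-scoring company, while C hires from D, which hires nobody; B therefore ends up with a much higher score than C.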
CompanyRank is then simply the ordering of the values of p from highest to lowest in each month. Net net, the CompanyRank algorithm does what PageRank does well: it captures the quality of the underlying entities and how it changes over time. The analogy of top publishers linking to other top publishers holds for talent moving from one top company to another.
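Turning each month’s score vector into ranks is a per-month sort. A minimal sketch, assuming the p vectors are already computed; the function monthly_ranks, the month labels, company names, and scores below are made up for illustration.

```python
import numpy as np


def monthly_ranks(p_by_month: dict[str, np.ndarray], companies: list[str]) -> dict[str, dict[str, int]]:
    """Convert each month's score vector p into ranks: 1 = highest p (most talent-dense)."""
    ranks = {}
    for month, p in p_by_month.items():
        order = np.argsort(-p, kind="stable")  # company indices, highest score first
        ranks[month] = {companies[idx]: r + 1 for r, idx in enumerate(order)}
    return ranks


companies = ["A", "B", "C"]
p_by_month = {
    "2016-01": np.array([0.20, 0.50, 0.30]),
    "2016-02": np.array([0.40, 0.35, 0.25]),
}
ranks = monthly_ranks(p_by_month, companies)
print(ranks["2016-01"])  # {'B': 1, 'C': 2, 'A': 3}
```

Tracking a company’s rank across months in this way yields time series like those in Figure 1.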
In Silicon Valley there has been a suspicion that a group of engineers has made its way from Google to Facebook to Twitter and now to Uber, Pinterest, etc., and that this crew has been largely responsible for bringing companies into their hyper-growth phase. Perhaps the Paysa CompanyRank statistic is beginning to shed light on this phenomenon and on other growth-related (or death-spiral-related) behaviors. Forecasting this behavior is next.