There were several great talks recently at the Austin Data Science Meetup. Ryan Orban from Galvanize talked about building a data science team. Several points resonated with my experience building a database development team.
Hire “T-shaped people”, not “Unicorns”
In technology hiring, a “unicorn” is someone who can do pretty much everything. They can create your application, do your data architecture, your data science, your operations, the whole nine yards. That might sound like the right type of person to hire, because they can handle whatever your needs are. It’s the ultimate in flexibility.
Reality is different though. “Unicorns” do exist, but they’re extremely rare. Rare enough that many people who think they are unicorns actually aren’t. And those who actually are will completely control your hiring negotiation. Their abilities mean they will only work on the most interesting things and only on their terms.
Instead, focus on hiring “T-shaped” people: those with a modest amount of knowledge across a wide range of areas, and a great depth of expertise in one area (data science, software engineering, operations, etc.). A team of “T”s means you have real experts in your necessary domain areas, and that those experts can easily communicate with each other because of their shared knowledge.
Team structure is important
Ryan described 3 options for team structure:
- Centralized data teams mean all your data people sit together and are on a single team. This is where many companies start out. The problem with this approach is that it’s easy for your data team to turn into an “ivory tower” that is out of touch with the rest of the business.
- Embedded is what many companies move to when they become unhappy with a centralized team. Here, you spread your data people out across the land. Each one is on a separate product team, with little or no formal organization amongst your data experts. This structure is problematic because you lose economies of scale. Each expert is left to their own devices, and opportunities to develop cross-team/cross-product code or knowledge are lost.
- Hub and Spoke is the model Ryan described as ideal. Here, your experts spend most of their time embedded in teams (say, 4 days a week), but still spend a significant portion of it working in a centralized fashion. This is the best of both worlds: you gain economies of scale from centralization, but everyone is still very responsive to the business needs.
Process is important
Even though data scientists are just as technical as software engineers, their work operates in a very different fashion than a real-time application. The data analysis steps required to build and tune a model may take hours (or even days) to run, while processing very large quantities of data. Data scientists also like to work in very high-level languages (notably Python and R).
Applications are typically developed in a very different environment. Response times are measured in fractions of a second, and often you are dealing with a lower-level language.
That kind of environment kills the productivity of your data science team, because they are constantly waiting for feedback on how a model will perform in the real world.
The moral here is this: all your experts need to work together to be successful.
Consultants should be part of your growth process
This wasn’t in Ryan’s presentation, but I did ask him about it afterwards. If you are in the process of building a team of experts (really, in any field), you should consider leveraging consultants to help with that growth. This doesn’t simply mean using consultants to “bridge the gap” until you’ve hired some people. You should use consultants to help you build the team itself.
Consultants can help with job definition (so you don’t accidentally write a ridiculous job description that no one could actually fully meet), provide screening questions and participate in the interview process. This is especially important for the first person you hire in a specific discipline; an expert consultant can accurately gauge the experience level of someone in a field that you don’t know much about. Your head of application development should certainly interview the first data expert you hire, and will be able to talk to the candidate at a technical level, but won’t have the level of experience necessary to really judge how knowledgeable the candidate is. They also won’t be able to discuss in detail an applicant’s preferences in the field. An expert at application development can discuss the pros and cons of Ruby versus Python, but not whether surrogate keys are a blessing or a curse. (Conversely, a data architect can speak at length about surrogate keys, but probably doesn’t have enough insight to intelligently discuss Ruby vs Python.)
Of course, you can also use consultants to jump-start your development while you are hiring, but you need to be careful of a conflict of interest that may exist if you want them in your hiring process. If you want to use a consultant in both roles, be certain to discuss that with them up front. You both need to be comfortable with their involvement in “hiring their replacement”. Many consultants will happily do this and do a good job at it, but some will resent it (especially if you brought a consultant in as “contract with potential to hire”!). In my experience, startup companies often don’t need full-time experts in all fields. It can make a lot of sense to contract out different areas of expertise until you need one or more full-time people in that area.