Collaboration Tools for Remote Data Science Teams

This is an extract from an article written by Pascal Bugnion, Data Engineer at Faculty.

By Bryony Sandy
3rd July 2020

Due to the recent COVID-19 outbreak, millions of employees are having to embrace remote working, including thousands of data scientists. As data science teams grow larger and more complex, facilitating remote working has become more complex too. Data science is now done by cross-functional teams that bring together subject-matter experts, modellers, data visualisation experts, machine learning engineers, product managers and designers.

Unsurprisingly, when these teams shift to remote working, effective collaboration often becomes a major challenge, especially because communication is harder when teams are not co-located. The right tooling is essential to helping data science teams perform to the best of their ability while working remotely.

Avoiding Frustrations

High-functioning data science teams strive for consistent tooling and infrastructure across the team. It is much easier to call and help a colleague if you both work with similar tools, and much easier to write reusable, shareable code if the environment in which that code runs is more constrained.

There are many components that make up a data science infrastructure. Faculty have found the below processes to be effective:

  • Give everyone the same hardware.
  • Enforce common tooling. If everyone in the team uses Python, sharing code and models is greatly simplified. Similarly, if everyone uses the same text editor, the organisation can develop processes or even software (e.g. editor plugins) to facilitate collaboration.
  • Enforce common environment management. If everyone in the team uses the same environment management system (e.g. Conda environments or Docker containers), data scientists can collaborate both on the code and on the environment the code runs in.
  • Always store data online, where it is accessible to everyone. This makes it much easier to share code or to debug issues together.
  • Have an easy way to declare reproducible workflows that other team members can run. For instance, having a well-documented store of Docker containers that can run particular parts of the team’s data processing pipeline means that not everyone in the team needs to know every part of that pipeline at the same level of detail. In turn, this reduces the need for long phone calls explaining how to run the pipeline.
  • As much as possible, deploy models behind documented APIs. Having a clear interface allows more people to leverage the team’s work.
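As a rough illustration of the last point, the sketch below shows one way a model could sit behind a small, documented interface. It is not taken from Faculty's tooling: the names PredictionRequest, PredictionResponse, predict and handle, the two input fields, and the version label are all hypothetical, and the "model" is a hand-written rule standing in for a trained one. The idea is only that typed, documented request and response objects give colleagues a contract they can rely on without reading the modelling code.

```python
import json
from dataclasses import dataclass, asdict

MODEL_VERSION = "0.1.0"  # hypothetical version label, reported with every response


@dataclass
class PredictionRequest:
    """Input contract (hypothetical fields): one customer's features."""
    tenure_months: int
    monthly_spend: float


@dataclass
class PredictionResponse:
    """Output contract: a score between 0 and 1 plus the model version."""
    churn_probability: float
    model_version: str


def predict(request: PredictionRequest) -> PredictionResponse:
    """Stand-in for a real model: a fixed rule, not a trained estimator."""
    score = 0.8 if request.tenure_months < 6 else 0.2
    return PredictionResponse(churn_probability=score, model_version=MODEL_VERSION)


def handle(body: str) -> str:
    """Turn a JSON request body into a JSON response body.

    Missing or unexpected fields raise TypeError, so callers find out
    immediately when they break the documented contract.
    """
    request = PredictionRequest(**json.loads(body))
    return json.dumps(asdict(predict(request)))
```

A teammate could then call handle('{"tenure_months": 3, "monthly_spend": 40.0}') from a web framework, a notebook or a test, relying only on the documented fields rather than on how the model was built.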

Avoid Isolation by Making Work Visible

Sharing a goal is one of the most motivating elements of teamwork. However, when working remotely, it is hard to know what other team members are working on and to generate a sense of shared purpose. This can lead to team members feeling isolated.

Ensuring the work is visible is the best way to reduce this feeling. With a good source code manager like GitHub, GitLab or Bitbucket, the steady flow of PRs, comments and reviews gives the team a sense of motion. Continuous integration pipelines running from this source code manager also increase visibility. As well as exposing activity through a source code manager, organisations can make other sources of activity available.

Avoid Silos by Sharing Best Practices

When teams work remotely, there is less space for ad-hoc knowledge sharing. Hallway conversations that can spark new ideas are less likely to happen. Therefore, teams need to be much more deliberate about knowledge sharing.

Faculty have seen teams build knowledge repositories in Confluence or the open-source Knowledge Repo. In Faculty Platform, a data science workbench, Faculty are working on a way to easily share blueprints for common data science tasks. This allows data scientists to gradually build an organisation-specific knowledge centre that gives a single, consistent view of best practices.

Conclusion

Building cross-functional teams that deliver on the promise of machine learning is hard: you have to recruit the right people and get them to speak the same language. Trying to do this with a remote team is even harder.

Good tools will not guarantee success, but they will make your team members more productive and foster a feeling of collaboration and shared goals.

Faculty Platform gives remote data science teams a shared infrastructure for collaborating on model development and deployment – and all the best practices from this post are built into it.

If you are interested in finding out more about the Faculty Platform, contact us today.
