The conference is over. See you at the next Highload++!
Moscow, Skolkovo
November 8 and 9, 2018

Building Resilient Data Pipelines in Go
Databases and storage systems

Talk rejected
Grant Griffiths
GE Digital

I’ve been a Gopher for 3 years and work as a Senior Software Engineer at GE Digital. I’m very passionate about Go and run the Go User Group at GE. We have bi-weekly talks with anywhere from 50 to 200 Gophers, depending on the topic, with internal and external speakers such as Daniel Whitenack, Wally Quevedo, Joe Beda, and soon Francesc Campoy. I’ve spoken at GopherCon UK and also locally at the GoSF meetup in San Francisco.

At GE Digital, I work in the Predix Cloud Engineering org, where I build Data Services in Go that store and process Industrial IoT data with Kafka, Cassandra, EMR/Spark, and much more. In addition to writing backend services, I also work on Site Reliability Engineering for my team, improving the monitoring/alerting, reliability, and performance for all data services. Specifically, I’ve built tooling that can simulate end users for black box testing.

I studied Computer Science and Mathematics at Syracuse University and like climbing rock, ice, and snow in my free time.

Abstract

The modern world runs on data. In this talk we will cover how Gophers of any level can easily build data pipelines in Go with Kafka and Cassandra. At the end, we will look at how GE has written a data pipeline in Go that can handle over 800,000 writes per second of industrial time series data.

Introduction to Data Pipelines:
- What are data pipelines
- Why Go is a good language for them

Package structure:
- How to lay out our data pipeline’s packages
- How data will flow throughout the application
- Example code
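
As a rough illustration of how data might flow through such an application, here is a minimal sketch that connects three stages with channels. The stage names (source, transform, sink) and the in-memory input are assumptions made for this example, not the layout presented in the talk; in a real pipeline the source would wrap a Kafka consumer and the sink a Cassandra writer.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// source emits raw messages on a channel and closes it when done.
// Here the input is an in-memory slice (an assumption for the sketch).
func source(msgs []string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for _, m := range msgs {
			out <- m
		}
	}()
	return out
}

// transform applies a per-message step; upper-casing stands in for parsing/ETL.
func transform(in <-chan string) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		for m := range in {
			out <- strings.ToUpper(m)
		}
	}()
	return out
}

// sink drains the stream with several workers and returns everything it "stored".
// The order of the result is not deterministic when workers > 1.
func sink(in <-chan string, workers int) []string {
	var (
		mu     sync.Mutex
		stored []string
		wg     sync.WaitGroup
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for m := range in {
				mu.Lock()
				stored = append(stored, m)
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return stored
}

func main() {
	stored := sink(transform(source([]string{"a", "b", "c"})), 2)
	fmt.Println(len(stored), "messages stored")
}
```

Because every stage only sees a channel, each one could live in its own package and be tested in isolation.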

Writing integration tests:
- Using docker to write integration tests
- Simulating downtime using docker pause on Kafka or Cassandra
- Example code
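
One possible way to simulate downtime from a Go integration test is to shell out to `docker pause` and `docker unpause` around the assertions. The sketch below assumes a helper named `withPaused` and a container named `kafka-it`; both are hypothetical and not taken from the talk. The command runner is injectable so the helper itself can be exercised without Docker.

```go
package main

import (
	"fmt"
	"os/exec"
)

// runner executes an external command. It is injectable so the helper
// can be tested with a fake instead of the real docker binary.
type runner func(name string, args ...string) error

// dockerCLI runs the real command and waits for it to finish.
func dockerCLI(name string, args ...string) error {
	return exec.Command(name, args...).Run()
}

// withPaused pauses the container, runs fn while it is paused,
// and always attempts to unpause afterwards.
func withPaused(run runner, container string, fn func() error) (err error) {
	if perr := run("docker", "pause", container); perr != nil {
		return fmt.Errorf("pause %s: %w", container, perr)
	}
	defer func() {
		// Keep fn's error if there is one; otherwise report the unpause failure.
		if uerr := run("docker", "unpause", container); uerr != nil && err == nil {
			err = fmt.Errorf("unpause %s: %w", container, uerr)
		}
	}()
	return fn()
}

func main() {
	// "kafka-it" is a hypothetical container name.
	err := withPaused(dockerCLI, "kafka-it", func() error {
		// A real test would assert here that the pipeline buffers or
		// retries instead of dropping data while the dependency is frozen.
		return nil
	})
	fmt.Println("result:", err)
}
```

Pausing freezes the container's processes rather than stopping them, so the dependency becomes unresponsive without losing its state, which is close to what a stalled broker or node looks like from the client side.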

Data source: Kafka:
- What is Kafka and how we can use it with Go
- How to ensure no data loss is possible with good offset management
- Reading from multiple Kafka partitions with a high level consumer
- Example code
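
The offset management point can be sketched without committing to a particular Kafka client: commit an offset only after the corresponding write has succeeded, so that a crash or a failed write causes the message to be read again (at-least-once delivery). The `Message` type, the `Committer` interface, and the `failAt` helper below are hypothetical stand-ins, not the API of any specific library or the code from the talk.

```go
package main

import (
	"errors"
	"fmt"
)

// Message is a hypothetical stand-in for a Kafka record.
type Message struct {
	Partition int32
	Offset    int64
	Value     []byte
}

// Committer is a hypothetical interface over whatever the Kafka client
// provides for committing offsets.
type Committer interface {
	Commit(partition int32, nextOffset int64) error
}

// process writes each message and commits only after the write succeeded.
// On failure it stops without committing, so the failed message is
// re-read on the next run.
func process(msgs []Message, write func(Message) error, c Committer) error {
	for _, m := range msgs {
		if err := write(m); err != nil {
			return fmt.Errorf("write partition %d offset %d: %w", m.Partition, m.Offset, err)
		}
		// By Kafka convention the committed value is the next offset to read.
		if err := c.Commit(m.Partition, m.Offset+1); err != nil {
			return fmt.Errorf("commit: %w", err)
		}
	}
	return nil
}

// memCommitter records the next offset to read per partition, in memory.
type memCommitter map[int32]int64

func (c memCommitter) Commit(p int32, next int64) error {
	c[p] = next
	return nil
}

// failAt returns a write function that fails for one offset, simulating
// a storage outage. A negative offset never fails.
func failAt(offset int64) func(Message) error {
	return func(m Message) error {
		if m.Offset == offset {
			return errors.New("storage unavailable")
		}
		return nil
	}
}

func main() {
	msgs := []Message{{Partition: 0, Offset: 0}, {Partition: 0, Offset: 1}, {Partition: 0, Offset: 2}}
	c := memCommitter{}
	err := process(msgs, failAt(1), c)
	fmt.Println("error:", err, "next offset:", c[0])
}
```

The trade-off is that a message may be written more than once after a restart, so the write to storage should be idempotent.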

Performing ETL on the data and business logic:
- Parsing data
- Data ETL
- Performing intermediary business logic
- Example code
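
A parsing step for time series data could look roughly like the following sketch. The JSON wire format (fields `tag`, `ts_ms`, `value`) and the `Point` type are assumptions made for illustration; the abstract does not specify the actual message format.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"time"
)

// rawPoint mirrors a hypothetical JSON wire format.
// Value is a pointer so a missing field can be told apart from 0.
type rawPoint struct {
	Tag   string   `json:"tag"`
	TsMs  int64    `json:"ts_ms"`
	Value *float64 `json:"value"`
}

// Point is the cleaned record handed to the next stage.
type Point struct {
	Tag   string
	Time  time.Time
	Value float64
}

// parsePoint decodes one message, validates it, and converts the
// millisecond timestamp into a time.Time in UTC.
func parsePoint(b []byte) (Point, error) {
	var r rawPoint
	if err := json.Unmarshal(b, &r); err != nil {
		return Point{}, fmt.Errorf("decode: %w", err)
	}
	if r.Tag == "" {
		return Point{}, errors.New("missing tag")
	}
	if r.Value == nil {
		return Point{}, errors.New("missing value")
	}
	ts := time.Unix(0, r.TsMs*int64(time.Millisecond)).UTC()
	return Point{Tag: r.Tag, Time: ts, Value: *r.Value}, nil
}

func main() {
	p, err := parsePoint([]byte(`{"tag":"turbine-1.temp","ts_ms":1541635200000,"value":21.5}`))
	fmt.Println(p, err)
}
```

Returning an error instead of panicking lets the caller decide whether a malformed message is skipped, logged, or routed to a separate topic.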

Persistent Data Storage: Cassandra:
- Setting up gocql to write to Cassandra
- Best practices for writing to Cassandra
- Example code
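
The abstract does not say which practices the talk covers. One commonly cited practice is to keep batches small and limited to a single partition, so that one write does not fan out across many nodes. The sketch below shows only the grouping step and deliberately leaves out gocql so that it runs with the standard library alone; the `Point` type and the choice of `Tag` as partition key are assumptions.

```go
package main

import "fmt"

// Point is a hypothetical time series record; Tag is assumed to be
// the partition key of the target table.
type Point struct {
	Tag   string
	TsMs  int64
	Value float64
}

// batchByPartition groups points by partition key and splits each group
// into batches of at most maxSize points. Groups appear in the order in
// which their key was first seen.
func batchByPartition(points []Point, maxSize int) [][]Point {
	if maxSize < 1 {
		maxSize = 1
	}
	var order []string
	groups := make(map[string][]Point)
	for _, p := range points {
		if _, ok := groups[p.Tag]; !ok {
			order = append(order, p.Tag)
		}
		groups[p.Tag] = append(groups[p.Tag], p)
	}
	var batches [][]Point
	for _, tag := range order {
		g := groups[tag]
		for len(g) > 0 {
			n := maxSize
			if len(g) < n {
				n = len(g)
			}
			batches = append(batches, g[:n])
			g = g[n:]
		}
	}
	return batches
}

func main() {
	points := []Point{
		{Tag: "a", TsMs: 1}, {Tag: "b", TsMs: 1}, {Tag: "a", TsMs: 2},
		{Tag: "a", TsMs: 3}, {Tag: "b", TsMs: 2},
	}
	for _, b := range batchByPartition(points, 2) {
		fmt.Println(b[0].Tag, len(b))
	}
}
```

Each returned slice could then be sent to Cassandra as one small unlogged batch or as individual prepared inserts, depending on what measurements show for the workload.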

Demo: Processing hundreds of thousands of messages:
- Finished version of our demo application
- Running in a Kubernetes cluster at scale
- Killing components, seeing how it recovers
- Finished example code

Example use case: Go Data Pipeline at GE Digital:
- Example data pipeline that’s running in production at GE Digital
- Production results of a similar data pipeline with over 800,000 writes per second

Databases / other, Scaling from scratch, Go
