The conference is over. We look forward to seeing you at the next Saint HighLoad++!

A professional conference for developers of high-load systems

September 20 and 21, 2021, Saint Petersburg, DESIGN DISTRICT DAA in SPB

How to Stream 100B+ Events Daily with Spark Structured Streaming

Track: Architectures, Scalability

Talk declined

Abstract

At AppsFlyer we ingest more than 100 billion events daily through our Kafka operation; the events are then stored in a data warehouse hosted on Amazon S3. As AppsFlyer grew in scale, data latency and system resilience became increasingly hard to manage, and we found that we had to rethink our approach and find a better way to process our raw data.

This talk will focus on how we migrated our existing raw data ingestion system to a solution based on Spark Structured Streaming.

I'll discuss what Spark Structured Streaming is, some of the motivations for the migration, and why it was the right solution for us. You will see some of the challenges we faced during implementation, such as choosing the right data partitioning and ensuring continued compliance with our GDPR solution, as well as the tooling we built to support the migration while still providing our exactly-once solution.

The solution and approaches presented in this talk can be applied to your own data pipelines to build more resilient systems and correct data flows.

Ivan Kosianenko

AppsFlyer

Passionate software engineer and technical leader with extensive experience in developing high-load systems and Big Data infrastructure, and in building cloud services from the garage phase through public launch.

Other talks in the Architectures, Scalability track

What Good Integration Is

Maxim Tsepkov

CUSTIS

The High-Load 1C Platform

Anton Doroshkevich

InfoSoft

Kafka: How We Built an Enterprise Data Bus That Handles Up to 3 Million Messages per Second

Ivan Gaas

Pochtatech

Focusing on the Business Data Model: From ETL to ELT

Ivan Zerin

Scentbird

Speed It Up Right Now, or the Light Network of a Heavy Backend

Ilya Shcherbak

VKontakte, VK

Ubisoft in Google Cloud: Autoscaling a Game Cluster

Vladislav Shpilevoy

Ubisoft

How to Reduce the Overhead of Adding One More Microservice

Ruslan Safin

Byndyusoft

Service Mesh on Steroids. Part 1: How to Build Managed Communication Between Hundreds of Microservices

Alexey Efimov

Netcracker

Transport of the Future, or How We Made VKontakte 1.5 Times Faster

Alexander Tobol

VKontakte, VK

Building a Scalable and Flexible Stream Processing System: How We Made It Possible to Upload Goods and Services to 2GIS for 4.5M Companies

Igor Yatsevich

2GIS

Serving Content from HDDs: Fast, Fun, and Reliable

Kirill Shvakov

Kinescope

Clouds for the Little Ones

Alexey Uchakin

EdgeCenter

Stateful Deployment Platform, or How Uber Manages Hundreds of Thousands of Databases

Egor Grishechko

Uber