Submitted Talk

This is a submitted talk. Share the talks you would like to hear on social media; we will take that into account during selection.

talk

Improving PySpark Performance - Leveraging DataFrames & other techniques(en)

Speaker

Holden Karau

Audience level:

Intermediate

Category:

Big Data

Description

This talk covers a number of important topics for making scalable Apache Spark programs in Python.

Objective

Understand how to effectively use Spark in Python.

Abstract

This talk covers a number of important topics for making scalable Apache Spark programs, from RDD re-use to considerations for working with key/value data, why avoiding groupByKey is important, and more. It also covers Python-specific considerations, such as the difference between DataFrames/Datasets and traditional RDDs in Python. Finally, we explore some tricks for intermixing Python and JVM code in cases where the performance overhead is too high.
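As a rough illustration of the groupByKey point, the sketch below uses plain Python lists to stand in for Spark partitions. It is not the Spark API and is not taken from the talk. The data and the helper names `group_then_sum` and `combine_then_sum` are hypothetical. It only counts how many records would have to cross a shuffle when values are grouped first and summed afterwards (roughly what `rdd.groupByKey()` followed by a sum does) compared with pre-aggregating inside each partition (roughly what `rdd.reduceByKey(func)` does).

```python
from collections import defaultdict

# Hypothetical data: two "partitions" of (key, value) pairs.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("a", 1), ("b", 1)],
]


def group_then_sum(parts):
    """Mimic groupByKey + sum: every record crosses the shuffle unchanged."""
    shuffled = [pair for part in parts for pair in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    totals = {key: sum(values) for key, values in grouped.items()}
    return totals, len(shuffled)


def combine_then_sum(parts):
    """Mimic reduceByKey: each partition pre-aggregates before the shuffle."""
    shuffled = []
    for part in parts:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        # Only one record per key per partition is sent onward.
        shuffled.extend(local.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


grouped_totals, grouped_records = group_then_sum(partitions)
combined_totals, combined_records = combine_then_sum(partitions)

print("totals match:", grouped_totals == combined_totals)
print("records shuffled (group first):", grouped_records)
print("records shuffled (combine first):", combined_records)
```

Both approaches give the same totals, but the pre-aggregating version sends at most one record per key per partition. On real data with many repeated keys, that difference is a large part of why reduceByKey-style aggregation tends to scale better.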
