Submitted Talk

This is a submitted talk. Share the talks you would like to hear on social media. We will take this into account during selection.

talk

Improving PySpark Performance - Leveraging DataFrames & other techniques(en)

Speaker

Holden Karau

Audience level:

Intermediate

Category:

Big Data

Description

This talk covers a number of important topics for making scalable Apache Spark programs in Python.

Objective

Understand how to effectively use Spark in Python.

Abstract

This talk covers several topics that matter when writing scalable Apache Spark programs: RDD reuse, considerations for working with key/value data, why avoiding groupByKey is important, and more. It also covers Python-specific considerations, such as the difference between DataFrames/Datasets and traditional RDDs in Python. Finally, it explores some tricks for mixing Python and JVM code in cases where the performance overhead would otherwise be too high.
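As a rough illustration of why avoiding groupByKey matters, the sketch below simulates the idea in plain Python, without Spark. It is not taken from the talk. The `partitions` data and both helper functions are hypothetical, and the "shuffle" is modeled simply as the number of records that would leave their partition. In PySpark the corresponding calls would be roughly `rdd.groupByKey().mapValues(sum)` versus `rdd.reduceByKey(add)`.

```python
from collections import defaultdict
from operator import add

# Hypothetical data: two "partitions" of (key, value) pairs standing in for an RDD.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("a", 1), ("b", 1), ("b", 1)],
]


def group_by_key_style(parts):
    """Send every record through the simulated shuffle, then sum per key."""
    shuffled = [record for part in parts for record in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    totals = {key: sum(values) for key, values in grouped.items()}
    return totals, len(shuffled)


def reduce_by_key_style(parts, func):
    """Combine inside each partition first, then shuffle only the partial results."""
    shuffled = []
    for part in parts:
        partial = {}
        for key, value in part:
            partial[key] = func(partial[key], value) if key in partial else value
        shuffled.extend(partial.items())
    totals = {}
    for key, value in shuffled:
        totals[key] = func(totals[key], value) if key in totals else value
    return totals, len(shuffled)


grouped_totals, grouped_shuffled = group_by_key_style(partitions)
reduced_totals, reduced_shuffled = reduce_by_key_style(partitions, add)

print(grouped_totals, grouped_shuffled)
print(reduced_totals, reduced_shuffled)
```

Both approaches produce the same per-key totals, but the reduce-style version moves one partial result per key per partition instead of every record. Real Spark jobs add serialization and memory costs on top of this, which the simulation does not attempt to model.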