
Spark Big Data Business Practice Trilogy: Kernel Internals | Business Cases | Performance Tuning (Spark大數據商業實戰三部曲：內核解密|商業案例|性能調優)

(Simplified Chinese)
Authors: Wang Jialin (王家林), Duan Zhihua (段智華), Xia Yang (夏陽)
Category: 1. -> Programming -> Spark
Publisher: Tsinghua University Press
3dWoo book number: 49510
Please quote this book number when inquiring about the book.

[Out of stock]
Price: NT$1495

Publication date: 2/1/2018
Pages: 1143
Discs: 0
Printing: black and white
Language: Simplified Chinese edition
ISBN: 9787302489627
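(Download links mentioned in Simplified Chinese books are often slow or difficult to use and may not work in Taiwan. Readers who need them may try on their own; availability is not guaranteed.)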

Foreword:

Big data is like the oil of an earlier era, and artificial intelligence (AI) is like its electricity: both are reshaping every industry with unprecedented breadth and depth. For companies now and in the future, the core moat is data, and core competitiveness comes from AI built on big data. Spark is currently the most active, most popular, and most efficient general-purpose computing platform in the big data field. It was born in 2009 at the AMP Lab of the University of California, Berkeley, was open-sourced in 2010, became an Apache Software Foundation project in 2013, and became a top-level Apache project in 2014. Built on the RDD, Spark has established an integrated and diversified big data processing stack.
The authors hold that Spark has the advantage in both performance and scalability at any scale of data computation, and cite two points in support:
(1) Doug Cutting, the creator of Hadoop, has stated: "Use of MapReduce engine for Big Data projects will decline, replaced by Apache Spark."
(2) Cloudera, Hortonworks, and MapR, the market leaders in commercial Hadoop distributions, have all turned to Spark and made it the preferred, core computing engine of their big data solutions.
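In the 2014 Sort Benchmark, Spark decisively beat Hadoop: using one tenth of the computing resources, Spark sorted the same data 3 times faster than MapReduce. Although there is no official petabyte-scale sort comparison, Spark was also pushed for the first time to sorting 1 PB of data (10 trillion records). On 190 nodes the workload finished in under 4 hours, far surpassing Yahoo's earlier record of 16 hours on 3,800 machines.
As of June 2015, the largest Spark cluster was Tencent's, with 8,000 nodes, and the largest single jobs were those of Alibaba and Databricks, at 1 PB each. Over the same period the number of Spark contributors grew threefold compared with 2014, reaching 730, and the total line count of the code base more than doubled, reaching 400,000 lines. In June 2015 IBM committed to strongly advancing the Apache Spark project, calling it potentially the most important new open source project of the coming decade, a decade defined by data. The core of that commitment was to embed Spark into IBM's industry-leading analytics and commerce platforms and to offer Spark as a service to customers on the IBM Bluemix platform. IBM also pledged to put more than 3,500 researchers and developers to work on Spark-related projects in more than ten labs worldwide, to contribute its breakthrough machine learning technology, IBM SystemML, to the Spark open source ecosystem free of charge, and to train more than one million Spark data scientists and data engineers.
In 2016, in the internationally renowned Sort Benchmark competition, sometimes called the "Olympics of computing", the NADSort team, formed by the PASA Big Data Lab of Nanjing University's Department of Computer Science and Technology together with Alibaba and Databricks, sorted the standard 100 TB data set at a cost of 144 US dollars. This set a new world record of 1.44 dollars per TB, nearly 70% lower than the 4.51 dollars per TB achieved by the 2014 winner, the TritonSort team from the University of California, San Diego. The entry again ran on the Apache Spark computing platform, with extensive optimization of both the large-scale parallel sorting algorithm and the lower layers of Spark in order to maximize sorting performance and reduce storage overhead.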
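The cost figures quoted for the 2016 Sort Benchmark can be checked with a few lines of arithmetic. The sketch below is an illustration added to this listing, not code from the book; the function names are hypothetical, and the inputs are only the numbers given in the foreword (144 US dollars for 100 TB, and 4.51 dollars per TB for the 2014 winner).

```python
def cost_per_tb(total_cost_usd: float, data_tb: float) -> float:
    """Return the sorting cost in US dollars per terabyte."""
    return total_cost_usd / data_tb


def reduction_percent(old_cost: float, new_cost: float) -> float:
    """Return how much lower new_cost is than old_cost, as a percentage."""
    return (old_cost - new_cost) / old_cost * 100.0


# Figures as quoted in the foreword.
nadsort_cost = cost_per_tb(144.0, 100.0)         # 2016 NADSort entry
tritonsort_cost = 4.51                           # 2014 TritonSort entry, USD per TB
reduction = reduction_percent(tritonsort_cost, nadsort_cost)

print(f"{nadsort_cost:.2f} USD/TB, {reduction:.1f}% lower than 2014")
# prints "1.44 USD/TB, 68.1% lower than 2014"
```

The result, about 68%, is what the foreword rounds to "nearly 70%".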
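Guided by the full-stack ideal, Spark's five sub-frameworks and libraries (Spark SQL, Spark Streaming, MLlib, GraphX, and R) can seamlessly share data and operations. In the authors' view this gives Spark an advantage that no other framework in today's big data computing field can match, and it is accelerating Spark's adoption as the preferred general-purpose computing platform for big data processing centers. Spark business cases and performance optimization will therefore be the next major focus.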
This book is based on courses taught personally by Wang Jialin, combined with experience from numerous big data projects. Wang Jialin and Duan Zhihua wrote nearly 90% of the book, specifically the following chapters:
Chapter 3: The Soul of Spark: RDD and DataSet
Chapter 4: Inside Spark Driver Startup
Chapter 5: Spark Cluster Startup: Principles and Source Code
Chapter 6: Submitting a Spark Application to the Cluster: Principles and Source Code
Chapter 7: Shuffle: Principles and Source Code
Chapter 8: How Jobs Work: Principles and Source Code
Chapter 9: Cache and checkpoint in Spark: Principles and Source Code
Chapter 10: Broadcast and Accumulator in Spark: Principles and Source Code
Chapter 11: Integrating Spark with Other Classic Big Data Components: Principles and Practice
Chapter 12: Spark Business Case: A Big Data Movie Review System
Chapter 13: Spark 2.2 in Practice: An Enterprise Personnel Management System Built with Dataset
Chapter 14: Spark Business Case: An E-commerce Interactive Analysis System
Chapter 15: Spark Business Case: An NBA Basketball Player Big Data Analysis System
Chapter 16: Case Study: A Real-Time Stream Processing System for E-commerce Ad Clicks
Chapter 17: Case Studies: Spark in a Telecom Operator's Production Environment
Chapter 18: Case Study: Multidimensional Analysis of a Dating Social Network with Spark GraphX
Chapter 23: Memory Tuning on the Mapper and Reducer Sides in a Spark Cluster
Chapter 24: Mapper-Side Shuffle Aggregation with Broadcast: Principles and Tuning Practice
Chapter 25: An Efficient Cluster-Wide Global Counter with Accumulator: Principles and Tuning Case
Chapter 27: Tuning Best Practices for Spark's Five Sub-Frameworks
Chapter 28: The New-Generation Project Tungsten Optimization Engine in Spark 2.2.0
Chapter 30: Spark Performance Tuning: A One-Stop Solution to Data Skew, Principles and Practice
Chapter 31: The Professional Path to Spark Big Data Performance Tuning in Practice
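Duan Zhihua drew on his many years of big data work experience to extend the case studies and other parts of the book.
Apart from the chapters listed above, the remaining content was completed by three authors, Xia Yang, Zheng Cailing, and Yan Hengwei, based on Wang Jialin's big data course material.
If you find any problems or have any questions while reading, you can join the book's reading group for discussion, where dedicated staff answer questions. The group also provides the source code of the book's case studies and the companion learning videos.
Readers who want to learn more about big data technologies can follow the DT Big Data Dream Factory (DT大數據夢工廠) WeChat official account DT_Spark, or log in to permanent channel 68917580 through the YY client to try it directly.
Readers are also welcome to interact with Wang Jialin on his Sina Weibo account.
Because time was short, the book inevitably contains shortcomings. We ask for readers' understanding and welcome their feedback.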

Wang Jialin, Mid-Autumn night 2017, Silicon Valley, USA
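Synopsis:

Based on Spark 2.2.x, "Spark Big Data Business Practice Trilogy: Kernel Internals | Business Cases | Performance Tuning" centers on hands-on Spark business cases and on nearly every type of Spark performance tuning encountered in production, with Spark kernel internals as its foundation. It is divided into three parts and dissects Spark business cases and performance tuning in enterprise production environments step by step. Part 1 works from the Spark source code: starting with a hands-on case, it progressively and comprehensively analyzes the new features of Spark 2.2 and the Spark core source code. Part 2 selects the most representative classic case studies in Spark development and explains them in an accessible way, applying Spark's big data technologies in combination. Part 3, on performance tuning, covers essentially all of the tuning techniques used with Spark in production.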
Table of contents:

Part 1: Kernel Internals
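Chapter 1: A Lightning-Fast First Experience of Spark 2.2 Development
1.1 Getting started with a movie review system using RDDs, and reading the source code
1.1.1 Spark core concepts illustrated
1.1.2 The movie review system case study with RDDs
1.2 The movie review system with DataFrame and DataSet
1.2.1 The movie review system case study with DataFrame
1.2.2 The movie review system case study with DataSet
1.3 Setting up a Spark 2.2 source-reading environment and a first look at the source code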
Chapter 2: Spark 2.2 Technology and Principles
2.1 Overview of Spark 2.2
2.1.1 Continuous applications
2.1.2 New APIs
2.2 Spark 2.2 Core
2.2.1 The second-generation Tungsten engine
2.2.2 SparkSession
2.2.3 The accumulator API
2.3 Spark 2.2 SQL
2.3.1 Spark SQL
2.3.2 The DataFrame and Dataset APIs
2.3.3 Timed Window
2.4 Spark 2.2 Streaming
2.4.1 Structured Streaming
2.4.2 Incremental output modes
2.5 Spark 2.2 MLlib
2.5.1 The DataFrame-based Machine Learning API
2.5.2 Distributed algorithms in R
2.6 Spark 2.2 GraphX
Chapter 3: The Soul of Spark: RDD and DataSet
3.1 Why RDD and DataSet are the soul of Spark
3.1.1 The definition of RDD and its five main properties
3.1.2 The definition of DataSet and its internal mechanisms
3.2 Seven aspects of RDD resilience
3.3 RDD dependencies
3.3.1 Narrow dependencies
3.3.2 Wide dependencies
3.4 The DAG logical view in Spark
3.4.1 How the DAG is generated
3.4.2 The DAG logical view explained
3.5 The computation mechanism inside an RDD
3.5.1 Tasks explained
3.5.2 The computation process in depth
3.6 Spark RDD fault tolerance and its four key points
3.6.1 How Spark RDD fault tolerance works
3.6.2 The four key points of RDD fault tolerance
3.7 The runtime flow of Spark RDDs
3.7.1 Runtime architecture diagram
3.7.2 Life cycle
3.8 RDD internals explained through a WordCount exercise
3.8.1 Hands-on Spark WordCount
3.8.2 How RDDs are generated internally
3.9 How DataSet-based code is converted into RDDs step by step
Chapter 4: Inside Spark Driver Startup
4.1 The Spark Driver Program analyzed
4.1.1 The Spark Driver Program
4.1.2 SparkContext in depth
4.1.3 SparkContext source code walkthrough
4.2 DAGScheduler explained
4.2.1 Definition of the DAG
4.2.2 Instantiation of the DAG
4.2.3 How DAGScheduler divides stages: principles
4.2.4 How DAGScheduler divides stages: the concrete algorithm
4.2.5 The algorithm by which tasks within a stage obtain their preferred locations
4.3 TaskScheduler explained
4.3.1 TaskScheduler principles
4.3.2 TaskScheduler source code walkthrough
4.4 SchedulerBackend explained
4.4.1 SchedulerBackend principles
4.4.2 SchedulerBackend source code walkthrough
4.4.3 The registration mechanism of Spark programs
4.4.4 How Spark programs manage Executor computing resources
4.5 Tying together the full cycle of Spark's runtime internals
4.6 Chapter summary
Chapter 5: Spark Cluster Startup: Principles and Source Code
5.1 Master startup: principles and source code
5.1.1 Master startup principles
5.1.2 Master startup source code
5.1.3 Master HA failover
5.1.4 The Master's registration mechanism and state management
5.2 Worker startup: principles and source code
5.2.1 The Worker startup flow
5.2.2 Worker startup source code
5.3 ExecutorBackend startup: principles and source code
5.3.1 The relationship between the ExecutorBackend interface and Executor
5.3.2 The different implementations of ExecutorBackend
5.3.3 Communication in ExecutorBackend
5.3.4 Exception handling in ExecutorBackend
5.4 Task execution in the Executor
5.4.1 Task loading in the Executor
5.4.2 The task thread pool in the Executor
5.4.3 Handling task execution failures
5.4.4 TaskRunner revealed
5.5 How Executor execution results are handled
5.6 Chapter summary
Chapter 6: Submitting a Spark Application to the Cluster: Principles and Source Code
6.1 How a Spark Application is actually submitted to the cluster
6.1.1 Application submission parameters and configuration
6.1.2 Application submission to the cluster: principles
6.1.3 Application submission to the cluster: source code
6.2 How a Spark Application requests resources from the cluster
6.2.1 The two types of Application resource requests
6.2.2 Application resource requests: source code
6.3 Revisiting the Driver from the perspective of Application submission
6.3.1 When the Driver is actually created
6.3.2 Driver and Master interaction: principles
6.3.3 Driver and Master interaction: source code
6.4 Revisiting the Executor from the perspective of Application submission
6.4.1 When the Executor is actually started
6.4.2 How the Executor hands results to the Application
6.5 Spark 1.6 RPC internals: runtime mechanism, source code, Netty and Akka
6.6 Chapter summary
Chapter 7: Shuffle: Principles and Source Code
7.1 Overview
7.2 The Shuffle framework
7.2.1 Evolution of the Shuffle framework
7.2.2 The core of the Shuffle framework
7.2.3 Shuffle framework source code walkthrough
7.2.4 Shuffle data read and write source code walkthrough
7.3 Hash Based Shuffle
7.3.1 Overview
7.3.2 Hash Based Shuffle internals
7.3.3 Hash Based Shuffle data read and write source code walkthrough
7.4 Sorted Based Shuffle
7.4.1 Overview
7.4.2 Sorted Based Shuffle internals
7.4.3 Sorted Based Shuffle data read and write source code walkthrough
7.5 Tungsten Sorted Based Shuffle
7.5.1 Overview
7.5.2 Tungsten Sorted Based Shuffle internals
7.5.3 Tungsten Sorted Based Shuffle data read and write source code walkthrough
7.6 Interaction between the Shuffle and Storage modules
7.6.1 Interaction during Shuffle registration
7.6.2 Interaction during Shuffle data writes
7.6.3 Interaction during Shuffle data reads
7.6.4 BlockManager architecture, runtime flow diagram, and source code
7.6.5 BlockManager advanced topics: BlockManager initialization and registration, how BlockManager-Master works, BlockTransferService, local data reads and writes, remote data reads and writes
7.7 Chapter summary
Chapter 8: How Jobs Work: Principles and Source Code
8.1 When a Job is actually created
8.1.1 How a Job is triggered: principles and source code
8.1.2 Examples of operators that trigger a Job
8.2 Inside stage division
8.2.1 Stage division principles
8.2.2 Stage division source code
8.3 The full life cycle of a Task
8.3.1 The life of a Task in detail
8.3.2 The full life cycle of Task interaction between Driver and Executor: principles and source code
8.4 How the Driver manages the results of ShuffleMapTask and ResultTask
8.4.1 Interaction between ShuffleMapTask results and the Driver: principles and source code
8.4.2 Interaction between ResultTask results and the Driver: principles and source code
Chapter 9: Cache and checkpoint in Spark: Principles and Source Code
9.1 Cache in Spark: principles and source code
9.1.1 Cache principles
9.1.2 Cache source code
9.2 checkpoint in Spark: principles and source code
9.2.1 checkpoint principles
9.2.2 checkpoint source code
Chapter 10: Broadcast and Accumulator in Spark: Principles and Source Code
10.1 Broadcast in Spark: principles and source code
10.1.1 Broadcast principles
10.1.2 Broadcast source code
10.2 Accumulator in Spark: principles and source code
10.2.1 Accumulator principles
10.2.2 Accumulator source code
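Chapter 11: Integrating Spark with Other Classic Big Data Components: Principles and Practice
11.1 Using Spark components together
11.2 Spark and Alluxio integration: principles and practice
11.2.1 Spark and Alluxio integration principles
11.2.2 Spark and Alluxio integration in practice
11.3 Spark and JobServer integration: principles and practice
11.3.1 Spark and JobServer integration principles
11.3.2 Spark and JobServer integration in practice
11.4 Spark and Redis integration: principles and practice
11.4.1 Spark and Redis integration principles
11.4.2 Spark and Redis integration in practice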
Part 2: Business Cases
Chapter 12: Spark Business Case: A Big Data Movie Review System
12.1 Analyzing user behavior for movies with RDDs
12.1.1 Setting up the IDEA development environment
12.1.2 Description of the movie data in the big data movie review system
12.1.3 Hands-on user behavior statistics for the movie review system
12.2 Movie popularity analysis with RDDs
12.3 TopN favorite movies by genre with RDDs, and performance optimization tips
12.4 User group analysis in the style of QQ and WeChat with RDDs, and the mechanism behind broadcast
12.5 Secondary sort for the movie review system with RDDs, in Java and Scala
12.5.1 A custom key class for secondary sort (Java)
12.5.2 Secondary sort for the movie review system (Java)
12.5.3 A custom key class for secondary sort (Scala)
12.5.4 Secondary sort for the movie review system (Scala)
12.6 User behavior analysis for the movie review system with SQL statements in Spark SQL
12.7 Best-rated movie analysis in two different ways with Spark SQL
12.8 Most popular movie analysis in two different ways with Spark SQL
12.9 TopN movies most liked by men and by women with DataFrame
12.10 User group analysis in the style of QQ, WeChat, and Taobao using DataFrame only
12.11 Popularity and age-group interest analysis for the movie review system using DataSet only
12.11.1 Counting male and female viewers of a given movie by age with DataSet
12.11.2 TopN movies with the highest average score (best rated) with DataSet
12.11.3 TopN movies with the most fans or viewers (most popular) with DataSet
12.11.4 Top10 movies most liked by men and by women using DataSet only
12.11.5 TopN favorite movies of core QQ or WeChat target users using DataSet only
12.11.6 TopN favorite movies of core Taobao target users using DataSet only
12.12 Key concepts, source code, and case code for the big data movie review system
12.12.1 Key concept: the internal mechanism of the Broadcast variable
12.12.2 Key concept: SQL global temporary views and temporary views
12.12.3 Complete code for the big data movie review system case
12.13 Chapter summary
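Chapter 13: Spark 2.2 in Practice: An Enterprise Personnel Management System Built with Dataset
13.1 Business requirements analysis for the enterprise personnel management system
13.2 Data modeling for the enterprise personnel management system
13.3 Creating the development context for the case with SparkSession
13.3.1 SparkContext in Spark 1.6.0
13.3.2 SparkSession in Spark 2.0.0
13.3.3 DataFrame and DataSet: analysis and practice
13.4 Analyzing the enterprise personnel management system with map, flatMap, mapPartitions, and others
13.5 Analyzing the enterprise personnel management system with dropDuplicate, coalesce, repartition, and others
13.6 Analyzing the enterprise personnel management system with sort, join, joinWith, and others
13.7 Analyzing the enterprise personnel management system with randomSplit, sample, select, and others
13.8 Analyzing the enterprise personnel management system with groupBy, agg, col, and others
13.9 Analyzing the enterprise personnel management system with collect_list, collect_set, and others
13.10 Analyzing the enterprise personnel management system with avg, sum, countDistinct, and others
13.11 Code for the Dataset-based enterprise personnel management system case
13.12 Chapter summary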
Chapter 14: Spark Business Case: An E-commerce Interactive Analysis System
14.1 TopN visit counts in a given time period using DataSet only
14.1.1 Data description for the e-commerce interactive analysis system
14.1.2 TopN ranking of user visits to the e-commerce site in a given time period
14.2 Top10 purchase amounts and Top10 visit-count growth in a given time period using DataSet only
14.3 Hands-on TopN analyses of various kinds using DataSet only
14.3.1 Top5 users by purchase amount in a given time period
14.3.2 Top5 users by visit-count growth in a given time period
14.3.3 Top5 users by purchase-amount growth in a given time period
14.3.4 Top10 users by visit count in the first two weeks after registering in a given time period
14.3.5 Top10 users by total purchase amount in the first two weeks after registering in a given time period
14.4 Key concepts, source code, and case code for the e-commerce interactive analysis system
14.4.1 Key concept: Functions.scala
14.4.2 Complete code for the e-commerce interactive analysis system case
14.5 Chapter summary
Chapter 15: Spark Business Case: An NBA Basketball Player Big Data Analysis System
15.1 Architecture and implementation approach of the NBA player analysis system
15.2 Hands-on code: data cleaning and preliminary processing
15.3 Hands-on code: writing the core basic data items
15.3.1 Recording yearly basic data items for NBA players
15.3.2 Computing yearly Z-Score standard scores for NBA players
15.3.3 Computing yearly normalized values for NBA players
15.3.4 Grouping and analyzing historical NBA game data by player
15.3.5 Obtaining lists of NBA player age and experience values
15.3.6 Statistical analysis of NBA player age and experience values
15.3.7 Functions and helper utility classes defined inside the NBA player system
15.4 Testing and running the complete NBA player analysis code
15.5 Key concepts, principles, and source code for the NBA player analysis system
15.5.1 Key concept: StatCounter source code analysis
15.5.2 Key concept: a StatCounter usage example
15.6 Chapter summary
Chapter 16: Case Study: A Real-Time Stream Processing System for E-commerce Ad Clicks
16.1 Requirements analysis and technical architecture of the e-commerce ad-click case
16.1.1 Requirements analysis
16.1.2 Technical architecture
16.1.3 Overall deployment
16.1.4 Business flows for producing data and consuming data
16.1.5 Initializing and starting the Spark JavaStreamingContext
16.1.6 Reading Kafka data with the No Receivers approach in Spark Streaming, and monitoring
16.2 Hands-on online click statistics for the e-commerce ad-click case
16.3 Implementing blacklist filtering for the e-commerce ad-click case
16.3.1 Dynamically filtering blacklisted users based on the user ad-click data table
16.3.2 Deduplicating the entire blacklist RDD
16.3.3 Writing the blacklist to the blacklist data table
16.4 Modeling and coding the underlying data layer of the e-commerce ad-click case (based on MySQL)
16.4.1 A singleton implementation of the database connection
16.4.2 Implementing the database operations
16.5 The actual implementation code for dynamic blacklist filtering
16.5.1 Fetching the blacklist from the database and wrapping it as an RDD
16.5.2 Left-joining the blacklist RDD with the batch RDD to filter out blacklisted users
16.6 Hands-on code for the actual MySQL operations behind the dynamic blacklist
16.6.1 Architecture of the MySQL database operations
16.6.2 Hands-on code for the MySQL database operations
16.7 Online updating statistics of ad-click traffic with updateStateByKey and others
16.8 Top5 ads by clicks in each province
16.9 Hands-on computation of ad-click trends
16.10 Generating simulated click data and creating the data tables in SQL
16.10.1 Generating simulated data for the e-commerce ad-click case
16.10.2 Creating the SQL data tables for the e-commerce ad-click case
16.11 Running the e-commerce ad-click case
16.11.1 Starting the Hadoop cluster
16.11.2 Starting the Spark cluster
16.11.3 Starting the Zookeeper cluster
16.11.4 Starting the Kafka cluster
16.11.5 Starting the Hive metastore
16.11.6 Running the program
16.11.7 Results
16.12 Points to note in the Scala version of the e-commerce ad-click case
16.13 Java source code for the e-commerce ad-click case course
16.14 Scala source code for the e-commerce ad-click case course
16.15 Chapter summary
Chapter 17: Case Studies: Spark in a Telecom Operator's Production Environment
17.1 Case study: Spark in log statistics and analysis for a telecom operator's converged payment system
17.1.1 Requirements analysis for the converged payment system log analysis case
17.1.2 Data description for converged payment system log analysis
17.1.3 Hands-on code: combining Scala regular expressions with pattern matching to clean converged payment system logs
17.1.4 Visualizing converged payment system logs in Splunk
17.1.5 Regular expression concepts and case code for the converged payment system log analysis case
17.2 Case study: Spark in a GIS system for fiber broadband user traffic heat maps
17.2.1 Requirements analysis for the traffic heat-map GIS case
17.2.2 Data description for the traffic heat-map GIS application
17.2.3 The traffic heat-map GIS application with Spark in practice
17.2.4 Results of the traffic heat-map GIS application with Spark
17.2.5 Spark case code for the traffic heat-map GIS application
17.3 Chapter summary
Chapter 18: Case Study: Multidimensional Analysis of a Dating Social Network with Spark GraphX
Preface: