2019-03-24
As we mentioned in the last post, Spark had a bug in checking whether an application extends scala.App (SPARK-26977). I went on to submit a pull request that loads childMainClass$ and warns if it is a subclass of scala.App when you run spark-submit --class childMainClass.
Try {
  if (classOf[scala.App].isAssignableFrom(Utils.classForName(s"$childMainClass$$"))) {
    logWarning("Subclasses of scala.App may not work correctly. " +
      "Use a main() method instead.")
  }
}
// invoke main method of childMainClass
This was a minor issue, but since it was my first PR to Spark I was quite happy about it. I didn't realize then that by fixing one minor Spark bug I had created a major one.
That's SPARK-27205, where launching spark-shell with the --packages option failed to load transitive dependencies on master.
./bin/spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0
Weirdly, it would work after removing my changes. The two looked unrelated, so I dug into how spark-shell loads dependencies behind the scenes.
spark-shell is actually a shorthand for spark-submit --class org.apache.spark.repl.Main, and repl.Main loads user-specified jars or packages from the spark.repl.local.jars option of SparkConf. That option is set from --jars or --packages by SparkSubmit. I verified the option existed with the user packages on SparkSubmit's side, so the issue was that somehow the SparkConf didn't make it to repl.Main.
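As a rough sketch of what repl.Main expects (simplified and not the actual Spark code; spark.repl.local.jars is the real key, the object name and paths are made up):
import org.apache.spark.SparkConf

object ReplJarsSketch {
  def main(args: Array[String]): Unit = {
    // SparkSubmit records the jars resolved from --jars / --packages under this key...
    val conf = new SparkConf().set(
      "spark.repl.local.jars",
      "file:/tmp/spark-sql-kafka-0-10_2.11-2.4.0.jar,file:/tmp/kafka-clients-2.0.0.jar") // illustrative paths

    // ...and repl.Main reads it back, roughly like this, to put the jars on the REPL classpath.
    val localJars: Seq[String] =
      conf.getOption("spark.repl.local.jars").map(_.split(",").toSeq).getOrElse(Nil)
    println(localJars)
  }
}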
It turns out that all options of SparkConf are copied into system properties right before the main method of repl.Main is invoked, and they are loaded back into a SparkConf when repl.Main is initialized. The latter has to happen after the former, and "my fix" broke exactly that ordering!
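To see why the ordering matters, here is a standalone sketch with made-up names (ReplLikeMain, BrokenSubmit) rather than the real Spark classes; the only assumption is that loading the module class also initializes it, which is what Class.forName does by default:
object ReplLikeMain {
  // This field initializer runs as soon as ReplLikeMain$ is initialized.
  // In the real repl.Main it is roughly `val conf = new SparkConf()`,
  // which reads spark.* entries from system properties.
  val localJars: String = sys.props.getOrElse("spark.repl.local.jars", "<missing>")

  def main(args: Array[String]): Unit = println(localJars)
}

object BrokenSubmit {
  def main(args: Array[String]): Unit = {
    // The scala.App check loads the module class, initializing ReplLikeMain right here,
    // so localJars is frozen as "<missing>".
    Class.forName("ReplLikeMain$")

    // Only afterwards are the conf entries copied into system properties...
    sys.props("spark.repl.local.jars") = "file:/tmp/illustrative.jar" // made-up value

    // ...and the REPL's main method invoked, which now prints "<missing>".
    ReplLikeMain.main(args)
  }
}
With the check removed, or done without initializing the class, ReplLikeMain is first touched inside main, after the property is set, and it sees the jars as expected.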
Okay, I have now been bitten twice by field initialization of Scala objects.