1. Install prerequisites
a) JDK 1.7
Download it here
b) Scala 2.10.5
Download it here
c) Maven, latest version
Download it here
d) SBT
Download it here
After installing or unzipping these components, configure them in the system variables, for example:
JAVA_HOME --> C:\Program Files\Java\jdk1.7.0_60
MAVEN_HOME --> D:\Program Files\Dev\apache-maven-3.2.3
SCALA_HOME --> C:\Program Files (x86)\scala
SBT_HOME --> C:\Program Files (x86)\sbt\
Then append the following items to PATH:
%JAVA_HOME%\bin;%MAVEN_HOME%\bin;%SCALA_HOME%\bin;%SBT_HOME%\bin;
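To confirm that the variables are visible to newly started processes, a small check along the following lines can help. This is only a sketch: the CheckEnv object and its missing helper are illustrative names, not part of any of the tools above.

```scala
// Sketch: report which of the expected variables are not set (or are blank).
object CheckEnv {
  // Variable names taken from the list above.
  val required = Seq("JAVA_HOME", "MAVEN_HOME", "SCALA_HOME", "SBT_HOME")

  // Returns the names from `required` that are absent or blank in `env`.
  def missing(env: Map[String, String]): Seq[String] =
    required.filterNot(name => env.get(name).exists(_.trim.nonEmpty))

  def main(args: Array[String]): Unit = {
    val notSet = missing(sys.env)
    if (notSet.isEmpty) println("All variables are set.")
    else println("Not set: " + notSet.mkString(", "))
  }
}
```

Open a new command prompt before running it, since windows that were already open do not pick up changed system variables.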
2. Download and set up IntelliJ IDE
a) Download IntelliJ here
b) Download the Scala plugin
Note: if you are behind a company firewall, you must enable the HTTP proxy.
For Maven, copy the settings.xml file from %MAVEN_HOME%\conf\settings.xml to C:\Users\*****\.m2 and enable the proxy in the copied file.
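For reference, an active proxy entry in settings.xml usually looks something like the fragment below. This is a sketch only: the id, host, port and nonProxyHosts values are placeholders, not values from the original setup, so substitute your company's proxy details.

```xml
<!-- Placed inside the top-level <settings> element of C:\Users\*****\.m2\settings.xml -->
<proxies>
  <proxy>
    <id>company-proxy</id>                 <!-- placeholder id -->
    <active>true</active>
    <protocol>http</protocol>
    <host>proxy.example.com</host>         <!-- placeholder host -->
    <port>8080</port>                      <!-- placeholder port -->
    <nonProxyHosts>localhost|127.0.0.1</nonProxyHosts>
  </proxy>
</proxies>
```

The settings.xml shipped with Maven already contains a commented-out proxy example in its proxies section, so uncommenting and editing that entry is usually enough.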
3. Create project
Let's create a sample project called HelloSpark.
1) File --> New --> Project..., then choose Scala.
After clicking Finish, a dialog pops up; choose "New Window".
2) Add Maven support
Right-click the project name and choose "Add Framework Support...", then scroll down and select "Maven".
Double-click pom.xml and add the following content to the existing content of pom.xml:
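<build>
  <plugins>
    <plugin>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>3.1</version>
      <configuration>
        <source>1.7</source>
        <target>1.7</target>
      </configuration>
    </plugin>
  </plugins>
</build>
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.4.0</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.10</artifactId>
    <version>1.4.0</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.4.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.10</artifactId>
    <version>1.4.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.4.0</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.10</artifactId>
    <version>0.8.1</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.10</artifactId>
    <version>1.4.0</version>
  </dependency>
</dependencies>
<repositories>
  <repository>
    <id>Maven</id>
    <url>http://repo1.maven.org/maven2</url>
  </repository>
  <repository>
    <id>clojars</id>
    <url>http://clojars.org/repo/</url>
  </repository>
  <repository>
    <id>m2.java.net</id>
    <name>Java.net Maven 2 Repository</name>
    <url>http://download.java.net/maven/2</url>
    <layout>default</layout>
  </repository>
</repositories>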
A sample pom.xml can be viewed here
After pasting the content, a dialog pops up at the top right; choose Enable Auto-Import and Maven will start downloading the specified dependencies.
Alternatively, trigger the download manually:
Right-click the project name --> Maven --> Reimport
3) Create a folder for Scala
Expand the project file structure, src --> main, right-click main, choose New --> Directory, and name it Scala.
Then add this new folder "Scala" to the project sources:
File --> Project Structure (shortcut Ctrl+Alt+Shift+S)
Modules --> scala --> Sources; as the screenshot shows, click 1, 2 and 3, and the result is displayed at 4.
4) Create a Scala class
Right-click the scala folder and choose New --> Scala Class.
Then modify the content as shown in the screenshot.
Also, please note:
1) org.apache.spark.SparkContext needs to be imported.
2) Create a file called pagecounts.
3) The program reads the content of the file named pagecounts, prints the first 10 lines, and then prints the total line count of the file.
You can put arbitrary content in pagecounts; a sample file can be viewed here. If you place it in another folder, please modify the file path accordingly.
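A program matching that description might look roughly like the following sketch. The object name HelloSpark, the summarize helper, the local[2] master setting and the relative path "pagecounts" are assumptions for illustration, not the original code; adjust the path to wherever you placed the file.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object HelloSpark {
  // Returns the first `n` lines and the total line count of the file at `path`.
  def summarize(sc: SparkContext, path: String, n: Int = 10): (Array[String], Long) = {
    val lines = sc.textFile(path)
    (lines.take(n), lines.count())
  }

  def main(args: Array[String]): Unit = {
    // Assumption: run locally with two worker threads.
    val conf = new SparkConf().setAppName("HelloSpark").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Assumption: pagecounts sits in the working directory of the run configuration.
      val (firstLines, total) = summarize(sc, "pagecounts")
      firstLines.foreach(println)
      println(s"Total lines: $total")
    } finally {
      sc.stop() // release Spark resources even if reading fails
    }
  }
}
```

If you supply the master another way, for example with -Dspark.master=local in the VM options of the run configuration, the setMaster call can be removed.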
5) Add Spark jar file
We need to download the latest Spark package and unzip it.
Go to: https://spark.apache.org/downloads.html
Download a Spark package; you can choose the build for Hadoop 2.4 or 2.6 based on your requirements. For example, a sample spark-1.4.0-bin-hadoop2.4.tgz can be downloaded here.
After unzipping, add the package to the project and click OK in the popup.
6) Set run configuration
In the IntelliJ menu, choose Run --> Edit Configurations, select Application, and set up the content as in the screenshot below.
Final: Run it!
Click the run button on the toolbar, and the result is good!
Please note that at the beginning it will display an SLF4J multiple bindings warning and a winutils problem such as java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. These can be ignored for now.
Happy Spark!
