After installing or unzipping these components, configure them as system environment variables, for example:
JAVA_HOME --> C:\Program Files\Java\jdk1.7.0_60
MAVEN_HOME --> D:\Program Files\Dev\apache-maven-3.2.3
SCALA_HOME --> C:\Program Files (x86)\scala
SBT_HOME --> C:\Program Files (x86)\sbt\
and append the following to PATH: %JAVA_HOME%\bin;%MAVEN_HOME%\bin;%SCALA_HOME%\bin;%SBT_HOME%\bin
Note: if you are behind a company firewall, it is mandatory to configure the HTTP proxy.
c) Configure Maven settings if behind a proxy
Copy the settings.xml file from %MAVEN_HOME%\conf to C:\Users\*****\.m2, then enable the proxy section in the copy.
After modification:
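For reference, the enabled proxy section of settings.xml looks roughly like the sketch below; the host, port, and credentials are placeholders to replace with your company's values:

```xml
<proxies>
  <proxy>
    <id>corp-proxy</id>
    <active>true</active>
    <protocol>http</protocol>
    <host>proxy.example.com</host>
    <port>8080</port>
    <!-- Uncomment only if your proxy requires authentication -->
    <!--
    <username>your-user</username>
    <password>your-password</password>
    -->
    <nonProxyHosts>localhost|127.0.0.1</nonProxyHosts>
  </proxy>
</proxies>
```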
3. Create project
Let's create a sample project called HelloSpark.
1) File --> New --> Project..., and choose Scala.
After clicking Finish, a dialog will pop up; choose "New Window".
2) Add Maven support
Right-click the project name and choose "Add Framework Support...", then scroll down and select "Maven".
Open pom.xml and merge the following content with the existing content of pom.xml.
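As a sketch of the additions (the exact content was shown in a screenshot), the key pieces are the Scala and Spark dependencies; the versions below match the spark-1.4.0 package used later in this article and may need adjusting for your setup:

```xml
<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.10.4</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.4.0</version>
  </dependency>
</dependencies>
```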
After pasting the content, a dialog will pop up at the top right; choose "Enable Auto-Import" and Maven will start downloading the specified dependencies.
Alternatively, you can do it via:
right-click the project name --> Maven --> Reimport
3) Create a folder for Scala sources
Expand the project file structure: src --> main. Right-click main, then New --> Directory,
and name it scala.
Then mark this new "scala" folder as a project source root:
File--> Project Structure (shortcut Ctrl+Alt+Shift+S)
Modules --> scala --> Sources; as the screenshot shows, click 1, 2, and 3, and the result is shown at 4.
4) Create a scala class
Right-click the scala folder, then New --> Scala Class.
Then modify the content as shown in the screenshot.
Also please note:
1) org.apache.spark.SparkContext needs to be imported.
2) Create a file called pagecounts.
3) This program reads the content of the file named pagecounts, prints the first 10 lines, and then prints the total line count of the file.
You can put arbitrary content in pagecounts; a sample file can be viewed here. If you place it in another folder, modify the file path accordingly.
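The exact code appears in the screenshot above; reconstructed from the description, the logic is: read pagecounts, print the first 10 lines, and print the total line count. The pure part of that logic is sketched below and works without Spark; the Spark wiring, which needs the jars added in step 5, is shown in comments (the `local[*]` master and the relative `pagecounts` path are assumptions for running inside IntelliJ):

```scala
// HelloSpark.scala -- a sketch reconstructed from the description above.
object HelloSpark {

  // Pure helper: preview the first n lines and count all lines.
  def summarize(lines: Seq[String], n: Int = 10): (Seq[String], Long) =
    (lines.take(n), lines.size.toLong)

  // With the Spark jars on the classpath (step 5), the driver would be:
  //
  //   import org.apache.spark.{SparkConf, SparkContext}
  //
  //   def main(args: Array[String]): Unit = {
  //     val conf = new SparkConf().setAppName("HelloSpark").setMaster("local[*]")
  //     val sc = new SparkContext(conf)
  //     val pagecounts = sc.textFile("pagecounts")    // path to the sample file
  //     pagecounts.take(10).foreach(println)          // first 10 lines
  //     println("Total lines: " + pagecounts.count()) // total line count
  //     sc.stop()
  //   }
}
```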
5) Add the Spark jar file
We need to download the latest Spark package and unzip it.
Go to: https://spark.apache.org/downloads.html
Download the Spark package; you can choose the Hadoop 2.4 or Hadoop 2.6 build based on your requirements. For example, a sample spark-1.4.0-bin-hadoop2.4.tgz can be downloaded here.
After unzipping, we can add the package to our project; click OK in the popup.
6) Set the run configuration
In the IntelliJ menu: Run --> Edit Configurations..., choose Application and set it up as in the screenshot below.
Finally: run it!
Click the Run button on the toolbar, and the result looks good!
Please note that at startup it will report an SLF4J multiple-bindings warning and a winutils problem such as java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. These can be ignored for now.
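If you later want to silence the winutils warning, a common workaround on Windows is to point Hadoop at a folder containing winutils.exe before creating the SparkContext. The C:\hadoop path below is a placeholder; you'd download winutils.exe into its bin subfolder yourself:

```scala
object WinutilsFix {
  def main(args: Array[String]): Unit = {
    // Point Hadoop at the folder that contains bin\winutils.exe.
    // "C:\\hadoop" is a placeholder -- use wherever you put winutils.exe.
    System.setProperty("hadoop.home.dir", "C:\\hadoop")
    println(sys.props("hadoop.home.dir"))
  }
}
```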
Happy Spark!