Sep 7, 2015

Strange spark problem: "Error:scalac: error while loading ..., Missing dependency bad symbolic reference..."

I am very new to Spark and am trying to develop it on Windows using IntelliJ. This is not a typical environment for Spark development; normally people use Ubuntu + IntelliJ.

I copied the famous SparkPi example from the official Spark examples, but when I ran it in IDEA, it popped up the error from the title: "Error:scalac: error while loading ..., Missing dependency bad symbolic reference..."
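For reference, the example is roughly the following (a condensed sketch of SparkPi, not the exact official code; the local[2] master setting is my addition so it can run inside the IDE):

import scala.math.random
import org.apache.spark.{SparkConf, SparkContext}

object SparkPi {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Spark Pi").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val n = 100000
    // count random points that fall inside the unit circle
    val count = sc.parallelize(1 to n).map { _ =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    sc.stop()
  }
}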

I did a lot of searching but had no luck. Finally I found out that the problem was caused by my pom.xml settings.

 

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.4.1</version>
    <scope>${myscope}</scope>
</dependency>

The reason for doing this is that we don't want to package the spark-core jar into our final package, but when it is specified as provided, the program cannot run locally. The details can be checked in another post of mine:
IntelliJ "Provided" Scope Problem

The solution is:
1. Keep the ${myscope} value here.
2. Add a property called myscope to pom.xml:

<properties>
    <myscope>compile</myscope>
</properties>

3. In the Maven run configuration, keep the setting:
clean install -Dmyscope=provided

Now the Spark program runs successfully locally and also packages as we expect.

If you think this article is useful, please click the ads on this page to help. Thank you very much.

Aug 29, 2015

GitHub for Windows through Company Proxy

GitHub is a very famous tool, but I am just starting to use it... Shy... We come from the ancient times of source code management: ClearCase, SVN (TeamForge); now it's time to embrace GitHub.

I have downloaded the Windows version, GitHub for Windows... Maybe the best way is to use the command line, which I will investigate later.

This is the tool GitHub for Windows, which can be obtained here: https://git-scm.com/download/win



The problem is that we are behind a company proxy and there is no option in the tool to change it. Luckily there is a solution.

1. Open the file
C:\Users\YOURNAME\.gitconfig

2. Add these sections:
[http]
proxy = http://YOUR_COMPANY_PROXY:8080

[https]
proxy = http://YOUR_COMPANY_PROXY:8080

OK, all set!
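Alternatively, the same two sections can be written from the command line instead of editing the file by hand (a sketch mirroring the config above; substitute your real proxy host and port):

git config --global http.proxy http://YOUR_COMPANY_PROXY:8080
git config --global https.proxy http://YOUR_COMPANY_PROXY:8080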


But I feel using the command line is more convenient.

1. Clone

git clone REPOSITORY_URL

for example (any repository URL works; this one is just an illustration):
git clone https://github.com/apache/spark.git

2. Update
1) cd into the cloned folder, for example Spark
2) git pull


For a tutorial on Git, we can check this great website; many thanks to the author!
http://rogerdudler.github.io/git-guide/

If you think this article is useful, please click the ads on this page to help. Thank you very much.

Aug 11, 2015

IntelliJ "Provided" Scope Problem

IntelliJ is such a famous tool that lots of users say once you get used to it, you can never go back. For an Eclipse user, there is some learning curve to IntelliJ. Today I met an interesting problem that exists only in IntelliJ.

The background is that I moved my project from Eclipse to IntelliJ. The project ran very well in Eclipse, but when I ran it the same way in IntelliJ (IDEA for short), it threw the error below:


I searched a lot and finally found that it's because the storm package in pom.xml is specified as provided:


<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>0.10.0-beta1</version>
    <scope>provided</scope>
</dependency>


The reason the package must be specified as provided is that the Storm cluster already contains the storm jar file; if our package contains the storm jar again, it will cause conflicts.

But according to the definition of provided from the Maven website: https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html
  • provided
    This is much like compile, but indicates you expect the JDK or a container to provide the dependency at runtime. For example, when building a web application for the Java Enterprise Edition, you would set the dependency on the Servlet API and related Java EE APIs to scope provided because the web container provides those classes. This scope is only available on the compilation and test classpath, and is not transitive.
This is why there is no problem at compile time but there are problems at runtime.

Here we come to a dilemma: when running in the IDE (IntelliJ) we need the scope to be compile, but when packaging to deploy on the cluster, we need it to be provided. One way to solve this is to change the scope each time, but that is very annoying.

So here is the solution:
1. Set the scope to be a parameter, such as ${myscope}:

<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>0.10.0-beta1</version>
    <scope>${myscope}</scope>
</dependency>


2. Create a Maven task:
expand Lifecycle, right-click an item, for example install, and choose the "Create ..." option,

then specify the command line: clean install -Dmyscope=provided
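When running inside the IDE, however, ${myscope} still needs a value. As described in the Sep 7 post above, one way is to give the property a default of compile in pom.xml (a sketch; the property name matches the one used in step 1):

<properties>
    <!-- default used inside the IDE; overridden by -Dmyscope=provided when packaging -->
    <myscope>compile</myscope>
</properties>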



With this, it runs very well in the IDE and can also be packaged with provided scope!

If you think this article is useful, please click the ads on this page to help. Thank you very much.

Jul 31, 2015

Good tool Java Decompiler - View source codes from jar or .class file

One problem with using SVN is that you might meet conflicts and end up losing your own code. I met this problem once: I did an update and there were conflicts. I tried to revert the update but found out that the revert was reverting my local changes while keeping the update! I had lost my local development work...

Luckily I had a compiled jar file from my last build, but it contains only .class files. How to view the original sources?

Java Decompiler comes to the rescue!
http://jd.benow.ca/


Use this tool to open a jar file or a .class file, and you will see all the contents:
structured and clear! Thanks to the author!


If you think this article is useful, please click the ads on this page to help. Thank you very much.

Jul 14, 2015

Hello Spark - How to configure Spark development environment on Windows with IntelliJ

1. Install prerequisites

a) JDK 1.7

Download it here

b) Scala 2.10.5

Download it here

c) Maven latest version

Download it here

d) SBT

Download it here

After installing or unzipping these components, configure them in the system environment variables, for example:
JAVA_HOME -->  C:\Program Files\Java\jdk1.7.0_60
MAVEN_HOME --> D:\Program Files\Dev\apache-maven-3.2.3
SCALA_HOME --> C:\Program Files (x86)\scala
SBT_HOME --> C:\Program Files (x86)\sbt\

and append the following items to PATH:
%JAVA_HOME%\bin;%MAVEN_HOME%\bin;%SCALA_HOME%\bin;%SBT_HOME%\bin;


2. Download and setup IntelliJ IDE

a) Download IntelliJ here

b) Download Scala plugin


Note: it's mandatory to enable the HTTP proxy if you are behind a company firewall.

c) Configure Maven settings if under proxy


We can copy the settings.xml file from %MAVEN_HOME%\conf\settings.xml to C:\Users\*****\.m2 and enable the proxy there. After modification:
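The enabled proxy section looks roughly like this (a sketch; the id, host, and port are placeholders for your own proxy):

<proxies>
    <proxy>
        <id>company-proxy</id>
        <active>true</active>
        <protocol>http</protocol>
        <host>YOUR_COMPANY_PROXY</host>
        <port>8080</port>
    </proxy>
</proxies>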


3. Create project

Let's create a sample project called HelloSpark

1) File --> New --> Project..., please choose Scala



After clicking Finish, a dialog will pop up; we can choose "New Window"

2) Add Maven support

Right click project name and choose "Add Framework Support...", please scroll down and select "Maven"

Double-click pom.xml and merge the following content into the existing pom.xml:

<build>
    <plugins>
        <!-- compile for Java 1.7 -->
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.1</version>
            <configuration>
                <source>1.7</source>
                <target>1.7</target>
            </configuration>
        </plugin>
    </plugins>
</build>

<dependencies>
    <!-- Spark core -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.4.0</version>
        <scope>provided</scope>
    </dependency>
    <!-- Spark MLlib -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-mllib_2.10</artifactId>
        <version>1.4.0</version>
        <scope>provided</scope>
    </dependency>
    <!-- Spark SQL -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.10</artifactId>
        <version>1.4.0</version>
    </dependency>
    <!-- Spark Hive -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.10</artifactId>
        <version>1.4.0</version>
    </dependency>
    <!-- Spark Streaming and Kafka -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.10</artifactId>
        <version>1.4.0</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka_2.10</artifactId>
        <version>0.8.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka_2.10</artifactId>
        <version>1.4.0</version>
    </dependency>
</dependencies>

<repositories>
    <repository>
        <id>Maven</id>
        <url>http://repo1.maven.org/maven2</url>
    </repository>
    <repository>
        <id>clojars</id>
        <url>http://clojars.org/repo/</url>
    </repository>
    <repository>
        <id>m2.java.net</id>
        <name>Java.net Maven 2 Repository</name>
        <url>http://download.java.net/maven/2</url>
        <layout>default</layout>
    </repository>
</repositories>

A sample pom.xml can be viewed here

After pasting the content, a dialog will pop up at the top right; please choose Enable Auto-Import and Maven will start downloading the specified dependencies.


Or you can do it via
right-click project name --> Maven --> Reimport

3) Create a folder for Scala

Expand the project file structure, src --> main, right-click main, New --> Directory,
and name it scala.

Then add this new folder "scala" to the project sources:
File --> Project Structure (shortcut Ctrl+Alt+Shift+S)

Modules --> scala --> Sources, and as the screenshot shows, click 1, 2 and 3; the result will display at 4.

4) Create a scala class

Right-click the scala folder, New --> Scala Class.

Then modify the content as in the screenshot.
Please also note:
1) org.apache.spark.SparkContext needs to be imported.
2) Create a file called pagecounts.
3) This program reads the content of the pagecounts file, prints out the first 10 lines, and then prints out the total line count of the file.
You can put arbitrary content in pagecounts; a sample file can be viewed here. If you place it in another folder, please modify the file path accordingly. A minimal sketch of the program is shown below.
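Roughly, the program looks like this (a sketch, assuming pagecounts sits in the project root; local[2] is my assumption for an in-IDE run, so adjust both to your setup):

import org.apache.spark.{SparkConf, SparkContext}

object HelloSpark {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("HelloSpark").setMaster("local[2]")
    val sc = new SparkContext(conf)

    val pagecounts = sc.textFile("pagecounts") // read the sample file

    pagecounts.take(10).foreach(println)          // print the first 10 lines
    println("Total lines: " + pagecounts.count()) // print the total line count

    sc.stop()
  }
}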



5) Add Spark jar file

We need to download the latest Spark package and unzip it.
Go to: https://spark.apache.org/downloads.html
Download the Spark package; you can choose the build for Hadoop 2.4 or 2.6 based on your requirements. For example, a sample spark-1.4.0-bin-hadoop2.4.tgz can be downloaded here.
After unzipping, we can add the package to our project; click OK in the popup.

6) Set run configuration

In the IntelliJ menu, Run --> Edit Configurations..., please choose Application and set up the content as in the screenshot below


Final: Run it!

Click the run button on the toolbar, and the result is good!


Please note that at the beginning it will display the SLF4J multiple-binding problem and the winutils problem, like java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. These can be ignored for now.

Happy Spark!


If you think this article is useful, please click the ads on this page to help. Thank you very much.

May 2, 2015

Nexus 5 OTA after rooted and TWRP installed

I have been using the Nexus 5 for quite a long time, and it's great to frequently receive update notifications from Google bringing the latest version of the Android OS. We often joke that as Nexus users, we pay for the system and get the device for free.

The update is called an OTA (Over The Air). But since Google changed its policy for verifying whether an OTA can be applied, our lives have become hard. In short, most Nexus users root the device, unlock the bootloader, and flash a third-party recovery such as TWRP. (With this, you have most of the control over the Android OS and can install lots of applications.) After doing this, the system files are changed. Google's OTA check now verifies not only that certain files are the original version but the whole system partition, which means in our case the OTA check will definitely fail and the update cannot run.

Below is a sample error:
expects build fingerprint of google/hammerhead/hammerhead 5.0.1/LRX22C/160258:user/release-keys or google/hammerhead/hammerhead 5.1 LMY47D/1743759   user/release-keys 
this device has Android/omni_hammerhead/hammerhead:4.4.4/KTU84P/4ef4c299f5:eng/test-keys

So what to do?

I had a lot of pain trying different methods; each time it failed, the phone could not boot, and I had to reinstall the whole factory image, which means a completely fresh OS and reinstalling all the applications! So each update I needed to spend days getting my phone ready; this was really hard.

During this Labor Day vacation, I decided to study this problem, and luckily I found a solution.

First of all:
don't try sideloading, and don't try installing through TWRP... I have tried all of these and they failed. :)

Solution:

1. Download Nexus Root Toolkit and update it to the latest version.
2. Click the flash option.

3. Very importantly, very importantly, very importantly: CHECK "don't flash userdata"



Then follow all the instructions to flash. After this is done, the phone will boot and update all the applications. Last but not least, flash the latest custom recovery and the latest SuperSU.

Enjoy the latest Android!

If you find this article is useful, please click the ads on this page to help. Thank you very much.




Apr 23, 2015

Apache Commons CLI not working? No, it's a junk character problem

I am using the Apache Commons CLI library to accept runtime parameters.



In my code, I use -t to specify the runtime topology name.
But to my surprise, it didn't work. I took 2 hours to debug this problem, and finally, by chance, I compared the results and found that

it's complaining about the - character. What could be wrong with it?


I used Notepad++, which has a good plugin called HexEditor, and it gave the answer:



The dash was not a real ASCII hyphen (0x2D) but a look-alike character, and this was causing the strange problem.

You may ask why. It's because when someone sent me the command from Outlook, it was already mangled.
OK, the lesson learned here is to be careful with characters pasted from Outlook, especially in a Chinese environment. We can retype the command to avoid problems.
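To see the effect concretely, here is a small sketch using Commons CLI (DefaultParser is from commons-cli 1.3; older versions use GnuParser instead; the option name and values are made up for illustration):

import org.apache.commons.cli.{DefaultParser, Options}

object CliCheck {
  def main(args: Array[String]): Unit = {
    val options = new Options()
    options.addOption("t", true, "topology name") // expects "-t <name>"
    val parser = new DefaultParser()

    // A real ASCII hyphen-minus (0x2D) is recognized as an option:
    val good = parser.parse(options, Array("-t", "myTopology"))
    println(good.getOptionValue("t")) // prints: myTopology

    // "\u2013" is an en dash, the kind of look-alike Outlook produces.
    // It is not treated as an option marker, so -t silently goes missing:
    val bad = parser.parse(options, Array("\u2013t", "myTopology"))
    println(bad.hasOption("t")) // prints: false
  }
}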

If you find this article is useful, please click the ads on this page to help. Thank you very much.

Mar 23, 2015

Strange Supervisorctl Problem: unix:///tmp/supervisor.sock no such file

I am using supervisord, which is a great tool to spawn and manage processes:
http://supervisord.org/running.html

When I get things configured, I run supervisor with the command below:
supervisord -c /etc/supervisord.conf

Supervisord runs successfully and I can see the programs got started, but when I check the processes using
supervisorctl status

I get a strange error as below:


I tried several solutions, for example changing supervisord.conf to use an HTTP socket.

It even reported a connection-refused error:
supervisorctl status: http://127.0.0.1:9001 refused connection.

I searched a lot and finally got the solution.

Problem:
I was using an old version of the supervisord.conf file; people who work on clusters and do maintenance often do this. The problem happened because I did a fresh install of supervisord but copied a supervisord.conf from another machine. The installed version is the latest while the copied config is old, and that mismatch caused the problem.
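In particular, supervisorctl locates the socket through these two sections of the config; if the copied file misses them or points at a stale path, you get exactly the errors above. A sketch of the relevant sections from the generated template (the socket path shown is the default):

[unix_http_server]
file=/tmp/supervisor.sock               ; the socket supervisord listens on

[supervisorctl]
serverurl=unix:///tmp/supervisor.sock   ; must match the socket file above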

The solution is very simple:


echo_supervisord_conf > /etc/supervisord.conf
# modify supervisord.conf; after it's done:
sudo supervisord -c /etc/supervisord.conf
sudo supervisorctl status



The first line generates a new supervisord.conf template, which you can then modify with the programs you want to start. Then start supervisord and check the status. The result is good.


If you find this article is useful, please click the ads on this page to help. Thank you very much.

Mar 20, 2015

Storm slf4j multiple binding problem

We are using Storm 0.9.3, and when we tried to start the supervisor node we found it reported the error below:


Clearly this is an slf4j multiple-binding problem. Basically, this famous library is used by many different jars; when multiple slf4j bindings are found on the classpath, only one will be picked up, so jars that pulled in a different version of slf4j will hit these compatibility problems.

The log above clearly states that there are 3 slf4j bindings under lib:
  • slf4j-jdk14-1.5.6.jar
  • slf4j-nop-1.5.3.jar
  • logback-classic-1.0.13.jar

And it needs 1.6 or above, as it says:

SLF4J: slf4j-api 1.6.x (or later) is incompatible with this binding.
SLF4J: Your binding is version 1.5.5 or earlier.
SLF4J: Upgrade your binding to version 1.6.x.

Storm is compiled and tested, so why does it have multiple slf4j* jars? It turns out the vanilla Storm package is fine, but we had added some jars to its lib folder:


We do this so that when we package our Storm jar, we don't need to include all the required packages each time. This saves time: we develop on Windows and upload the built jar to the cluster, so a smaller generated jar file saves us time.
Let's not judge whether this method is good or not; the problem here is to solve the binding problem.

The solution is simple:
from the Eclipse dependency view,

let's remove these two jars, which are slf4j 1.5.*:
  • slf4j-jdk14-1.5.6.jar
  • slf4j-nop-1.5.3.jar

It can successfully launch now.

Then here comes another question: when we package a Storm application, how do we solve the slf4j multiple-binding problem?

There is a nice pom.xml snippet for reference (a partial pom.xml). This is something I found and copied from the internet; honor and thanks go to the original author.



<plugin>
    <groupId>org.codehaus.gmaven</groupId>
    <artifactId>gmaven-plugin</artifactId>
    <version>1.5</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>execute</goal>
            </goals>
            <configuration>
                <source>
                    // strip the slf4j binding classes out of the assembled jar
                    File targetDir = new File("${project.basedir.path}/target".toString())
                    println "dir is ${targetDir.path}"
                    String jarBaseName = "${project.artifactId}-${project.version}"
                    File jarWithUnwantedStuff = new File(targetDir,
                        "${jarBaseName}-jar-with-dependencies.jar".toString())

                    def explodedJarDir = new File(targetDir, "explodedJar".toString())
                    def ant = new AntBuilder() // create an antbuilder
                    ant.unzip(src: "${jarWithUnwantedStuff.path}",
                        dest: explodedJarDir.path,
                        overwrite: "false")
                    File finalJar = new File(targetDir, "${jarBaseName}-deployable.jar")
                    def unwantedClassesDir = new File(explodedJarDir, "/org/slf4j/impl".toString())
                    unwantedClassesDir.deleteDir()
                    ant.zip(basedir: explodedJarDir.path, destFile: finalJar.path)
                </source>
            </configuration>
        </execution>
    </executions>
</plugin>


If you find this blog is useful, please kindly click the ads on this page to help. Thank you very much.

Feb 15, 2015

Android Emulator Genymotion: Connect to the Internet

We attended an Android training class, and the lecturer told us that Genymotion is so far the best Android emulator.

https://www.genymotion.com/



I have set up a device as shown above, a Nexus 5, and it runs successfully. But the problem is that when I am at the company, I cannot access the network, even though the WiFi icon at the top shows the connection is successful.

I searched and finally found a solution.

1. Click and hold the connected WiFi network (typically named WiredSSID) for 2 or 3 seconds.


2. Choose modify network

3. Specify the proxy hostname and port, then click Save

4. Now you can surf the web!

If you find this blog is useful, please kindly click the ads on this page to help. Thank you very much.