To learn more about Hive UDFs, see the Hive language manual: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF
I recently developed a number of Hive UDFs, and to call them I had to add the jar files and create temporary functions in every Hive session. I started digging into the code to find out if I could modify the Java files and then rebuild Hive, so the functions would be built in. This post describes how I did it.
1. Download the source code from
0.10.0 is the most stable version of Hive at the time I'm writing this blog post.
2. Extract the tarball and copy your UDF Java files to
If the UDF is a generic UDF, copy it to
3. Before you copy the Java files into the ql folder, you have to change their package declaration to
package org.apache.hadoop.hive.ql.udf.generic; or package org.apache.hadoop.hive.ql.udf;
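For example, a minimal plain UDF might look like the sketch below. This is a hypothetical illustration (the class name and logic are mine, not from the post); in the real source tree the class also extends org.apache.hadoop.hive.ql.exec.UDF, which is left as a comment here so the sketch compiles without the Hive jars.

```java
// Hypothetical UDF sketch. Once copied into the udf folder, its first
// line must be the matching package declaration:
//   package org.apache.hadoop.hive.ql.udf;
// In the Hive tree the class also extends
// org.apache.hadoop.hive.ql.exec.UDF; that superclass is commented out
// here so the example stands alone.
public class UDFReverse /* extends UDF */ {
  // Hive locates evaluate() by reflection; null in, null out.
  public String evaluate(String s) {
    if (s == null) {
      return null;
    }
    return new StringBuilder(s).reverse().toString();
  }
}
```

The package line is the important part: it has to match the folder you copied the file into, or the build will fail.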
4. Now that the files are in the udf folder, you have to tell Hive how to find these functions. To do this you have to change FunctionRegistry.java. You can find FunctionRegistry.java in
5. Make the following changes to FunctionRegistry.java:
i) Import the UDF class:
ex: import org.apache.hadoop.hive.ql.udf.generic.GenericUDAFToMap;
ii) If it's a UDAF, register it in the static initializer block:
registerGenericUDAF("to_map", new GenericUDAFToMap());
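Under the hood, that registration call just adds an entry to a static map of function names. The self-contained mock below is not Hive's actual class; the class name, map, and helper method are assumptions made purely to illustrate the static-initializer pattern FunctionRegistry.java follows:

```java
import java.util.HashMap;
import java.util.Map;

// Simplified mock of how FunctionRegistry wires names to
// implementations. Hive's real registerGenericUDAF takes a resolver
// instance; here a plain Object stands in so the sketch is runnable.
class MockFunctionRegistry {
  static final Map<String, Object> mFunctions = new HashMap<>();

  static void registerGenericUDAF(String name, Object resolver) {
    mFunctions.put(name.toLowerCase(), resolver);
  }

  static {
    // Your added line sits alongside Hive's built-in registrations:
    registerGenericUDAF("to_map", new Object() /* new GenericUDAFToMap() */);
  }

  static boolean isRegistered(String name) {
    return mFunctions.containsKey(name.toLowerCase());
  }
}
```

Because the registration happens in a static block, the function is available in every session once the rebuilt jar is deployed, with no ADD JAR or CREATE TEMPORARY FUNCTION needed.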
6. Once you have made all the changes, navigate to the src folder and build with ant (ant package). When the build finishes you will have a build folder. Even if the build fails you are fine, as long as it produced hive-exec-*.jar (a complete Hive build needs the Thrift compiler and several other tools installed). It is recommended to deploy the entire build, but only hive-exec-*.jar actually needs to be replaced.
I also observed that when I built Hive on a Windows machine and copied the jar to a CentOS box, it failed.
Hope this helps someone. Let me know if you run into any trouble; you can reach me at @abhishek376 on Twitter.