HDFS hflush vs hsync

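hflush: This API flushes all outstanding data (i.e. the current unfinished packet) from the client into the OS buffers on all DataNode replicas. Once it returns, the data is visible to new readers, but it is not guaranteed to have reached disk.

hsync: This API flushes the data to the DataNodes, like hflush(), but should also force the data to underlying physical storage via fsync (or equivalent). Note that only the current block is flushed to the disk device; to guarantee a durable sync across block boundaries, the stream should be created with CreateFlag.SYNC_BLOCK [1].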

[1] https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOutputStream.java

9 August 2016

Posted In: dfsoutputstream, hadoop, hdfs, hflush, hsync

How to build Hadoop Native Library with Snappy Compression Support

Snappy is a compression library that can be utilized by the native code.
It is currently an optional component, meaning that Hadoop can be built with
or without this dependency.

Download and compile the snappy library, or install it from your distribution's repository. I installed the libsnappy and libsnappy-dev packages from the Ubuntu repository. You can pass -Drequire.snappy to fail the build if libsnappy.so is not found. If this option is not specified and the snappy library is missing, the build silently produces a version of libhadoop.so that cannot make use of snappy. After that, you just need to run the command below:

mvn clean package -Pdist,native -DskipTests -Dtar -Drequire.snappy

If you built snappy yourself and it is installed in a nonstandard location, you can use these parameters:

  • -Dsnappy.prefix to specify a nonstandard location for the libsnappy header files and library files. You do not need this option if you have installed snappy using a package manager.
  • -Dsnappy.lib to specify a nonstandard location for the libsnappy library files. Similarly to snappy.prefix, you do not need this option if you have installed snappy using a package manager.
  • -Dbundle.snappy to copy the contents of the snappy.lib directory into the final tar file. This option requires that -Dsnappy.lib is also given, and it ignores the -Dsnappy.prefix option.
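As an illustration of how these options combine, here is a minimal sketch that only prints the resulting Maven command instead of running it. The function name snappy_mvn_cmd and the path /opt/snappy are hypothetical, chosen for the example, and the sketch assumes the libraries live in a lib directory under the prefix. The flags themselves are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the Maven command for a
# snappy build installed under a nonstandard prefix.
# $1 = snappy install prefix (assumed layout: $1/include and $1/lib)
# $2 = "bundle" to also copy the snappy libraries into the tar file
snappy_mvn_cmd() {
    prefix="$1"
    cmd="mvn clean package -Pdist,native -DskipTests -Dtar -Drequire.snappy"
    if [ -n "$prefix" ]; then
        cmd="$cmd -Dsnappy.prefix=$prefix -Dsnappy.lib=$prefix/lib"
    fi
    # -Dbundle.snappy requires -Dsnappy.lib, so only add it with a prefix
    if [ "$2" = "bundle" ] && [ -n "$prefix" ]; then
        cmd="$cmd -Dbundle.snappy"
    fi
    printf '%s\n' "$cmd"
}

# Example: print the command for a snappy built under /opt/snappy
snappy_mvn_cmd /opt/snappy bundle
```

Printing the command first makes it easy to review the flags before starting a long build; once it looks right, copy it into the Hadoop source folder and run it there.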

After the build finishes, you can find your native libraries in:

<source_folder>/hadoop-dist/target/hadoop-2.5.2/lib/native/
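A quick sanity check after the build is to confirm that libhadoop.so actually landed in that directory. The sketch below is only an assumption about how one might do it; check_native_dir is a hypothetical helper name, and the directory to pass is the native folder shown above.

```shell
#!/bin/sh
# Hypothetical helper: returns 0 if libhadoop.so exists in the given
# native library directory, 1 otherwise.
# $1 = path to the lib/native directory produced by the build
check_native_dir() {
    dir="$1"
    if [ -e "$dir/libhadoop.so" ]; then
        echo "found libhadoop.so in $dir"
        return 0
    fi
    echo "libhadoop.so not found in $dir" >&2
    return 1
}
```

This only checks that the file exists. On an installed Hadoop, running hadoop checknative -a should additionally report whether the snappy library can be loaded.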

Good luck

13 July 2016

Posted In: hadoop, hadoop-native, snappy

Hadoop Application Master container is not able to initialize user directories when your MapReduce job is submitted by the hdfs user

Today our Cloudera CDH 5.3 cluster threw the error below when we tried to submit our job as the hdfs user.

15/03/16 08:28:18 INFO mapreduce.Job: Job job_1426239544674_0019 running in uber mode : false

15/03/16 08:28:18 INFO mapreduce.Job:  map 0% reduce 0%
15/03/16 08:28:18 INFO mapreduce.Job: Job job_1426239544674_0019 failed with state FAILED due to: Application application_1426239544674_0019 failed 2 times due to AM Container for appattempt_1426239544674_0019_000002 exited with  exitCode: -1000 due to: Not able to initialize user directories in any of the configured local directories for user hdfs
.Failing this attempt.. Failing the application.

I did some research to pin down the problem. The main cause is that YARN tries to delete its local directories on startup, but in practice it fails to delete them because of permission problems. The top-level usercache directory is owned by the user but is in a directory that is not writable by the user. Therefore the deletion of the user’s usercache directory, as the user, fails due to lack of permissions.

Solution:

Delete the contents of the usercache directory, which is located under the YARN NodeManager directory on the data node:

rm -rf /dn/yarn/nm/usercache/*
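A slightly more defensive version of the same cleanup could look like the sketch below. The function name clean_usercache is hypothetical, and the sketch assumes the directory you pass is one of the NodeManager local directories (/dn/yarn/nm on our cluster). It lists owners and permissions first, then removes the per-user cache directories while keeping usercache itself.

```shell
#!/bin/sh
# Hypothetical helper: empties the usercache directory under one
# NodeManager local directory.
# $1 = NodeManager local directory, e.g. /dn/yarn/nm
clean_usercache() {
    nm_dir="$1"
    cache="$nm_dir/usercache"
    # Refuse an empty argument so the path never expands to /usercache
    if [ -z "$nm_dir" ] || [ ! -d "$cache" ]; then
        echo "no usercache directory under '$nm_dir'" >&2
        return 1
    fi
    # Show current owners and permissions before deleting anything
    ls -ld "$cache" "$cache"/* 2>/dev/null || true
    # Remove the per-user cache directories but keep usercache itself
    rm -rf "$cache"/*
}
```

Usage would be clean_usercache /dn/yarn/nm, repeated on every node that shows the error. It is probably safest to run it while the NodeManager on that node is stopped, although the original fix was applied with a plain rm -rf.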

16 March 2015

Posted In: applicationmaster, cdh, cloudera, hadoop, hdfs, mapreduce, yarn
