I implemented a Hive data platform recently at my firm and can speak to it in the first person, since I was a one-man team.

The objectives were to make the daily web log files collected from 350+ servers queryable through some SQL-like language, to replace the daily aggregation data generated through MySQL with Hive, and to build custom reports through queries in Hive.

One option I considered was Hive+HBase, but queries were too slow, so I dumped this option.

The design I ended up with worked like this. Daily log files were transported to HDFS. MR jobs parsed these log files and wrote their output files to HDFS. I created Hive tables with partitions and locations pointing to those HDFS locations. I created Hive query scripts (call it HQL if you like, as distinct from SQL) that in turn ran MR jobs in the background and generated the aggregation data. I put all these steps into an Oozie workflow, scheduled with a daily Oozie coordinator.

With HBase, if you know the key, you can instantly get the value. But if you want to know how many integer keys in HBase fall in a range bounded by 10000000, HBase alone is not suitable for that. If you have data that needs to be aggregated, rolled up, or analyzed across rows, then consider Hive. I know, I have lived it for 12 months now.
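
To make the partition and aggregation steps a bit more concrete, here is a minimal sketch of how the daily HQL script might be generated. This is not the code I ran; the table names (web_logs, daily_hits), the server column, the dt partition key and the HDFS root path are all hypothetical, chosen only for illustration. It assumes the MR jobs write each day's parsed output to its own directory.

```python
from datetime import date

# Hypothetical names: none of these come from the actual platform.
LOG_TABLE = "web_logs"              # Hive table over the parsed log files
AGG_TABLE = "daily_hits"            # Hive table holding the daily aggregates
HDFS_ROOT = "/data/weblogs/parsed"  # assumed output root of the MR parse jobs


def partition_location(day: date) -> str:
    """HDFS directory assumed to hold the parsed MR output for one day."""
    return f"{HDFS_ROOT}/dt={day.isoformat()}"


def add_partition_hql(day: date) -> str:
    """Statement that registers one day's directory as a table partition."""
    return (
        f"ALTER TABLE {LOG_TABLE} ADD IF NOT EXISTS "
        f"PARTITION (dt='{day.isoformat()}') "
        f"LOCATION '{partition_location(day)}'"
    )


def daily_aggregate_hql(day: date) -> str:
    """Statement that rolls one day's log rows up into per-server counts."""
    return (
        f"INSERT OVERWRITE TABLE {AGG_TABLE} "
        f"PARTITION (dt='{day.isoformat()}') "
        f"SELECT server, COUNT(*) FROM {LOG_TABLE} "
        f"WHERE dt='{day.isoformat()}' GROUP BY server"
    )


def daily_script(day: date) -> str:
    """Both statements for one day, in the order the workflow would run them."""
    statements = [add_partition_hql(day), daily_aggregate_hql(day)]
    return ";\n".join(statements) + ";"
```

The text returned by daily_script could be written to a file and handed to Hive (for example with hive -f, or from a Hive action in the Oozie workflow). Filtering on the partition column in the WHERE clause is what lets Hive read only that day's directory instead of scanning every log file.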