100

I've been looking at Hadoop with keen interest, and to be honest I'm fascinated; things don't get much cooler.

My only minor issue is I'm a C# developer and it's in Java.

It's not that I don't understand Java; rather, I'm looking for a Hadoop.NET, an NHadoop, or whichever .NET project embraces the Google MapReduce approach. Does anyone know of one?

j0k
danswain

15 Answers

57

Have you looked at using Hadoop Streaming?

I use it with Python all the time :-).

I'm starting to see that the heterogeneous approach is often the best and it looks like other folks are doing the same.

If you look at projects like Protocol Buffers or Facebook's Thrift, you see that sometimes it's best to use an app written in another language and build the glue in the language of your preference.
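To make the streaming suggestion concrete, here is a minimal word-count sketch in Python. It assumes only the basic Hadoop Streaming contract: the mapper and reducer are ordinary programs that read lines from stdin and write tab-separated key/value lines to stdout, and the framework sorts the mapper output by key before the reducer sees it. The function names and the `map`/`reduce` command-line argument are made up for this illustration; they are not part of Hadoop.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts per word.

    Assumes the input is already sorted by key, which is what the
    streaming framework provides between the map and reduce steps.
    """
    records = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(records, key=lambda rec: rec[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical convention for this sketch: run the script with "map" or
# "reduce" as its only argument to select the step.
if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

Saved as, say, `wc.py` (a hypothetical file name), it can be tried locally without a cluster using `cat input.txt | python wc.py map | sort | python wc.py reduce`. On a cluster, the two commands would be handed to the streaming jar through its `-mapper` and `-reducer` options. Because the contract is just stdin and stdout, a C# console application could play the same role.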

chews
  • 2
    This is not technically having Hadoop in C#: streaming decouples the processes and the data is passed as strings, which may not be very efficient. – Felice Pollano Mar 03 '16 at 07:53
14

Recently, MySpace released their .NET MapReduce framework, Qizmt, as Open Source, so this is also a potential contender in this space.

foxxtrot
  • 3
    Their license is GPL ;( It would have been great if they had chosen something less restrictive... – IgorK Apr 14 '10 at 09:43
  • 3
    It's really unlikely the GPL will get in your way in this case. As long as you're not distributing your modifications to the source (if you've made any) outside of your organization, you won't be required to release any of your code. – foxxtrot Aug 15 '11 at 15:51
  • We distribute our closed-source product (as a product company). If we relied on a GPL'ed software component, we would automatically need to distribute our sources as well. It's not LGPL, where including a library in a closed-source project is OK :( – IgorK Aug 15 '11 at 16:51
  • Completely fair. I just think that *most people's* use of a Map-Reduce framework won't have this limitation. That said, I don't understand MySpace's business case for releasing this as GPL, as far as I can tell they aren't licensing it separately. – foxxtrot Aug 15 '11 at 16:58
  • I don't understand either! If somebody wanted to create a 'MySpace killer' they are likely to be satisfied by using it in-house (without redistribution of either binary or source). I guess using AGPL (Affero GPL) would be more appropriate to fix a loophole with public web services using it and not distributing any source... Sad and strange :/ – IgorK Aug 15 '11 at 17:21
  • If the interface library is LGPL, or less restrictive, you could have the database separate from the main application... which could work for a closed-source application... Could also get some brownie points. I think that in the .Net space either using MongoDB, which is well supported or RavenDB which is .Net native and dual-licensed would work better. – Tracker1 Oct 13 '11 at 17:18
13

See http://research.microsoft.com/en-us/projects/dryadlinq/default.aspx or http://msdn.microsoft.com/en-us/library/dd179423.aspx

  • 7
    Microsoft cancelled Dryad and decided to stick with Hadoop – Arnon Rotem-Gal-Oz Dec 01 '11 at 05:21
  • @ArnonRotem-Gal-Oz: do you have a reference for that statement from Microsoft? – Abel Feb 07 '12 at 23:20
  • 4
    see http://blogs.technet.com/b/windowshpc/archive/2011/11/11/hpc-pack-2008-r2-sp3-and-windows-azure-hpc-scheduler-released.aspx - "As part of this release we’ve also updated the preview version of LINQ to HPC, however, this will be the final preview and we do not plan to move forward with a production release. In line with our announcement in October at the PASS conference we will focus our effort on bringing Apache Hadoop to both Windows Server and Windows Azure ..." – Arnon Rotem-Gal-Oz Feb 07 '12 at 23:54
10

I answered your question in my question here

To summarize it here:

Microsoft dropped its alternative (Dryad) in favor of Hadoop. Next year they will release MS SQL Server 2012 with Hadoop integration. Azure and Windows Server support is being developed as we speak.

It will be available in the first half of 2012.

Hadoop is the #1 big data platform and is going to be supported by both open-source and proprietary stacks (Java, .NET, Python, ...); even Oracle is adopting it.

If you're on the .NET platform and about to start developing something, you should wait.

More information about what is possible will be available here

NicoJuicy
5

I would say that DryadLINQ is the closest thing that we .NET folk have to Hadoop. But it depends on what you want to use Hadoop for. If you are looking for the optimized, self-maintaining distributed file system (DFS), then DryadLINQ isn't what you are looking for. It has an analog to the DFS, but you have to manually build the partitions and distribute each partition.

That being said, if it's the distributed execution aspect of Hadoop that you are looking for, then DryadLINQ is truly wonderful (and no, I'm not affiliated with MS). As long as you have a Microsoft HPC cluster set up, getting going with DryadLINQ is really easy.

The code you write is really just straight LINQ code, except that instead of executing the LINQ on IEnumerable<T> you have to execute it on PartitionedTable<T> (the self-built distributed data structure).

What has really been cool about DryadLINQ is the fast turnaround time (try, test, adjust, repeat) when developing algorithms. You just write LINQ code to do your calculations and DryadLINQ will take care of the whole distributed execution part. It's the most natural analog I've come across that makes writing code for distributed processing just like writing code for single-process processing.
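The idea of writing the query once and letting something else handle the partitions can be illustrated outside .NET. The following is not DryadLINQ code and does not model PartitionedTable<T>; it is a small Python sketch, with hypothetical function names, in which the same single-process query is applied to each partition and the partial results are merged. It works here only because word counts can be combined by addition.

```python
from collections import Counter
from functools import reduce


def count_words(records):
    """The 'query': an ordinary single-process word count."""
    return Counter(word for line in records for word in line.split())


def run_partitioned(query, partitions):
    """Apply the same query to every partition and merge the partial results.

    A real engine would ship each partition to a different machine; in this
    sketch the partitions are processed one after another in one process.
    """
    partials = (query(partition) for partition in partitions)
    return reduce(lambda left, right: left + right, partials, Counter())


data = ["a b a", "b c", "a"]
whole = count_words(data)                                     # single process
split = run_partitioned(count_words, [data[:2], data[2:]])    # two partitions
print(whole == split)
```

The query function itself never mentions partitions, which is the property the answer is praising.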

casperOne
Turbo
4

You can look into something like RavenDB, which provides very decent support for MapReduce over fairly large amounts of data. Because it is built in .NET, a proper LINQ client API is available.

http://ravendb.net/

To get you started, you can read my blog entry.

Ovais
2

Microsoft is in the process of rolling out HDInsight, which is billed as their "100% Apache compatible Hadoop distribution."

It is available both on Windows Server and as a Windows Azure service.

Buggieboy
  • 1
    HDInsight is the Hortonworks distribution. Other major vendors are also working with Microsoft to offer their distributions on Azure. Pertaining to the question: there are .NET interfaces to HDInsight, but HDInsight itself is not .NET – ashtonium Jul 22 '15 at 16:32
2

It may be better to use Apache Hadoop with streaming, because Apache Hadoop is actively developed and maintained by industry giants such as Yahoo and Facebook, so it is likely to do what you expect.

If you need a solution in .NET, take a look at MySpace's implementation: MySpace Qizmt - MySpace’s Open Source MapReduce Framework

Dileep stanley
1

You can now use Hadoop directly from .NET: Microsoft has released an SDK to do so.

https://hadoopsdk.codeplex.com/

Of course, this means using the Java-based Hadoop itself. But does it matter if the server is running Java? I am sure someone may attempt a port, but I don't think it would be a good idea: corporations are already backing the Java version, and I don't think a .NET port would get the same attention.

Dreamwalker
1

Have a look at:

http://www.windowsazure.com/en-us/services/hdinsight/

It is an implementation of Hadoop for Azure and you can use .NET for accessing it.

Stefan Papp
1

Internally, Microsoft has been using Cosmos. It has been made available outside Microsoft through Azure, under the names Azure Data Lake Analytics and Azure Data Lake Store. Azure Data Lake Analytics is a kind of YARN as a service, and Azure Data Lake Store is WebHDFS as a service. The first version of Azure Data Lake Analytics only hosts U-SQL, a language based on Transact-SQL and C#.

benjguin
1

Microsoft Research has Project Daytona: http://research.microsoft.com/en-us/projects/daytona/

You can download it. There's a WordCount sample in C#.

benjguin
0

There's a pretty cute MapReduce implementation for .NET at: http://mapsharp.codeplex.com/

0

As others have mentioned, DryadLINQ is a programming framework that allows developers to write LINQ queries and execute them on a cluster, in a similar manner to MapReduce. The DryadLINQ project has recently been released under the Apache license on GitHub, and the release includes support for running on YARN clusters (including Azure HDInsight clusters).

mrry
0

DryadLINQ is being productized and will be released soon: http://blogs.technet.com/b/windowshpc/archive/2011/07/07/announcing-linq-to-hpc-beta-2.aspx

Use it in conjunction with Microsoft HPC for a powerful, cluster-based solution for querying unstructured data.

John