(See edit at the bottom of this post)

I'm making a program in Elixir that counts the types of HTML tags from a list of tags that I've already obtained. This means that the key should be the tag and the value should be the count.

e.g. in the following sample file

<html><head><body><sometag><sometag><sometag2><sometag>

My output should be something like the following:

html: 1
head: 1
body: 1
sometag: 3
sometag2: 1

Here is my code:

def tags(page) do
  taglist = Regex.scan(~r/<[a-zA-Z0-9]+/, page)

  dict = Map.new()

  Enum.map(taglist, fn x ->
    tag = String.to_atom(hd(x))
    Map.put_new(dict, tag, 1)
  end)
end

I know I should probably be using Enum.each instead, but when I do that my dictionary just ends up empty rather than incorrect.

With Enum.map, this is the output I receive:

iex(15)> A3.test
[%{"<html" => 1}, %{"<body" => 1}, %{"<p" => 1}, %{"<a" => 1}, %{"<p" => 1},
 %{"<a" => 1}, %{"<p" => 1}, %{"<a" => 1}, %{"<p" => 1}, %{"<a" => 1}]

As you can see, there are duplicate entries and it's turned into a list of dictionaries. For now I'm not even trying to get the count working, so long as the dictionary doesn't duplicate entries (which is why the value is always just "1").

Thanks for any help.

EDIT: ------------------

Okay, so I figured out that I need to use Enum.reduce.

The following code produces the output I'm looking for (for now):

def tags(page) do
  rawTagList = Regex.scan(~r/<[a-zA-Z0-9]+/, page)
  tagList = Enum.map(rawTagList, fn tag -> String.to_atom(hd(tag)) end)

  Enum.reduce(tagList, %{}, fn tag, acc ->
    Map.put_new(acc, tag, 1)
  end)
end

Output:

%{"<a": 1, "<body": 1, "<html": 1, "<p": 1}

Now I have to complete the challenge of actually counting the tags as I go. If anyone can offer any insight on that, I'd be grateful!

Dan
  • Please paste the snippet from imgur inside the question. The snippet is short and SO policy says short snippets should be included inside question. This makes searching easier and snippet will be available even in case of imgur being down. – tkowal Apr 03 '16 at 22:25
  • Done, any insight on the question? – Dan Apr 03 '16 at 22:40

1 Answer

First of all, parsing HTML with regexes is not the best idea. See this question for more details (especially the accepted answer).

Secondly, the first version of your code tries to write imperative code in a functional language. Data in Elixir is immutable, so dict will always be an empty map. Enum.map takes a list and always returns a new list of the same length, with every element transformed. Your transformation function takes the empty map and returns a new map containing a single key-value pair.

As a result, you get a list of one-element maps. The line:

Map.put_new(dict, tag, 1)

doesn't update dict in place. It creates a new map based on the old one, which is empty. In your example it is exactly the same as:

%{tag => 1}
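
To make this concrete, here is a small sketch you can paste into iex. The tag strings are made-up sample data, not the output of your regex; the point is only that dict never changes, so every call starts from the empty map.

```elixir
# Maps are immutable: Map.put_new/3 returns a new map and leaves `dict` alone.
dict = Map.new()
updated = Map.put_new(dict, "<html", 1)

IO.inspect(dict)     # prints %{}
IO.inspect(updated)  # prints %{"<html" => 1}

# Inside Enum.map/2 every call starts from the same empty `dict`,
# so the result is one single-entry map per list element.
result = Enum.map(["<p", "<a", "<p"], fn tag -> Map.put_new(dict, tag, 1) end)
IO.inspect(result)   # prints [%{"<p" => 1}, %{"<a" => 1}, %{"<p" => 1}]
```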

You have a couple of options for doing it differently. The approach closest to yours is Enum.reduce. It takes a list, an initial accumulator and a function of the form fn(elem, acc) -> new_acc end.

taglist
|> Enum.reduce(%{}, fn(tag, acc) -> Map.update(acc, tag, 1, &(&1 + 1)) end)

It looks a little complicated because it uses a couple of pieces of syntactic sugar. taglist |> Enum.reduce(%{}, fun) is the same as Enum.reduce(taglist, %{}, fun), and &(&1 + 1) is shorthand for fn(counter) -> counter + 1 end.
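
If the sugar is unfamiliar, it may help to see both spellings side by side. This is only a sketch: the taglist contents are sample values chosen for illustration, and long and short are hypothetical variable names.

```elixir
taglist = ["<p", "<a", "<p"]

# Long form: the list is passed as an explicit first argument and the
# increment is written as a full anonymous function.
long =
  Enum.reduce(taglist, %{}, fn tag, acc ->
    Map.update(acc, tag, 1, fn counter -> counter + 1 end)
  end)

# Short form: the pipe operator supplies the first argument and the
# capture syntax &(&1 + 1) replaces the inner anonymous function.
short =
  taglist
  |> Enum.reduce(%{}, fn tag, acc -> Map.update(acc, tag, 1, &(&1 + 1)) end)

IO.inspect(long == short)  # prints true
```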

Map.update takes four arguments: the map to update, the key to update, the value to insert if the key doesn't exist, and a function that is applied to the current value if the key does exist.
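
A quick sketch of both cases, using a made-up starting map (counts, missing and present are names invented for this illustration):

```elixir
counts = %{"<p" => 2}

# Key is absent: the default value 1 is stored and the function is not called.
missing = Map.update(counts, "<a", 1, &(&1 + 1))
IO.inspect(missing)  # prints %{"<a" => 1, "<p" => 2}

# Key is present: the function receives the current value 2 and returns 3.
present = Map.update(counts, "<p", 1, &(&1 + 1))
IO.inspect(present)  # prints %{"<p" => 3}
```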

So, those two lines of code do this:

  • iterate over the list (Enum.reduce)
  • starting with an empty map (%{})
  • at each step, take the current element and the map built so far (fn(tag, acc)) and either:
    • insert 1 if the key doesn't exist
    • increment the existing value by one if it does (&(&1 + 1))
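
Putting the pieces together, a complete version might look like the sketch below. Several things here are my assumptions rather than part of your code: the module name TagCounter is hypothetical, the regex has a capture group added so the leading "<" is dropped, and the tags are kept as strings instead of being converted with String.to_atom (atoms are never garbage-collected, so creating them from arbitrary input is risky).

```elixir
defmodule TagCounter do
  # Hypothetical module. The regex is the one from the question with a
  # capture group added, so only the tag name (without "<") is returned.
  def tags(page) do
    Regex.scan(~r/<([a-zA-Z0-9]+)/, page, capture: :all_but_first)
    # Regex.scan/3 returns a list of lists, e.g. [["html"], ["head"]].
    |> List.flatten()
    # Insert 1 for a new tag, otherwise increment the existing count.
    |> Enum.reduce(%{}, fn tag, acc -> Map.update(acc, tag, 1, &(&1 + 1)) end)
  end
end

page = "<html><head><body><sometag><sometag><sometag2><sometag>"
counts = TagCounter.tags(page)

expected = %{"html" => 1, "head" => 1, "body" => 1, "sometag" => 3, "sometag2" => 1}
IO.inspect(counts == expected)  # prints true
```

On Elixir 1.10 and later, the reduce step can also be replaced with the built-in Enum.frequencies/1, which counts occurrences of each element in the same way.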
tkowal
  • This is an absolutely outstanding answer. I've just started learning Elixir as my first functional programming language today, so your post offers a lot of new insight. Thanks a ton for your time :) Wish I could upvote more than once. – Dan Apr 03 '16 at 23:16