How to normalize input in Mongo - Ruby

Question

I am dealing with some very large third party CSV files which I import in Mongo. The issue is that the headers on these files are very inconsistent. Some files contain all lowercase, some mixed case and some all uppercase headers. After the file is imported, is there a way to query a collection case insensitively?

In short can I do:

 @collection.find('AB' => value) so that it's true for fields 'ab' or aB' as well?

Is there a way to normalize these headers at the time of import? or may be there's another way?

The question suggested as a duplicate is for regex searches on values and not the field itself

@mudasobwa the question you've linked is for searches on values not fields — Rahul, Jul 06 '17 at 16:47
What do you mean “values not fields”? While schemaless, `mongo` has case sensitive keys and `{"a" => 42, "A" => 3.14}` is a perfectly valid record having two different fields. I will unmark the question as duplicate, but you should consider downcasing your keys on the insertion stage. — Aleksei Matiushkin, Jul 06 '17 at 16:55
Normalize on insertion as suggested by @mudasobwa. Your current question is equivalent to "I would like to find a value by its column but i don't know the name of the column or the value of the value" — engineersmnky, Jul 06 '17 at 17:10
@mudasobwa is there a way to downcase keys on insertion during import? — Rahul, Jul 07 '17 at 08:53
@engineersmnky I understand what you are saying but I do know the keys, its just that I don't know what case they are in due to the poor quality external data I am dealing with. I am looking for a way to either upcase or down case these on import — Rahul, Jul 07 '17 at 08:57
@mudasobwa I use mongoimport task. `mongoimport -d db -c collection --type csv --headerline --file file.csv` The csv file is what has headers that are mixed case — Rahul, Jul 07 '17 at 11:50
Do `ruby -e "File.write('/path/to/file', File.readlines('/path/to/file').tap { |l| l[0].downcase! })"` upfront to fix the titles to be downcased. — Aleksei Matiushkin, Jul 07 '17 at 12:24
@mudasobwa beat me to it. OP as stated above just preprocess the file to ensure the headers are downcased first — engineersmnky, Jul 07 '17 at 12:33
@mudasobwa thanks for this. Looks like something that could work for me. I will give it a shot. The only thing is the files I have are rather large. Will this not read the entire file just to down case the first line? — Rahul, Jul 07 '17 at 22:16
Yes it will. There are many possibilities to not load the entire file into memory, which this margin is too small to contain. You might ask the different question, but I am positive you’ll discover the solution yourself. — Aleksei Matiushkin, Jul 08 '17 at 03:58
Thanks @mudasobwa. I have seen that there is a readline method that only reads the first line of the file. If you post your solution as an answer I will be happy to accept — Rahul, Jul 08 '17 at 09:48

score -2 · Answer 1 · answered Jul 06 '17 at 17:26

-2

You should use the built-in Ruby methods for the String class documentation. For example, .strip!, would be a good one to use to remove excess whitespace. .downcase! could be used to normalize the case. There are others you may find useful to use in the documentation. This is one of the best resources for beginner to intermediate level Ruby programming. Once you master the methods of each class, you can leverage the full power of Ruby.

answered Jul 06 '17 at 17:26

broken_historian

37
4

1

This does by no mean provide an answer to the question stated. – Aleksei Matiushkin Jul 07 '17 at 09:17
Yes it does, I explained how to downcase strings, which could be used to solve the problem stated. But maybe next time instead of being so quick to dismiss, you should read slower. – broken_historian Jul 07 '17 at 18:28
"could be used to solve the problem" is not an answer. An answer **should** solve the problem – engineersmnky Jul 07 '17 at 21:47

How to normalize input in Mongo - Ruby

1 Answers1

Linked