I am new to numpy and python so please be gentle.

I am working with a CSV file, popularnames.csv, that has several columns. I only want to load column number 3, which is titled 'Popular names in India', and find the names in that column that are repeated more than 10 times. I also want to use only numpy for this, and I can't find a solution yet.

My code is:

Baby_names=np.genfromtxt('popularnames.csv', delimiter=',', usecols=(3), skip_header=1, dtype=str)
for Baby_names:
    if np.unique(Baby_names)>10:
        print(Baby_names)

I understand that this code is wrong, but it is all I could think of with the limited knowledge I have. Any help would be appreciated.

Thanks in advance!

marc_s
  • Hi, welcome to StackOverflow. Could you describe what your code does (e.g. does it give you an error, or print out something you're not expecting)? This will help others to find an answer. – Zoey Hewll Apr 23 '20 at 03:13

2 Answers

The syntax for the for loop is wrong.

Try the following code:

import numpy as np

baby_names = np.genfromtxt('popularnames.csv', delimiter=',', usecols=(3), skip_header=1, dtype=str)

# With return_counts=True, np.unique returns the unique names and how often each one occurs
for name, count in zip(*np.unique(baby_names, return_counts=True)):
    if count > 10:
        print(name)

return_counts=True tells numpy to return the count for each unique name. zip pairs each name with its count, which lets us iterate over the two together.
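The same filter can also be written without an explicit loop, by applying a boolean mask to the arrays that np.unique returns. This is only a minimal sketch: the sample names are made up to stand in for the CSV column, and the threshold is 1 instead of 10 so the small sample produces a result.

```python
import numpy as np

# Hypothetical sample data standing in for the column loaded from the CSV.
baby_names = np.array(["asha", "ravi", "asha", "meera", "asha", "ravi"])

# names holds the sorted unique values, counts holds how often each occurs.
names, counts = np.unique(baby_names, return_counts=True)

threshold = 1  # the question uses 10; lowered here for the tiny sample
frequent = names[counts > threshold]  # keep names whose count exceeds the threshold
print(frequent)
```

For the real file, replacing the sample array with the result of np.genfromtxt and setting threshold to 10 should give the names repeated more than 10 times.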

If you're new to Python, I suggest you continue learning it before using numpy.

Pharoah Jardin

I have created a dummy example for you:

from io import StringIO
import numpy as np

# Small in-memory CSV; the stray quotes are intentional and show up in the output below
test = "Baby_names,age,country\nsarah,4,USA\njames,1,UK\nsarah,2,'UK'\n'sarah,3,France\n'john,2,UK\njames,6,Australia"
a = np.genfromtxt(StringIO(test), delimiter=',',usecols=(0), skip_header=1, dtype=str)
print(a)

['sarah' 'james' 'sarah' "'sarah" "'john" 'james']

unique, counts = np.unique(a, return_counts=True)
x = dict(zip(unique, counts))

x:

{"'john": 1, "'sarah": 1, 'james': 2, 'sarah': 2}

print([key for key, value in x.items() if value >= 2])

['james', 'sarah']

Shorter version:

for (name, count) in zip(*np.unique(a, return_counts=True)):
    if count > 1:
        print(name)

Pygirl
  • Hi, thanks a lot for the answer! Just one more question: would the syntax be similar if I wanted to find the most used name? – karan sethi Apr 23 '20 at 03:16
  • If you want the single most frequent name, you can use `Counter.most_common`: https://stackoverflow.com/a/6252400/6660373. Otherwise, since the `x` dict already holds the counts, you can pick out the entries with the maximum value yourself. – Pygirl Apr 23 '20 at 03:22
  • https://stackoverflow.com/questions/60828477/printing-name-of-second-lowest-mark-scorer-in-a-nested-list-and-arranging-in-alp/60828814#60828814 You can also get the first maximum in case there are several. – Pygirl Apr 23 '20 at 03:25
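
For the follow-up about the most used name, a numpy-only version is also possible using the counts from np.unique. The sketch below assumes a small made-up array in place of the real column: np.argmax gives the position of the first highest count, and comparing against counts.max() keeps every name tied for the top spot.

```python
import numpy as np

# Hypothetical sample data standing in for the names column.
a = np.array(["sarah", "james", "sarah", "john", "james", "sarah"])

names, counts = np.unique(a, return_counts=True)

# Single most frequent name (the first one, if several share the top count).
most_common = names[np.argmax(counts)]

# All names that share the highest count.
tied = names[counts == counts.max()]

print(most_common)
print(tied)
```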