How do I manipulate a large data set into smaller sets based on the type of object within the data?

Question

In my code, the user inputs a text file. The text file contains 4 columns, and the number of rows varies with the file that is loaded, so the code must be generic. The first column of the array generated from the text file contains a type of animal, the second column is its Xlocation in a field, the third is its Ylocation, and the fourth is the animal's Zlocation. If you don't want to follow the link to the picture of the data, here is a copy of the code loading the data and the array that is returned:

import numpy as np

#load the data
emplaced_animals_data = np.genfromtxt('animal_data.txt', skip_header = 1, dtype = str)
print(type(emplaced_animals_data))
print(emplaced_animals_data)

[['butterfly' '1' '1' '3']
 ['butterfly' '2' '2' '3']
 ['butterfly' '3' '3' '3']
 ['dragonfly' '4' '1' '1']
 ['dragonfly' '5' '2' '1']
 ['dragonfly' '6' '3' '1']
 ['cat' '4' '4' '2']
 ['cat' '5' '5' '2']
 ['cat' '6' '6' '2']
 ['cat' '7' '8' '3']
 ['elephant' '8' '9' '3']
 ['elephant' '9' '10' '4']
 ['elephant' '10' '10' '4']
 ['camel' '10' '11' '5']
 ['camel' '11' '6' '5']
 ['camel' '12' '5' '6']
 ['camel' '12' '3' '6']
 ['bear' '13' '13' '7']
 ['bear' '5' '15' '7']
 ['bear' '4' '10' '5']
 ['bear' '6' '9' '2']
 ['bear' '15' '13' '1']
 ['dog' '1' '3' '9']
 ['dog' '2' '12' '8']
 ['dog' '3' '10' '1']
 ['dog' '4' '8' '1']]

After the data is loaded, there will always be two types of animals that we don't want to know anything about. I remove the names of these animals from the first column, but I am unsure how to remove the whole row. How would I extend the selection from the type of animal to its location data and delete the full rows for the unwanted animals? I have included images to show the outputs of what I have currently done.

#Removes unwanted animals from list
print('Original list:', emplaced_animals_data[:,0])
all_the_animals = list(emplaced_animals_data[:,0])
Butterfly = set('butterfly')
Dragonfly = set('dragonfly')

for i in range(0, len(emplaced_animals_data)):
    for animal in all_the_animals:
        if Butterfly == set(animal):
            all_the_animals.remove(animal)
        if Dragonfly == set(animal):
            all_the_animals.remove(animal)
print('Updated list:', all_the_animals)

Next, I would like to take the remaining animals and sort each animal, along with its location data, into its own array saved under some variable. Currently I am only able to sort the animal types into their own arrays. How would I extend my selection of the animals to include their locations, and save each animal type with its locations in its own array?

#Groups all of the items with the same name together
setofanimals = set(all_the_animals)

animal_groups = {}

for one in setofanimals:
    ids = [one for i in emplaced_animals_data[:,0] if i == one]
    animal_groups.update({one:ids})

for one in animal_groups:
    print(one, ":", animal_groups[one])

My end goal is to be able to plot each occurrence of each type of animal regardless of the text file that is loaded in.

Here is the data I am working with, copied from the Excel Spreadsheet that I have saved as a text file:

Data

Hi @td_python. Can you edit your question so that instead of the line `emplaced_animals_data = np.genfromtxt('animal_data.txt', skip_header = 1, dtype = str)`, you create an array with the `np.array` constructor, so that anybody who is helping you out with your question can see what your data looks like immediately and easily load it into their terminal? I don't want to click through to a link with a picture of the data. If you edit your question I'll help you out. — Goodword, Nov 01 '19 at 17:14
Hello @Goodword, I didn't really understand how to create an array the way you are asking but I added a copy of my data above so that hopefully it will be easier to copy and paste. Let me know if this works for you! Thank you. — td_python, Nov 01 '19 at 19:26
@td_python Could you please add the exact input data that you have in your text file? Just post the text file (or at least a part of it) so that we can figure out the best way to load and process the data. — Chikko, Nov 04 '19 at 07:55
@Chikko I have attached an image at the very end called "data." It's a picture of the Excel file that I am saving as text file (tab delimited) and then loading into my code. I hope this is helpful. I apologize that I am not the best at using Stackoverflow (I'm very new to it) so uploading this data in the right format has been challenging. — td_python, Nov 04 '19 at 15:52
@Chikko So after spending the day playing with my code, I have made a lot of advances and now I just need help plotting... Sorry for all the questions I have made but here's a link that now describes my new question: https://stackoverflow.com/questions/58844718/how-do-i-create-a-scatter-plot-using-data-from-two-dictionaries. Thanks for sticking with me! — td_python, Nov 13 '19 at 20:02

jacob · Answer 1 · 2019-11-04T22:30:25.003

1

The following functions should accomplish this. Your input txt can be arbitrary in length, and both functions take in a list of animals to delete or select based on the animals contained in said list:

import numpy as np

# note that my delimiter is a tab, which might be different from yours
emplaced_animals = np.genfromtxt('animals.txt', skip_header=1, dtype=str, delimiter='\t')
listed_animals = ['cat', 'dog', 'bear', 'camel', 'elephant']

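def get_specific_animals_from(list_of_all_animals, specific_animals):
    """get an array containing only the rows of the specified animals"""
    # start with zero rows but the same column count and dtype as the input,
    # so that np.append stacks whole rows instead of flattening them
    n_columns = list_of_all_animals.shape[1]
    list_of_specific_animals = np.empty((0, n_columns), dtype=list_of_all_animals.dtype)
    for specific_animal in specific_animals:
        for animal in list_of_all_animals:
            if animal[0] == specific_animal:
                list_of_specific_animals = np.append(list_of_specific_animals, [animal], 0)
    return list_of_specific_animals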

def delete_specific_animals_from(list_of_all_animals, bad_animals):
    """
    delete all rows of bad_animal in provided list
    takes in a list of bad animals e.g. ['dragonfly', 'butterfly']
    returns list of only desired animals
    """
    all_useful_animals = list_of_all_animals
    positions_of_bad_animals = []
    for n, animal in enumerate(list_of_all_animals):
        if animal[0] in bad_animals:
            positions_of_bad_animals.append(n)
    if len(positions_of_bad_animals):
        for position in sorted(positions_of_bad_animals, reverse=True):
            # reverse is important
            # without it, list positions change as you delete items
            all_useful_animals = np.delete(all_useful_animals, (position), 0)
    return all_useful_animals

emplaced_animals = delete_specific_animals_from(emplaced_animals, ['dragonfly', 'butterfly'])

list_of_elephants = get_specific_animals_from(emplaced_animals, ['elephant'])

list_of_needed_animals = get_specific_animals_from(emplaced_animals, listed_animals)
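
As a possible alternative to the loops above, the same row selection can be sketched with a boolean mask. This is only a minimal illustration, assuming the data is a 2-D string array like the one printed in the question; the small array below is a stand-in for the loaded file, and the names `unwanted` and `is_unwanted` are made up for this example.

```python
import numpy as np

# Stand-in for the array returned by np.genfromtxt(..., dtype=str)
emplaced_animals = np.array([
    ['butterfly', '1', '1', '3'],
    ['dragonfly', '4', '1', '1'],
    ['cat', '4', '4', '2'],
    ['cat', '5', '5', '2'],
    ['elephant', '8', '9', '3'],
])

unwanted = ['butterfly', 'dragonfly']

# True for every row whose first column names an unwanted animal
is_unwanted = np.isin(emplaced_animals[:, 0], unwanted)

# Keep the full rows (name and coordinates) of everything else
useful_animals = emplaced_animals[~is_unwanted]

# Select the full rows for a single animal the same way
elephants = useful_animals[useful_animals[:, 0] == 'elephant']

print(useful_animals)
print(elephants)
```

Because the mask indexes whole rows, the location columns stay attached to each animal name, and no index bookkeeping is needed while deleting.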

edited Nov 04 '19 at 22:30

answered Nov 01 '19 at 17:33


Thank you! I uploaded my data into a more accessible form above. Currently when I run what you have written it is telling me that it cannot delete array elements in this line "del all_useful_animals[position]." Additionally, if I'm not sure what type of animals may be in my array, how might I assign them to a variable within a loop instead? I was thinking of doing something along the lines of having an A variable and for each group of animals it would count so A1 would be the elephant data and A2 would be the bear data and so on. – td_python Nov 01 '19 at 19:36
Yes, after seeing your data, my if statements will never be true, as it looks like you do not have a list of lists but rather a list of single element arrays containing strings. Can you clarify, should the first line of your emplaced_animals array read ['butterfly' '2' '2' '3'] or ['butterfly', '2', '2', '3']? Note the commas (very important distinction). – jacob Nov 01 '19 at 19:53
Hi Jacob, when I print my array out it does not have any commas in it. – td_python Nov 01 '19 at 19:58
Yes, but does the text file? – jacob Nov 01 '19 at 19:59
Nope! The text file does not have any commas. Sorry for the confusion there. – td_python Nov 01 '19 at 20:04
I just added this bit of code after taking out the unwanted animals and it almost does what I need it to in that step but it has repeats of each animal with the same data(hopefully this is easy enough to copy): lots_of_data = [] for i in range(0,len(all_the_animals)): for j in range(0,len(emplaced_animals_data)): if all_the_animals[i] == emplaced_animals_data[j,0]: lots_of_data = np.append(lots_of_data, emplaced_animals_data[j]) print(lots_of_data) – td_python Nov 01 '19 at 20:21
The issue is that without the commas, each entry is just a string. In other words, python treats ['butterfly' '2' '2' '3'] the same as ['butterfly223'], in which case emplaced_animals[0][0] returns 'b'. This will be very hard to work with later when you want to differentiate between something like 'butterfly223' and 'butterfly1223'. Does the text file have any delimiter between values? If so, use the delimiter argument, for example: `np.genfromtxt('animal_data.txt', skip_header = 1, dtype = str, delimiter=' ')` for a whitespace delimiter and my answer should work. – jacob Nov 01 '19 at 21:23
I tried your code again with the delimiter added in and its still not working (the elephant list is empty when printed). Additionally, I'm not sure if this completely answers my question because I need to be able to access each animal list without specifically calling it. I'm not sure if that makes sense.. I think I might have done a better job explaining here after I did some more work on what establishing what I really needed: https://stackoverflow.com/questions/58700735/get-location-data-from-list-comparison-and-plot – td_python Nov 04 '19 at 21:10
I've updated my answer accordingly, and I tested it on data like yours. – jacob Nov 04 '19 at 22:30

score 0 · Answer 2 · answered Nov 05 '19 at 12:11

I don't know if this is exactly what you want, but take a look at it. First of all, regarding your comment: you may have to change the delimiter to ',' or ';'. The code is tested and works fine with a comma-separated text file.

Input (.txt):

Animals,Xlocation,Ylocation,Zlocation
butterfly,1,1,3
butterfly,2,2,3
butterfly,3,3,3
dragonfly,4,1,1
dragonfly,5,2,1
dragonfly,6,3,1
cat,4,4,2
cat,5,5,2
cat,6,6,2
cat,7,8,3
elephant,8,9,3
elephant,9,10,4
elephant,10,10,4
camel,10,11,5
camel,11,6,5
camel,12,5,6
camel,12,3,6
bear,13,13,7
bear,5,15,7
bear,4,10,5
bear,6,9,2
bear,15,13,1
dog,1,3,9
dog,2,12,8
dog,3,10,1
dog,4,8,1

Code:

import pandas


def main():
    result = readFile("C:\\Users\\Desktop\\animals.txt")
    # Array of animals to remove from main list
    to_remove = ["butterfly", "dragonfly"]

    # returns a new list with all rows except the 'to_remove animals'
    useful_animals = [one for one in result if one["Animals"] not in to_remove]

    cats = get_animal_group(useful_animals, "cat")
    camels = get_animal_group(useful_animals, "camel")

# returns a new list with all rows where animals_list match given animal
def get_animal_group(animal_list, animal):
    return [one for one in animal_list if one["Animals"] == animal]

def readFile(path):
    # From this you get a list of dict which is much easier to handle
    result = pandas.read_csv(path, encoding="utf-8",
                             usecols=["Animals", "Xlocation", "Ylocation", "Zlocation"]).to_dict("records")
    return result

if __name__ == "__main__":
    main()

Output:

# for animal in useful_animals:
{'Animals': 'cat', 'Xlocation': 4, 'Ylocation': 4, 'Zlocation': 2.0}
{'Animals': 'cat', 'Xlocation': 5, 'Ylocation': 5, 'Zlocation': 2.0}
{'Animals': 'cat', 'Xlocation': 6, 'Ylocation': 6, 'Zlocation': 2.0}
{'Animals': 'cat', 'Xlocation': 7, 'Ylocation': 8, 'Zlocation': 3.0}
{'Animals': 'elephant', 'Xlocation': 8, 'Ylocation': 9, 'Zlocation': 3.0}
{'Animals': 'elephant', 'Xlocation': 9, 'Ylocation': 10, 'Zlocation': 4.0}
{'Animals': 'elephant', 'Xlocation': 10, 'Ylocation': 10, 'Zlocation': 4.0}
{'Animals': 'camel', 'Xlocation': 10, 'Ylocation': 11, 'Zlocation': 5.0}
{'Animals': 'camel', 'Xlocation': 11, 'Ylocation': 6, 'Zlocation': 5.0}
{'Animals': 'camel', 'Xlocation': 12, 'Ylocation': 5, 'Zlocation': 6.0}
{'Animals': 'camel', 'Xlocation': 12, 'Ylocation': 3, 'Zlocation': 6.0}
{'Animals': 'bear', 'Xlocation': 13, 'Ylocation': 13, 'Zlocation': 7.0}
{'Animals': 'bear', 'Xlocation': 5, 'Ylocation': 15, 'Zlocation': 7.0}
{'Animals': 'bear', 'Xlocation': 4, 'Ylocation': 10, 'Zlocation': 5.0}
{'Animals': 'bear', 'Xlocation': 6, 'Ylocation': 9, 'Zlocation': 2.0}
{'Animals': 'bear', 'Xlocation': 15, 'Ylocation': 13, 'Zlocation': 1.0}
{'Animals': 'dog', 'Xlocation': 1, 'Ylocation': 3, 'Zlocation': 9.0}
{'Animals': 'dog', 'Xlocation': 2, 'Ylocation': 12, 'Zlocation': 8.0}
{'Animals': 'dog', 'Xlocation': 3, 'Ylocation': 10, 'Zlocation': 1.0}
{'Animals': 'dog', 'Xlocation': 4, 'Ylocation': 8, 'Zlocation': 1.0}

# for cat in cats:
{'Animals': 'cat', 'Xlocation': 4, 'Ylocation': 4, 'Zlocation': 2.0}
{'Animals': 'cat', 'Xlocation': 5, 'Ylocation': 5, 'Zlocation': 2.0}
{'Animals': 'cat', 'Xlocation': 6, 'Ylocation': 6, 'Zlocation': 2.0}
{'Animals': 'cat', 'Xlocation': 7, 'Ylocation': 8, 'Zlocation': 3.0}

If you have further questions, feel free to ask in a comment.

Greetings

I need to do something different with the "get_animal_group" function. When I import the text file I am never sure of what or how many different animals are in it so I can't make a variable explicitly called cats to hold all of the data about cats. I need to create variables like Animal1, Animal2, etc. that will save all the locations of the animal so that I can plot each animal as a different symbol and then the string of the animal name from actual data like "cat" so that I can create a legend. My end goal is to have a plot of all the different animals. Any ideas? @Chikko — td_python, Nov 13 '19 at 17:02
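
For the follow-up in the comment above, one option might be a dictionary keyed by animal name instead of numbered variables such as Animal1 and Animal2. The sketch below is only an illustration, assuming records shaped like the list of dicts returned by `readFile` in this answer; the helper name `group_by_animal` is hypothetical and the sample records are a stand-in for the loaded file.

```python
from collections import defaultdict


def group_by_animal(records, name_key="Animals"):
    """Group records by animal name: {name: [(x, y, z), ...]}."""
    groups = defaultdict(list)
    for row in records:
        groups[row[name_key]].append(
            (row["Xlocation"], row["Ylocation"], row["Zlocation"])
        )
    return dict(groups)


# Stand-in for the list of dicts produced by readFile()
useful_animals = [
    {"Animals": "cat", "Xlocation": 4, "Ylocation": 4, "Zlocation": 2.0},
    {"Animals": "cat", "Xlocation": 5, "Ylocation": 5, "Zlocation": 2.0},
    {"Animals": "dog", "Xlocation": 1, "Ylocation": 3, "Zlocation": 9.0},
]

animal_groups = group_by_animal(useful_animals)

# No animal name is hard-coded: the keys come from the data itself
for name, locations in animal_groups.items():
    print(name, locations)
```

Each key can then serve as the legend label, and each list of locations can be plotted in its own pass through the loop, whatever animals the file happens to contain.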
