
I have a file that has several lines of animal names followed by numbers like this:

African elephant         6654.000 5712.000  -999.0  -999.0     3.3    38.6   645.0       3       5       3
African giant pouched rat   1.000    6.600     6.3     2.0     8.3     4.5    42.0       3       1       3
Arctic Fox                  3.385   44.500  -999.0  -999.0    12.5    14.0    60.0       1       1       1
Arctic ground squirrel       .920    5.700  -999.0  -999.0    16.5  -999.0    25.0       5       2       3
Asian elephant           2547.000 4603.000     2.1     1.8     3.9    69.0   624.0       3       5       4
Baboon                     10.550  179.500     9.1      .7     9.8    27.0   180.0       4       4       4
.
.
.

I have a list of lists of the data that looks like:

[['African', 'elephant', '6654.000', '5712.000', '-999.0', '-999.0', '3.3', '38.6', '645.0', '3', '5', '3'], 
['African', 'giant', 'pouched', 'rat', '1.000', '6.600', '6.3', '2.0', '8.3', '4.5', '42.0', '3', '1', '3'], 
['Arctic', 'Fox', '3.385', '44.500', '-999.0', '-999.0', '12.5', '14.0', '60.0', '1', '1', '1'], 
['Arctic', 'ground', 'squirrel', '.920', '5.700', '-999.0', '-999.0', '16.5', '-999.0', '25.0', '5', '2', '3'], ... ]

but I need each animal name to be in its own element, like:

[['African elephant', '6654.000', '5712.000', '-999.0', '-999.0', '3.3', '38.6', '645.0', '3', '5', '3'], 
 ['African giant pouched rat', '1.000', '6.600', '6.3', '2.0', '8.3', '4.5', '42.0', '3', '1', '3'], 
 ['Arctic Fox', '3.385', '44.500', '-999.0', '-999.0', '12.5', '14.0', '60.0', '1', '1', '1'], 
 ['Arctic ground squirrel', '.920', '5.700', '-999.0', '-999.0', '16.5', '-999.0', '25.0', '5', '2', '3']...]

Is there a way to loop through the list and combine each string of the animal name into one element?

I'm a student in my first semester of Python, so I apologize if the answer is obvious.

npeters
  • Do you have any control over the format of the file? It is much easier to change the file to a proper CSV format than to come up with an ad-hoc solution when parsing it. – DeepSpace May 04 '18 at 17:47
  • Yes I have control over the format. I just wanted to see if there was a way I could do it without having to change the file. – npeters May 04 '18 at 17:51
  • Just change the file to a proper CSV format. Then you will be able to parse it correctly in about one line of code without reinventing the wheel. – DeepSpace May 04 '18 at 17:54
  • I would also recommend using a different file format. But if you want to keep this one, you could try to convert every value to float. Values that can't be converted are non-numeric, so they must be words, which tells you they are part of the name. – Sandeep Dcunha May 04 '18 at 18:03

4 Answers


Since you commented that you have control over the format of the file, changing it to proper CSV format (with or without headers) will be much easier than coming up with a custom ad-hoc solution.

African elephant,6654.000,5712.000,-999.0,-999.0,3.3,38.6,645.0,3,5,3
African giant pouched rat,1.000,6.600,6.3,2.0,8.3,4.5,42.0,3,1,3
Arctic Fox,3.385,44.500,-999.0,-999.0,12.5,14.0,60.0,1,1,1
Arctic ground squirrel,.920,5.700,-999.0,-999.0,16.5,-999.0,25.0,5,2,3
Asian elephant,2547.000,4603.000,2.1,1.8,3.9,69.0,624.0,3,5,4
Baboon,10.550,179.500,9.1,.7,9.8,27.0,180.0,4,4,4

Then all you have to do is:

import csv

# The csv module documentation recommends opening the file with newline=''
with open('test_files/test.csv', newline='') as f:
    lines = list(csv.reader(f))  # each row becomes a list of strings

print(lines)

#  [['African elephant', '6654.000', '5712.000', '-999.0', '-999.0', '3.3', '38.6', '645.0', '3', '5', '3'],
#   ['African giant pouched rat', '1.000', '6.600', '6.3', '2.0', '8.3', '4.5', '42.0', '3', '1', '3'],
#   ['Arctic Fox', '3.385', '44.500', '-999.0', '-999.0', '12.5', '14.0', '60.0', '1', '1', '1'],
#   ['Arctic ground squirrel', '.920', '5.700', '-999.0', '-999.0', '16.5', '-999.0', '25.0', '5', '2', '3'],
#   ['Asian elephant', '2547.000', '4603.000', '2.1', '1.8', '3.9', '69.0', '624.0', '3', '5', '4'],
#   ['Baboon', '10.550', '179.500', '9.1', '.7', '9.8', '27.0', '180.0', '4', '4', '4']]
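If the file is currently whitespace-separated, you don't have to convert it to CSV by hand. Below is a minimal sketch, assuming every row ends with exactly ten numeric columns as in the sample; split_row and convert_to_csv are hypothetical helper names, and the paths are placeholders. str.rsplit with maxsplit=10 splits off only the last ten fields, so the spaces inside a multi-word name are left alone.

```python
import csv


def split_row(line):
    """Split one whitespace-separated row into [name, value1, ..., value10].

    Assumes each row ends with exactly 10 numeric fields. rsplit works from
    the right and stops after 10 splits, so the name keeps its inner spaces.
    """
    return line.strip().rsplit(None, 10)


def convert_to_csv(src_path, dst_path):
    """Rewrite a whitespace-separated file as CSV (paths are placeholders)."""
    with open(src_path) as src, open(dst_path, 'w', newline='') as dst:
        writer = csv.writer(dst)
        for line in src:
            if line.strip():  # skip blank lines
                writer.writerow(split_row(line))


print(split_row('African giant pouched rat   1.000    6.600     6.3     2.0     8.3     4.5    42.0       3       1       3'))
# ['African giant pouched rat', '1.000', '6.600', '6.3', '2.0', '8.3', '4.5', '42.0', '3', '1', '3']
```

After running convert_to_csv once, the csv.reader code above can read the result directly.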
DeepSpace

If you don't want to convert the file to CSV, you can define a function that returns True if a string cannot be converted to float (meaning it's not a number):

def is_string(string):
    try:
        float(string)
        return False
    except ValueError:
        return True

Then:

# The list of lists:
lst = [['African', 'elephant', '6654.000', '5712.000', '-999.0', '-999.0', '3.3', '38.6', '645.0', '3', '5', '3'], 
['African', 'giant', 'pouched', 'rat', '1.000', '6.600', '6.3', '2.0', '8.3', '4.5', '42.0', '3', '1', '3'], 
['Arctic', 'Fox', '3.385', '44.500', '-999.0', '-999.0', '12.5', '14.0', '60.0', '1', '1', '1'], 
['Arctic', 'ground', 'squirrel', '.920', '5.700', '-999.0', '-999.0', '16.5', '-999.0', '25.0', '5', '2', '3'] ]

for animal in lst:
    animalname = ''
    for item in animal:
        if is_string(item):
            animalname += item + ' '  # still part of the name
        else:
            break  # first number reached, the name is complete
    print(animalname.rstrip(' '))

This gives you:

African elephant
African giant pouched rat
Arctic Fox
Arctic ground squirrel
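The loop above only prints the names, while the question asks for rows of the form [name, values...]. A minimal sketch that reuses the same is_string idea to build those rows; merge_name is a hypothetical helper name, and the numeric strings are left unconverted.

```python
def is_string(string):
    """Return True if string cannot be parsed as a float."""
    try:
        float(string)
        return False
    except ValueError:
        return True


def merge_name(row):
    """Join the leading non-numeric tokens of row into one name element."""
    i = 0
    # Advance past every token that is not a number.
    while i < len(row) and is_string(row[i]):
        i += 1
    return [' '.join(row[:i])] + row[i:]


lst = [['African', 'elephant', '6654.000', '5712.000', '-999.0', '-999.0', '3.3', '38.6', '645.0', '3', '5', '3'],
       ['Arctic', 'ground', 'squirrel', '.920', '5.700', '-999.0', '-999.0', '16.5', '-999.0', '25.0', '5', '2', '3']]

merged = [merge_name(row) for row in lst]
print(merged[1][:3])  # ['Arctic ground squirrel', '.920', '5.700']
```

One caveat of the float test: float() also accepts strings such as 'nan', 'inf' and 'infinity' (case-insensitively), so a name word spelled like one of those would be treated as a number.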

If you want to manipulate the list directly with slicing and join, you can do something like this:

animals = [['African', 'elephant', '6654.000', '5712.000', '-999.0', '-999.0', '3.3', '38.6', '645.0', '3', '5', '3'], 
    ['African', 'giant', 'pouched', 'rat', '1.000', '6.600', '6.3', '2.0', '8.3', '4.5', '42.0', '3', '1', '3'], 
    ['Arctic', 'Fox', '3.385', '44.500', '-999.0', '-999.0', '12.5', '14.0', '60.0', '1', '1', '1'], 
    ['Arctic', 'ground', 'squirrel', '.920', '5.700', '-999.0', '-999.0', '16.5', '-999.0', '25.0', '5', '2', '3']]

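# NOTE: slice(0, 2) assumes every name is exactly two words. It works for
# 'African elephant' and 'Arctic Fox', but 'African giant pouched rat'
# would become 'African giant' with 'pouched' and 'rat' left in the values.
sliceObj = slice(0, 2)
delimiter = ' '
animalsNew = []
for animal in animals:
    subanimalArray = animal[sliceObj]  # the name tokens
    arrayEnd = animal[2:]              # everything after the name

    animalName = delimiter.join(subanimalArray)

    arrayEnd.insert(0, animalName)     # put the joined name first
    print("animalsNew:", ' ; '.join(arrayEnd))
    animalsNew.append(arrayEnd)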

To try the code in a browser, you can use this Skulpt-based page:

<html> 
<head> 
<script src="http://ajax.googleapis.com/ajax/libs/jquery/1.9.0/jquery.min.js" type="text/javascript"></script> 
<script src="http://www.skulpt.org/static/skulpt.min.js" type="text/javascript"></script> 
<script src="http://www.skulpt.org/static/skulpt-stdlib.js" type="text/javascript"></script> 

</head> 

<body> 

<script type="text/javascript"> 
// output functions are configurable.  This one just appends some text
// to a pre element.
function outf(text) { 
    var mypre = document.getElementById("output"); 
    mypre.innerHTML = mypre.innerHTML + text; 
} 
function builtinRead(x) {
    if (Sk.builtinFiles === undefined || Sk.builtinFiles["files"][x] === undefined)
            throw "File not found: '" + x + "'";
    return Sk.builtinFiles["files"][x];
}

// Here's everything you need to run a python program in skulpt
// grab the code from your textarea
// get a reference to your pre element for output
// configure the output function
// call Sk.importMainWithBody()
function runit() { 
   var prog = document.getElementById("yourcode").value; 
   var mypre = document.getElementById("output"); 
   mypre.innerHTML = ''; 
   Sk.pre = "output";
   Sk.configure({output:outf, read:builtinRead}); 
   (Sk.TurtleGraphics || (Sk.TurtleGraphics = {})).target = 'mycanvas';
   var myPromise = Sk.misceval.asyncToPromise(function() {
       return Sk.importMainWithBody("<stdin>", false, prog, true);
   });
   myPromise.then(function(mod) {
       console.log('success');
   },
       function(err) {
       console.log(err.toString());
   });
} 
</script> 

<h3>Try This</h3> 
<form> 
<textarea id="yourcode" cols="80" rows="20">
animals = [['African', 'elephant', '6654.000', '5712.000', '-999.0', '-999.0', '3.3', '38.6', '645.0', '3', '5', '3'], 
    ['African', 'giant', 'pouched', 'rat', '1.000', '6.600', '6.3', '2.0', '8.3', '4.5', '42.0', '3', '1', '3'], 
    ['Arctic', 'Fox', '3.385', '44.500', '-999.0', '-999.0', '12.5', '14.0', '60.0', '1', '1', '1'], 
    ['Arctic', 'ground', 'squirrel', '.920', '5.700', '-999.0', '-999.0', '16.5', '-999.0', '25.0', '5', '2', '3']]
    
sliceObj = slice(0, 2)
delimiter = ' '
animalsNew=[]
for animal in animals:
    subanimalArray=animal[sliceObj]
    arrayEnd=animal[2:]
    
    animalName = delimiter.join(subanimalArray)
    
    arrayEnd.insert(0, animalName)
    print("animalsNew:", ' ; '.join(arrayEnd))
    animalsNew.append(arrayEnd)
            
</textarea><br /> 
<button type="button" onclick="runit()">Run</button> 
</form> 
<pre id="output" ></pre> 
<!-- If you want turtle graphics include a canvas -->
<div id="mycanvas"></div> 

</body> 

</html> 
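Because every row in the sample has exactly ten numeric values after the name, the hard-coded slice(0, 2) can be replaced by slicing from the end of each row, which also handles names of any length. A minimal sketch, assuming that ten-column layout holds for every row (NUM_VALUES is a name introduced here for illustration):

```python
NUM_VALUES = 10  # assumption: every row ends with exactly 10 numeric fields

animals = [['African', 'elephant', '6654.000', '5712.000', '-999.0', '-999.0', '3.3', '38.6', '645.0', '3', '5', '3'],
           ['African', 'giant', 'pouched', 'rat', '1.000', '6.600', '6.3', '2.0', '8.3', '4.5', '42.0', '3', '1', '3']]

animalsNew = []
for animal in animals:
    # Everything before the last NUM_VALUES items is the name.
    animalName = ' '.join(animal[:-NUM_VALUES])
    animalsNew.append([animalName] + animal[-NUM_VALUES:])

for row in animalsNew:
    print(row[0])
# African elephant
# African giant pouched rat
```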
Novy

If your input file is in.txt, this will work. It treats every purely alphabetic token as part of the name:

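out = []
with open('in.txt') as f:  # the file is closed automatically
    for line in f:
        tokens = line.split()
        if not tokens:  # skip blank lines
            continue
        wordlist = []
        numlist = []
        for word in tokens:
            # Purely alphabetic tokens belong to the name; anything else
            # (digits, '.', '-') is kept as a number string.
            if word.isalpha():
                wordlist.append(word)
            else:
                numlist.append(word)
        numlist.insert(0, ' '.join(wordlist))
        out.append(numlist)
print(out)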
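One limitation of the isalpha() test is that it returns False for any token containing a hyphen, period or apostrophe, so a name like 'Star-nosed mole' (a hypothetical example, not from the sample) would have 'Star-nosed' moved into the numbers. A minimal sketch of an alternative uses a regular expression to split each line into the name and the trailing run of numbers. It assumes numbers contain only digits and '.', with an optional leading '-', and that no name ends in a number-like token; parse_line and parse_file are hypothetical helper names.

```python
import re

# Name = shortest prefix followed by a run of whitespace-separated numbers
# that continues to the end of the line. Numbers are assumed to contain
# only digits and '.', with an optional leading '-'.
ROW_RE = re.compile(r'^(.*?)\s+(-?[\d.]+(?:\s+-?[\d.]+)*)\s*$')


def parse_line(line):
    """Return [name, value1, value2, ...], or None if the line doesn't match."""
    match = ROW_RE.match(line)
    if match is None:
        return None
    name, numbers = match.groups()
    return [name] + numbers.split()


def parse_file(path):
    """Parse every matching line of the file at path (path is a placeholder)."""
    out = []
    with open(path) as f:
        for line in f:
            row = parse_line(line)
            if row is not None:  # blank or malformed lines are skipped
                out.append(row)
    return out


print(parse_line('Star-nosed mole   1.0  -999.0'))  # ['Star-nosed mole', '1.0', '-999.0']
```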
Ritesh