Python numpy ndarray skipping lines from text

Question

Based on this answer, I am using the changethis method

import numpy as np
import os

def changethis(pos):
    # sfile is read from the global scope
    appex = sfile[pos[1]-1][:pos[2]] + '*' + sfile[pos[1]-1][pos[2]+len(pos[0]):]
    sfile[pos[1]-1] = appex

pos = ('stack', 3, 16)
sfile = np.genfromtxt('in.cpp',dtype='str',delimiter=os.linesep)
changethis(pos)
print(sfile)

where the in.cpp is a source file which contains the following:

/* Multi-line 
comment
*/

#include <iostream>
#include <fstream>

using namespace std;

int main (int argc, char *argv[]) {
  int linecount = 0;
  double array[1000], sum=0, median=0, add=0;
  string filename;
  if (argc <= 1)
      {
          cout << "Error" << endl;
          return 0;
      }

I get the output:

['using namespace std;' 'int main (int argc, char *argv[]) {'
 'int linecount = *' 'double array[1000], sum=0, median=0, add=0;'
 'string filename;' 'if (argc <= 1)' '{' 'cout << "Error" << endl;'
 'return 0;' '}']

Notice that the lines of the multi-line comment, the include statements and the empty lines are missing from the ndarray.

I do not understand why this happens, since the delimiter is set to account for each change-of-line character.

Any suggestions on how to get the following output instead?

['/* Multi-line' 'comment' '*/' '' '#include <iostream>',
 '' '#include <fstream>' '' 'using namespace std;'
 '' 'int main (int argc, char *argv[]) {'
 'int linecount = *' 'double array[1000], sum=0, median=0, add=0;'
 'string filename;' 'if (argc <= 1)' '{' 'cout << "Error" << endl;'
 'return 0;' '}']

unable to reproduce your error. I get all the non-empty lines as output — M.T, Apr 28 '16 at 20:17
That's a pretty weird answer you got earlier. `np.genfromtxt` is a bizarre, unsuitable tool to use for this. — user2357112, Apr 28 '16 at 20:18
@M.T I may have messed up a few commas in the 'ideal' output I display at the end of my question - will edit shortly. But the output with the missing content is exactly as I have described it. — , Apr 28 '16 at 20:20
@user2357112 That may be the case, but to be fair, it worked like I wanted; that is, if I can make it behave as this question requires. If you can provide another, more suitable approach, that would be great! — , Apr 28 '16 at 20:22
@M.T Yes, you presume correctly, I want every single line basically, including the multi-line comments, the empty lines and include statements which are missing in the output. — , Apr 28 '16 at 20:24

armatita · Accepted Answer · 2016-04-29T10:07:53.003

Again, sorry for the use of genfromtxt; I didn't understand your intentions and just tried to provide a possible solution to the problem. As a follow-up to that particular solution (others have been provided), you can just do:

import numpy as np
import os

def changethis(pos):
    # Notice file is in global scope
    appex = file[pos[1]-1][:pos[2]] + '*' + file[pos[1]-1][pos[2]+len(pos[0]):]
    file[pos[1]-1] = appex

pos = ('stack', 3, 16)
file = np.array([i for i in open('in.txt','r')]) # instead of genfromtxt.
changethis(pos)
print(file)

, which resulted in:

['/* Multi-line \n' 'comment\n' '*/\n*' '\n' '#include <iostream>\n'
 '#include <fstream>\n' '\n' 'using namespace std;\n' '\n'
 'int main (int argc, char *argv[]) {\n' '  int linecount = 0;\n'
 '  double array[1000], sum=0, median=0, add=0;\n' '  string filename;\n'
 '  if (argc <= 1)\n' '      {\n' '          cout << "Error" << endl;\n'
 '          return 0;\n' '      }']

EDIT: Another relevant point, raised by another user, is the scope I was using for file. I did not mean to tell you to do things in global scope; I meant to explain that the function worked because file was in global scope. In any case, you can create a function to hold the scope:

import numpy as np
import os

def changeallthese(poslist,path):
    def changethis(pos):
        appex = file[pos[1]-1][:pos[2]-1] + '*' + file[pos[1]-1][pos[2]-1+len(pos[0]):]
        file[pos[1]-1] = appex
    file = np.array([str(i) for i in open(path,'r')])
    for i in poslist:
        changethis(i)
    return file

poslist = [('stack', 3, 16),('stack', 18, 1),('/* Multi-line', 1, 1)]
file =   changeallthese(poslist,'in.txt')
print(file)

, which results in:

['* \n' 'comment\n' '*/\n*' '\n' '#include <iostream>\n'
 '#include <fstream>\n' '\n' 'using namespace std;\n' '\n'
 'int main (int argc, char *argv[]) {\n' '  int linecount = 0;\n'
 '  double array[1000], sum=0, median=0, add=0;\n' '  string filename;\n'
 '  if (argc <= 1)\n' '      {\n' '          cout << "Error" << endl;\n'
 '          return 0;\n' '* }']

To write an array to file you can either use the normal file writing system in Python:

fid = open('out.txt','w')
fid.writelines(file)
fid.close()

, or use a function from numpy (but I'm not sure if it will add more endlines or not so be careful):

np.savetxt('out.txt',file,fmt='%s')
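
On the savetxt caveat: np.savetxt writes its newline string (default '\n') after every element, so lines that still carry their own '\n' would come out double-spaced. Plain writelines adds nothing, which keeps the round trip lossless. A minimal standard-library sketch of overwriting the input file, where the file name and contents are only stand-ins for in.txt:

```python
import os
import tempfile

# Stand-in for in.txt, created in a throwaway directory (an assumption for the demo).
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "in.txt")
    with open(path, "w") as f:
        f.write("/* Multi-line \ncomment\n*/\n\n#include <iostream>\n")

    # Read every line, trailing newlines included.
    with open(path) as f:
        lines = f.readlines()

    lines[1] = "remark\n"  # some edit; the line keeps its own newline

    # Overwrite the same file; writelines() adds no separators of its own.
    with open(path, "w") as f:
        f.writelines(lines)

    with open(path) as f:
        written = f.read()

print(written, end="")
```

The with blocks close each file even if an error occurs part-way, which the bare open/close pair above does not guarantee.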

That is great! If I am not mistaken, it's efficient as well. Also, after the replacements, is there a way to write each line of the list back into the same file, i.e. overwrite the old `in.txt` with the changed-strings version? — , Apr 29 '16 at 09:56
@hask.duk Well..., efficiency depends on criteria. This code is simple enough to make changes to it if needed and relies little on Python native code. I've edited the post with solutions for saving (overwriting the file). Be careful with `savetxt` though, for the reasons I've mentioned. — armatita, Apr 29 '16 at 10:10
And in the interest of gaining a solid understanding, was there an underlying reason why you chose to use a numpy array? — , Apr 29 '16 at 10:43
@hask.duk Generally speaking, numpy arrays are much less flexible than lists, for example. But they tend to be much faster (and occupy less memory, although I'm not sure who I'm quoting), especially if you need to use their broadcasting abilities. In your question you seemed to have the indexing well under control, so why build bigger (though more flexible) algorithms when a simple solution could be used? I just gave the answer that seemed most adequate to me considering the criteria you had (although `genfromtxt` was an unfortunate choice of file reader, again sorry about that). — armatita, Apr 29 '16 at 11:30

M.T · Answer 2 · 2016-04-29T07:34:11.460

If the file is not too big:

import numpy as np
import os

def changethis(linelist,pos):
    appex = linelist[pos[2]-1][:pos[3]] + pos[1] + linelist[pos[2]-1][pos[3]+len(pos[0]):]
    linelist[pos[2]-1] = appex

pos = ('Multi','Three', 1, 3)

with open('in.cpp','r')  as f:
    lines=f.readlines()
    changethis(lines,pos)
print(''.join(lines))

readlines turns your file into a list of lines (which is memory-inefficient and slow, but does the job; if the file has fewer than 1k lines it should be fine).

The function takes a list of lines as input, in addition to pos. I also modified the function to replace pos[0] with pos[1], instead of a *, at line pos[2] and after character pos[3].

I get this as output:

/* Three-line 
comment
*/

#include <iostream>
#include <fstream>

using namespace std;

int main (int argc, char *argv[]) {
  int linecount = 0;
  double array[1000], sum=0, median=0, add=0;
  string filename;
  if (argc <= 1)
      {
          cout << "Error" << endl;
          return 0;
      }
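
For larger files, one alternative to readlines() is to stream: read one line at a time, edit only the targeted line, and write everything to a temporary file that then replaces the original. The sketch below is an illustration under stated assumptions, not part of the answer above: replace_in_file and its parameters are hypothetical names, the line number is 1-based and the column 0-based (as in pos above), and it handles one replacement per call.

```python
import os
import tempfile


def replace_in_file(path, old, new, line_no, col):
    """Stream `path`, replacing `old` with `new` at 1-based `line_no`, 0-based `col`.

    Hypothetical helper; only one line is held in memory at a time.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as dst, open(path) as src:
            for number, line in enumerate(src, start=1):
                # Edit only the target line, and only if `old` really starts at `col`.
                if number == line_no and line.startswith(old, col):
                    line = line[:col] + new + line[col + len(old):]
                dst.write(line)
        os.replace(tmp_path, path)  # swap the edited copy into place
    except BaseException:
        os.remove(tmp_path)
        raise


# Demo on a throwaway file standing in for in.cpp.
with tempfile.TemporaryDirectory() as workdir:
    demo = os.path.join(workdir, "in.cpp")
    with open(demo, "w") as f:
        f.write("/* Multi-line \ncomment\n*/\n\n#include <iostream>\n")
    replace_in_file(demo, "Multi", "Three", 1, 3)
    with open(demo) as f:
        result = f.read()

print(result, end="")
```

One design caveat: the file created by mkstemp is readable and writable only by the current user, so the rewritten file does not inherit the original's permissions; restore them separately if that matters.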

Two things: first, I am most probably going to have large files, so a more efficient version (if you are aware of one) would be ideal; second, I need the ability to replace a specific word or phrase given the word or phrase and its line and column numbers (i.e. exactly as stated in the [related question](http://stackoverflow.com/questions/36895533/replace-a-specific-word-given-its-position-in-a-text-file-python)). — , Apr 28 '16 at 20:36
@hask.duk I would keep in mind the [answer](http://stackoverflow.com/a/36926105/5422525) given to your other question which handles replacing multiple words on the same line. — M.T, Apr 29 '16 at 07:41

user2357112 · Answer 3 · 2016-04-28T20:41:19.930

If you want a list of strings representing the lines of a file, open the file and use readlines():

with open('in.cpp') as f:
    lines = f.readlines()

# Have changethis take the list of lines as an argument
changethis(lines, pos)

Don't use np.genfromtxt; that's a tabular data parser with all sorts of behavior you don't want, such as treating # as a line comment marker.

Depending on what you intend to do with this list, you can probably even avoid needing an explicit list of lines. Also, file is a bad choice of variable name (in Python 2 it hides the built-in file), and changethis should really take the list as an argument instead of relying on a global variable. In general, the earlier answer you got was pretty terrible.
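
As a rough sketch of what passing the list as an argument could look like: the function name replace_at and its parameters are illustrative, not from the original code. It assumes a 1-based line number and a 0-based column, and it refuses to edit when the expected word is not actually at that position.

```python
def replace_at(lines, old, new, line_no, col):
    """Replace `old` with `new` at 1-based line `line_no`, 0-based column `col`.

    Hypothetical helper: the name and argument order are illustrative only.
    """
    line = lines[line_no - 1]
    # Refuse to edit if the expected text is not actually at that position.
    if not line.startswith(old, col):
        raise ValueError(f"{old!r} not found at line {line_no}, column {col}")
    lines[line_no - 1] = line[:col] + new + line[col + len(old):]


# Stand-in for f.readlines(); every line keeps its trailing newline.
lines = ["/* Multi-line \n", "comment\n", "*/\n", "\n", "#include <iostream>\n"]
replace_at(lines, "Multi", "Three", 1, 3)
print("".join(lines), end="")
```

Because the list is modified in place and nothing is filtered out, comment lines, blank lines and #include lines all survive, which is the behaviour the question asks for.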

edited Apr 28 '16 at 20:41

answered Apr 28 '16 at 20:32

user2357112


Thanks for the explanation. All good points. However, my main concern is how to achieve what was stated in the [related question](http://stackoverflow.com/questions/36895533/replace-a-specific-word-given-its-position-in-a-text-file-python). Can you please give me a code alternative that does that with the `readlines` you suggest? – Apr 28 '16 at 20:38
I strongly agree on poor naming conventions and having list as input, but as `changethis` is defined (using a variable `file` not defined in the function), the smallest change to make the code work the intended way is the way I described. – M.T Apr 28 '16 at 20:39
@M.T I fixed the naming convention thing (it was written as a rough example). I do not see how the suggested code with the `readlines` will provide me with the ability to replace a specific given string with a specific position as described in the related question. – Apr 28 '16 at 20:45
