read = open('700kLine.txt')

# use readline() to read the first line 

line = read.readline()

aList = []

for line in read:
    try:
        num = int(line.strip())
        aList.append(num)
    except:
        print ("Not a number in line " + line)

read.close()
print(aList)

The file has 700k lines (each line holds a number of at most 2 digits), but only ~280k of them end up in aList.

So, how can I expand the capacity of aList from 280k to 700k or more? (Or is there a different solution for this case?)

Hello, I just solved the problem. Thanks for all your help. It was an output buffer problem: the list was complete, but the PyCharm console did not show all of the output. The solution is simply to increase the size of the buffer.

The link is here: Increase output buffer when running or debugging in PyCharm
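One way to check that the list itself is complete, regardless of how much the console displays, is to compare `len(aList)` with the number of lines in the file. The following is only a minimal sketch, assuming one integer per line; the helper name `load_numbers` and the small demo file are made up for illustration.

```python
import os
import tempfile

def load_numbers(path):
    """Return (numbers, total_lines) for a file with one integer per line."""
    numbers = []
    total_lines = 0
    with open(path) as f:
        for line in f:
            total_lines += 1
            try:
                numbers.append(int(line.strip()))
            except ValueError:
                # Skip lines that are not integers (e.g. a header or blank line).
                pass
    return numbers, total_lines

# Demo with a small stand-in file; replace demo_path with '700kLine.txt'.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('header\n')
    tmp.writelines(f'{i % 100}\n' for i in range(1000))
    demo_path = tmp.name

aList, total_lines = load_numbers(demo_path)
os.remove(demo_path)

# Print a summary instead of the whole list, so the console buffer is irrelevant.
print(len(aList), 'numbers parsed from', total_lines, 'lines')
```

If the two counts differ only by the lines that are known not to be numbers, the list is complete and any missing output is a display issue.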

Ali Kahya
  • [Is it safe to mix readline() and line iterators in python file processing?](//stackoverflow.com/q/4762262) – Aran-Fey Aug 30 '18 at 06:27
  • Lists don't *have* a capacity, they grow as needed. I don't see anything wrong with your code other than the mixing of `readline` and a loop. – Aran-Fey Aug 30 '18 at 06:29
  • I thought the problem was related to list capacity. Please generate a file with 700k lines of 2-digit numbers, then read it and push the numbers into a list. You will see that only ~280k are collected into the list. Could you please try that? You will understand what I mean. – Ali Kahya Aug 30 '18 at 06:34
  • @AliKahya if you need a list with numbers read about `list comprehensions` and `random.randint`. – Tom Wojcik Aug 30 '18 at 06:38
  • @AliKahya you can even have a list with 7 million elements. Try, for example, `[20000]*int(7e6)` and you will see that it is possible. – Onyambu Aug 30 '18 at 06:48
  • @Onyambu what's wrong with my code block? The 700kLine.txt file is 3.268KB with ~700k numbers (one per line); however, when I run my code and copy the output into a new text file, it is 1.024KB with ~280k numbers in the list. The first ~420k numbers (lines) are lost every time I try. I couldn't figure it out. (Also, I checked the count with `print(len(aList))`. That shows ~280k numbers in aList.) – Ali Kahya Aug 30 '18 at 06:57
  • can you try and count the number of lines you have? – Onyambu Aug 30 '18 at 07:07
  • I'm using PyCharm. Do you think the output is not fully shown in the output frame? Is that possible? Could you please recommend the Python editor you are using? – Ali Kahya Aug 30 '18 at 07:08
  • @Onyambu It shows 707491, which is probably correct. However, when I select all and copy-paste into a new txt file, it shows only ~280k lines. – Ali Kahya Aug 30 '18 at 07:10
  • Can't really tell why. Are you flexible about using other languages, e.g. Perl? – Onyambu Aug 30 '18 at 07:11
  • @Onyambu I think I can learn from the Perl documentation, but I have never used Perl before. What should I do? – Ali Kahya Aug 30 '18 at 07:14
  • just to get your point, are you trying to copy the file? – Onyambu Aug 30 '18 at 07:15
  • @Onyambu I mean, I manually copy the output from the output frame, and then paste it into another txt file to count the elements of aList. – Ali Kahya Aug 30 '18 at 07:20
  • This is quite difficult since we are dealing with a virtual problem. We do not know the problem, neither can we reproduce the problem. – Onyambu Aug 30 '18 at 07:23
  • @Onyambu if you have time, could you simulate the case: 1-Create 700k numbers, line by line. 2-Create a 700kLine.txt file manually and paste those numbers into the file. 3-Run my code. 4-Manually copy the output that is produced and paste it into a new txt file. 5-Lastly, count those numbers. If you get the same count that you generated, please share the results with me. – Ali Kahya Aug 30 '18 at 07:31
  • I now get your problem. You should never copy-paste manually. The numbers printed on the screen can be fewer than the ones in the list; you will see an ellipsis to indicate this. You need to `write` the numbers to a file, not copy them manually. – Onyambu Aug 30 '18 at 07:48
  • @AliKahya I wrote this script that proves your problem - whatever it is - has __nothing__ to do with "list capacity" : https://gist.github.com/BrunoDesthuilliers/3acbcb259981436aaf904adf562d9728 – bruno desthuilliers Aug 30 '18 at 08:40
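The advice in the comments, to write the numbers to a file rather than copy them from the console, could look roughly like this. It is only a sketch: the `aList` here is a stand-in for the list built from the file, and the output file name `aList_output.txt` is a hypothetical choice.

```python
import os
import tempfile

aList = list(range(700000))  # stand-in for the list built from '700kLine.txt'

# Hypothetical output location; any writable path works.
out_path = os.path.join(tempfile.gettempdir(), 'aList_output.txt')

# Write one number per line instead of copying text from the console.
with open(out_path, 'w') as out:
    out.writelines(f'{num}\n' for num in aList)

# Count the lines that were written to confirm nothing was lost.
with open(out_path) as f:
    written = sum(1 for _ in f)

os.remove(out_path)  # clean up the demo file
print(written == len(aList))
```

Because the file is written directly by Python, its line count does not depend on any console or IDE output limit.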

4 Answers

0

Please try this.

filename = '700kLine.txt'

with open(filename) as f:
    data = f.readlines()

print(data)
print(type(data)) #stores the data in a list
V Sree Harissh
0

Yes, you can.

Once a list is defined, you can add, edit or delete its elements. To add more elements at the end, use the append method:

MyList.append(data)

Where MyList is the name of the list and data is the element you want to add.
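As a small illustration that a list grows as needed and has no fixed capacity to expand, the sketch below appends 700,000 elements one at a time. The loop values are arbitrary placeholders.

```python
MyList = []

# Append 700,000 elements one at a time; no capacity needs to be declared.
for data in range(700000):
    MyList.append(data)

print(len(MyList))
```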

Hexaholic
imharjyotbagga
0

Could it be that your computer ran out of memory while processing the file? I tried an infinite loop that appends an incrementing integer to a list, and I ended up with a len(list) of about 47 million (47119572). The code I used for the test is below.

I tried this code on an online REPL and it reached a significantly lower `len(list)`.

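list = []  # note: this name shadows the built-in list type

# Keep appending until the interpreter runs out of memory.
while True:
    try:
        if len(list) > 0:
            list.append(list[-1] + 1)
        else:
            list.append(1)
    except MemoryError:
        print("memory error, last count is: ", list[-1])
        raise  # re-raise the original MemoryError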

Maybe try processing the data in smaller pieces as you read it, instead of holding the whole file in memory at once?

Just my assumption.
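A rough sketch of processing the numbers in pieces rather than all at once, using `itertools.islice`. The helper name `iter_chunks` and the chunk size are illustrative assumptions, and the sketch assumes every line is a valid integer.

```python
import io
from itertools import islice

def iter_chunks(lines, chunk_size):
    """Yield lists of at most chunk_size integers parsed from an iterable of lines."""
    lines = iter(lines)
    while True:
        chunk = [int(line) for line in islice(lines, chunk_size)]
        if not chunk:
            return
        yield chunk

# Stand-in for open('700kLine.txt'); an open file iterates over lines the same way.
demo_file = io.StringIO(''.join(f'{i % 100}\n' for i in range(2500)))

total = 0
count = 0
for chunk in iter_chunks(demo_file, chunk_size=1000):
    # Only one chunk is held in memory at a time.
    total += sum(chunk)
    count += len(chunk)

print(count, total)
```

For a file of 700k two-digit numbers this is unlikely to be necessary, since the full list fits comfortably in memory on most machines, but the pattern applies to much larger inputs.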

Wong Siwei
  • You are touching a good point. The problem could be an ordinary MemoryError. – Ali Kahya Aug 30 '18 at 07:34
  • if it was a memory issue the Python runtime would have raised a `MemoryError`. Unless PyCharm is messing with runtime errors, that is... – bruno desthuilliers Aug 30 '18 at 08:16
  • I guess a point worth noting here: when I tried the same code on an online REPL, no memory error was raised. Instead, it just stopped incrementing. I could safely assume it was a memory error only after trying the exact same code multiple times on my local REPL. So it could probably be an issue with the editor or IDE? – Wong Siwei Aug 30 '18 at 08:33
0

I tried to re-create your problem:

# creating 700kLine file
with open('700kLine.txt', 'w') as f:
    for i in range(700000):
        f.write(str(i+1) + '\n')

# creating list from file entries
aList = []
with open('700kLine.txt', 'r') as f:
    for line in f:
        num = int(line.strip())
        aList.append(num)

# print(aList)
print(aList[:30])

Jupyter Notebook throws an error when printing all 700k values because the output uses too much memory. If you really want to print all 700k values, run the Python script from a terminal.

Paresh
  • PyCharm is useful; could you try one more time with the PyCharm IDE? – Ali Kahya Aug 30 '18 at 08:11
  • Never used PyCharm. It's probably the same problem: printing all 700K values takes up too much memory, so it shows you the first few values and hides the rest. – Paresh Aug 30 '18 at 08:20