I have a string that is 5 GB in size, and I would like to get its last 30 characters. Is slicing the best way to get that substring, or will it cause memory problems? Will another 5 GB be allocated because a 4.99 GB substring and a 0.1 KB substring are both created during the splitting process?
What "splitting" process? What *exactly* are you doing? – juanpa.arrivillaga Mar 03 '21 at 02:52
4 Answers
I believe you could use negative indexing.
sample_string = 'hello there'
print(sample_string[-3:])

You can get the last 30 characters using string slicing, e.g. name_of_string[-30:]. The slice copies only those 30 characters into a new string; it does not create a second object for the rest of the string.
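As a rough check of that claim, the sketch below slices a smaller stand-in string and compares object sizes. The names `big_string` and `tail` are illustrative, and the exact byte counts reported by `sys.getsizeof` are a CPython implementation detail.

```python
import sys

# A 10-million-character stand-in for the 5 GB string (illustrative only).
big_string = "a" * 10_000_000 + "z" * 30

# Slicing builds a new string that holds just the last 30 characters.
tail = big_string[-30:]

print(len(tail))                  # number of characters in the slice
print(sys.getsizeof(big_string))  # size of the original object, in bytes
print(sys.getsizeof(tail))        # size of the slice object, in bytes
```

The slice is a small, independent object, so the original can be released afterwards if nothing else refers to it.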

str.split() creates a list. So, you will end up with, at the very least, a 5GB string and a 5GB list, plus whatever memory is used in the process. The best way to get the last x characters of a string is negative indexing.
x = 30
last_30_characters = very_long_string[-x:]
Edit: Slicing a string copies only the characters in the slice, not the whole string, so peak memory should be about what the original string needs plus the 30-character result. Source.
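To compare the two approaches on a small scale, here is a minimal sketch that uses the standard-library `tracemalloc` module to measure peak allocations. The helper `peak_bytes` and the roughly 1 MB sample `text` are assumptions made for the demo, not part of the original question.

```python
import tracemalloc

def peak_bytes(func):
    """Run func() and return the peak number of bytes allocated while it ran."""
    tracemalloc.start()
    func()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

# About 1 MB of text, created before tracing starts so it is not counted.
text = "word " * 200_000

slice_peak = peak_bytes(lambda: text[-30:])   # allocates one short string
split_peak = peak_bytes(lambda: text.split())  # allocates a list plus 200,000 strings

print(f"slice peak: {slice_peak} bytes")
print(f"split peak: {split_peak} bytes")
```

On CPython the split should peak far above the slice, since every piece becomes its own string object in addition to the list that holds them.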

I assume you have your string stored in a file.
You don't have to load the entire string into memory, even if it contains no \n line breaks. This link is helpful: https://docs.python.org/3/tutorial/inputoutput.html
Say the file text.txt contains 0123456789\n as its content.
with open('text.txt', 'rb') as f:
    f.seek(-4, 2)  # whence=2 means "relative to the end": move the file cursor to the 4th-last byte
    # read the rest into memory, strip the trailing newline, and decode to a normal string
    text = f.read().strip().decode("utf-8")
print(text)  # 789
You need to adjust it to your application.
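One possible adjustment, sketched under the assumption that the file is UTF-8 encoded: seeking works in bytes, not characters, so the hypothetical helper `read_last_chars` below reads a window of up to 4 bytes per requested character (the maximum length of a UTF-8 character) and then keeps the last n characters after decoding. The demo file and its contents are made up for illustration.

```python
import os
import tempfile

def read_last_chars(path, n):
    """Return the last n characters of a UTF-8 file without reading all of it."""
    window = 4 * n  # a UTF-8 character occupies at most 4 bytes
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)  # seek() returns the new position: the file size
        f.seek(max(0, size - window))  # jump back by the window, or to the start of a small file
        data = f.read()
    # errors="ignore" drops a character that was cut in half at the start of the window
    return data.decode("utf-8", errors="ignore")[-n:]

# Small demo with multi-byte characters (file contents are made up).
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write("0123456789 naïve café ☕")
    demo_path = tmp.name

last_six = read_last_chars(demo_path, 6)
os.remove(demo_path)
print(last_six)
```

Unlike the snippet above, this helper does not strip a trailing newline; call `.rstrip("\n")` on the result if the file ends with one and it should not count.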

I tried what you suggested, and thanks for your answer. I got an error; I will come back if I cannot solve it. – desmond ng Mar 04 '21 at 06:51