0

I have a list of IDs I'm trying to manipulate in order to get long strings of the following format:

"ID1,ID2,ID3,....,IDn".

That is, one long string in which the IDs are separated by commas, and the commas count as part of the string.

Here's the catch: each of these long strings can be no more than 1,000 characters in length total, and since I want the code to work on any list I feed it, I can't say in advance how many strings of length 1,000 the raw list of IDs will require.

So, ideally I'd like to have a script that takes that list of IDs and generates variables of the following form:

str1 = string of the first 1000 chars
str2 = string of the next 1000 chars
str3 = string of the next 1000 chars

and so on.

How do I do that? How do I generate variables on the fly, as needed?

I thought about maybe generating one long string at the first stage, like:

long_str = ""
for item in my_list:
    long_str += str(item) + ","

def find_num_segments(x):
    if x % 1000 == 0:
        return x/1000
    else:
        return x/1000  + 1      # notice that if x < 1000, then x%1000 != 0 (actually ==   x). So it will fall under else. x<1000 / 1000 gives 0, so the function yields 0+1

num_segments = find_num_segments(long_str)

for i in range(num_segments):
    starting_position = 0
    print "str%d : " %i , num_segments[starting_position,starting_position+1000]
    starting_position += 1000

BUT, that's:

  1. Ugly and probably unpythonic.
  2. Doesn't produce clean results as there are "formatting leftovers" such as the brackets and quotes - ['014300070358Ful'], ['014300031032Uni'].
  3. Doesn't actually work :) I get a "TypeError: not all arguments converted during string formatting" on the 1st line of find_num_segments().

EDIT: I realize that won't produce the correct outcome either, as there's no guarantee an id won't be "cut" in the middle.

How would you create a function that concatenates the IDs one by one and "stops" before getting to the 1000 chars mark if the next ID won't fit all the way in, then starts a new batch of 1000 chars?


Here's a sample list if anybody wants to help and needs one.

Help would be appreciated! Thanks :)

Optimesh
  • 3
    Why don't you store them in a list? Say `the_str = ['first1000chars', 'second1000chars', ..., 'last1000chars']`. And the mistake in your code is that you should use `len(x) % 1000 == 0` instead of `x % 1000 == 0`, because `%` is a string formatting symbol, as you have used in `print "str%d : "% i` – justhalf Apr 07 '14 at 08:55
  • Yeah, using a list would be way better (in every possible way) than dynamically generating variables. – anon582847382 Apr 07 '14 at 08:56
  • 1
    I'm mildly confused as to what you're *actually* trying to accomplish. You've definitely given us what you're trying to do to accomplish something, but I'm pretty sure you've gone and focused on the *how* rather than the *why*. Do you just need several ids concatenated into 1000 char or less CSV rows? Or what? – Wayne Werner Apr 07 '14 at 09:33
  • @WayneWerner Sorry if I wasn't clear. Just to concatenate IDs to long strings, each string ~1000 chars long (but no more than 1000). For example, if my raw data only included: id1,id2,id3, then what I want is: str1 = "id1,id2,id3" - one long continuous string. – Optimesh Apr 07 '14 at 11:03

2 Answers

1

You should use lists instead. It is much easier. You can do something like:

string_list = []

Then whenever you need to add a variable you can simply do string_list.append(value)

For example, to build up a list of IDs, you can do:

id = []
id.append(ID)

You can also easily fetch an id. Say for example you want to get the fifth id, you can do:

id[4]

Examples

>>> id = []
>>> id.append(123)
>>> id.append(125)
>>> id.append(127)

>>> print id
[123, 125, 127]

>>> id[0]
123
>>> id[-1]   #last element
127

You can also search for elements in it:

>>> id.index(125)
1

[NOTE]

If you want to split a string into chunks of 1000 characters, you can do:

id_list = [your_string[i:i+1000] for i in range(0,len(your_string),1000)]
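
To make the effect of that slicing visible, here is a minimal sketch that uses a chunk width of 10 instead of 1000 so the output stays readable. The names `ids` and `width` are made up for the example, and the third ID is invented sample data in the same shape as the two quoted in the question.

```python
# Sample data: the first two IDs appear in the question, the third is invented.
ids = ["014300070358Ful", "014300031032Uni", "014300012345Abc"]
your_string = ",".join(ids)

width = 10  # assumed small width for the demo; the question needs 1000
id_list = [your_string[i:i + width] for i in range(0, len(your_string), width)]

print(id_list)
# ['0143000703', '58Ful,0143', '00031032Un', 'i,01430001', '2345Abc']

# Joining the chunks gives back the original string unchanged.
print("".join(id_list) == your_string)  # True
```

The slicing is purely positional, so every chunk except the last has exactly `width` characters, and an ID can end up split across two chunks.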
sshashank124
  • I'm not sure whether we should guess that OP wants each ID separately in each element of the list. Since OP asked for concatenating the string into a string, I guess we should at least address that before putting each ID as an element in a list. Probably OP really has real use for the concatenated string, since it's limited to 1000 chars, and that OP says "raw IDs". – justhalf Apr 07 '14 at 09:01
  • @justhalf, In that case, can't the OP just do: `id[0] = his_string[:1000]` – sshashank124 Apr 07 '14 at 09:05
  • Yes, that is what you should include in your answer. And probably OP might then further clarify whether OP wants no ID to be split between strings. – justhalf Apr 07 '14 at 09:08
  • Hi, as I wrote in my post, and I'm sorry if I wasn't clear enough, I want long strings, str1 = string of the first 1000 chars - for example, would be ONE long string made of a concatenation of as many IDs you can fit in a 1000 chars including the spaces between them. – Optimesh Apr 07 '14 at 09:16
  • @Optimesh, Please see last few lines of my answer. – sshashank124 Apr 07 '14 at 09:17
  • @Optimesh, last line updated. tested it and it works. – sshashank124 Apr 07 '14 at 09:18
  • @sshashank124 thank you. Can you please have a look at the 'edit' I added to the OP? – Optimesh Apr 07 '14 at 09:25
0

If you really want to dynamically generate variables you can do this:

short_strings = [long_string[i:i+1000] for i in range(0, len(long_string), 1000)]
for i, short_string in enumerate(short_strings):
    globals()['str{}'.format(i)] = short_string
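
If the generated names matter but writing into `globals()` feels risky, one possible alternative is to keep them in a plain dictionary. This is only a sketch: `long_string` is filled with placeholder data here, and `named` is a made-up name.

```python
long_string = "a" * 2500  # placeholder data, assumed for the example

short_strings = [long_string[i:i + 1000] for i in range(0, len(long_string), 1000)]

# Map each generated name to its chunk in an ordinary dict instead of globals().
named = {"str{}".format(i): s for i, s in enumerate(short_strings)}

print(sorted(named))       # ['str0', 'str1', 'str2']
print(len(named["str2"]))  # 500
```

Looking a chunk up by name then becomes `named["str0"]`, and the number of chunks is simply `len(named)`.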

EDIT

To prevent ids being chopped in half:

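i = 0
short_string = ''
for uid in uids:
    if len(uid) > 1000:
        # An id this long can never fit, so report it and skip it.
        print('Unique ID is too long:', uid)
    elif not short_string:
        short_string = uid
    elif len(short_string) + 1 + len(uid) <= 1000:
        # The +1 accounts for the comma separator.
        short_string += ',' + uid
    else:
        # The next id would not fit: store this batch and start a new one.
        globals()['str{}'.format(i)] = short_string
        i += 1
        short_string = uid
# Store the final, partially filled batch as well.
if short_string:
    globals()['str{}'.format(i)] = short_string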
Scorpion_God
  • @Optimesh Can each short string be exactly 991 characters instead? Are the ids always 15 characters long? – Scorpion_God Apr 07 '14 at 11:48
  • They won't always be 15 chars long. Each string can be 991 chars long but then the code is not necessarily the most efficient. Thanks – Optimesh Apr 07 '14 at 14:27
  • @Optimesh like sshashank124 said, you should really put your short strings into a list. – Scorpion_God Apr 07 '14 at 14:43
  • ok, makes sense, but what should i do then? I need some algorithm to make sure concatenation stops when the next item won't fit fully as the 1000 chars cutoff would be reached. – Optimesh Apr 08 '14 at 07:00
  • @Optimesh It does stop. Once the short_string is too long, it creates a variable with it, and then resets the short_string. – Scorpion_God Apr 08 '14 at 07:05
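
Putting the two suggestions from this thread together (collect the batches in a list, and never cut an ID in half), a rough sketch could look like the following. `pack_ids` and its `limit` and `sep` parameters are hypothetical names chosen for illustration, and raising an error for an oversized ID is an assumption; skipping it would be an equally valid choice.

```python
def pack_ids(ids, limit=1000, sep=","):
    """Join ids into strings of at most `limit` characters each.

    Hypothetical helper: ids are never split, and an id longer than
    `limit` raises ValueError instead of being silently dropped.
    """
    batches = []     # finished strings
    current = []     # ids collected for the string being built
    current_len = 0  # length of sep.join(current)
    for item in ids:
        item = str(item)
        if len(item) > limit:
            raise ValueError("ID is longer than the limit: %r" % item)
        # Characters this id would add, including a separator if needed.
        extra = len(item) + (len(sep) if current else 0)
        if current_len + extra > limit:
            # The id does not fit: close this batch and start a new one.
            batches.append(sep.join(current))
            current, current_len = [item], len(item)
        else:
            current.append(item)
            current_len += extra
    if current:
        batches.append(sep.join(current))
    return batches


# 200 sample ids of 15 characters each (the two ids quoted in the question).
ids = ["014300070358Ful", "014300031032Uni"] * 100
batches = pack_ids(ids)
print(len(batches))                 # 4
print([len(b) for b in batches])    # [991, 991, 991, 223]
```

With 15-character IDs, 62 of them plus 61 commas come to 991 characters, which is why the full batches stop at 991 rather than 1000. The result is an ordinary list, so `batches[0]` plays the role of `str1`, `batches[1]` of `str2`, and so on.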