Add meaning to values in an object

Question

Is it possible to use the third column in the following example to "spread out" (unravel) the values in, e.g., a Pandas DataFrame in Python without actually duplicating the rows? Suppose we have an object that looks like this:

X   Y   Count
1   2   3
2   2   2
4   3   1

How can I give Count meaning here without unraveling each row into Count copies? That does not seem like a good solution, because it makes the data take up much more memory.

So I don't want the DataFrame to just look like this:

X   Y   Count
1   2   1
1   2   1
1   2   1
2   2   1
2   2   1
4   3   1

I don't understand your question. You say you want to "spread" the values (without saying what that means), then you say you want to "give Count meaning" (without saying what that means), then you say you want to do KNN clustering. What is it you actually want to do? — BrenBarn, Apr 20 '16 at 18:48
@BrenBarn I find it hard to formulate. I want the `count` column to have some meaning when doing a KNN clustering or whatever. If the values were just one entry per row it would be easier, but they are added together based on *X* and *Y*. Does that make sense? — eikooc, Apr 20 '16 at 18:49
Are you looking for something like this? http://stackoverflow.com/questions/26777832/replicating-rows-in-a-pandas-data-frame-by-a-column-value — ayhan, Apr 20 '16 at 18:56
@ayhan kind of but I would like to avoid duplicating the data if possible as _greole_ is pointing out in the comments — eikooc, Apr 20 '16 at 18:58

ptrj · Answer 1 · 2016-04-20T19:39:17.420

0

I think you mean something like this:

new_df = df.loc[df.index.repeat(df['Count'])]

Then row df.loc[n] is repeated df.Count[n] times. It's sort of the reverse of a groupby.
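To make the effect concrete, here is a minimal sketch that applies the line above to the three-row example from the question. The frame is rebuilt by hand from the question's table, so the literal values are the only assumption.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({"X": [1, 2, 4], "Y": [2, 2, 3], "Count": [3, 2, 1]})

# Repeat each index label Count times, then select rows by those labels.
new_df = df.loc[df.index.repeat(df["Count"])]

print(new_df.index.tolist())  # [0, 0, 0, 1, 1, 2]
print(len(new_df))            # 6, which equals df["Count"].sum()
```

Note that new_df keeps the original index labels and the original Count values; resetting Count to 1 is a separate step.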

Update

I tried new_df['Count'] = 1 and it raised a SettingWithCopyWarning unless I made an explicit copy:

new_df = df.loc[df.index.repeat(df['Count'])].copy()
new_df['Count'] = 1    # <- now it works without a warning
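Both versions above materialise df['Count'].sum() rows, so they do not avoid the memory cost raised in the question. As a hedged alternative, if the downstream computation accepts per-row weights, Count can be passed as a weight on the compact frame instead. The sketch below only illustrates the idea with a weighted mean via np.average; whether a particular clustering or KNN routine supports weights (for example a sample_weight argument in some scikit-learn estimators) is an assumption you would need to check.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({"X": [1, 2, 4], "Y": [2, 2, 3], "Count": [3, 2, 1]})

# Weighted mean on the compact frame: Count acts as a per-row weight.
weighted_mean_x = np.average(df["X"], weights=df["Count"])

# The same statistic on the expanded frame, built here for comparison only.
expanded = df.loc[df.index.repeat(df["Count"])]
plain_mean_x = expanded["X"].mean()

# Both values equal 11/6, so the compact frame carries the same information.
print(weighted_mean_x, plain_mean_x)
```

The expanded frame is only built to check the result; in practice you would keep the compact frame and the weights.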

edited Apr 20 '16 at 19:39

answered Apr 20 '16 at 19:03


Can it be done without duplicating the data like this as it is a very large dataframe? – eikooc Apr 20 '16 at 19:07
You can use `df.loc[...]` without creating a new data frame. But I'm not sure if it duplicates rows or not. – ptrj Apr 20 '16 at 19:15
yes, just do df = df.loc[df.index.repeat(df['Count'])] and then df['Count'] = 1 – DevLounge Apr 20 '16 at 19:15
