Combing dataframe values to a new dataframe

Question

I am combining two dataframe values from an Excel file to a new dataframe but the combined values changed to decimal number. Here are my codes:

My dataframe that I wish to combine:

cable_block    pair
1              10
1              11
3              123
3              222

I insert a dataframe to have those two combined with a delimiter of /, so here is my code:

df['new_col'] = df[['cable_block', 'pair']].apply(lambda x: '/'.join(x.astype(str), axis=1))

The result I get is:

cable_block    pair   new_col
1              10     1.0/10.0
1              11     1.0/11.0
3              123    3.0/123.0
3              222    3.0/222.0

After searching, I found good answer by here Psidom and Skirrebattie. So I tried:

df['new_col'] = df['new_col'].applymap(str)

and

df['new_col'] = df['new_col'].astype(str)

But it doesn't work the way it should. Looking by the codes, it should work and I find it weird that it doesn't.

Is there another work around?

Is there particular reason you need a string? Even a `tuple` may be more useful / efficient for manipulation going forwards, but a string is expensive to create and difficult to manipulate. — jpp, Oct 01 '18 at 16:13
@jpp I am trying to use string because my desired output would look like I am dividing the two dataframes. I though using string might retain the integrity of the data. — Ricky Aguilar, Oct 01 '18 at 16:15
Terminology note, you are concatenating strings in **series**, not "two dataframes". A dataframe is the entire table, a series is a column. — jpp, Oct 01 '18 at 16:36

Mabel Villalba · Accepted Answer · 2018-10-01T16:30:36.463

3

First, to remove the trailing .0 ensure that data is int:

 df = df.astype(int)

Then you can do:

df['cable_block'].astype(str) + '/' + df['pair'].astype(str)

0     1/10
1     1/11
2    3/123
3    3/222
dtype: object

Another option to ensure a correct formatting could be:

 df.apply(lambda x: "%d/%d" %(x['cable_block'], x['pair']), axis=1)

0     1/10
1     1/11
2    3/123
3    3/222
dtype: object

edited Oct 01 '18 at 16:30

answered Oct 01 '18 at 16:10

Mabel Villalba

2,538
8
19

I still got the same output that has `.0`. The code is correct but I am not getting the correct output. – Ricky Aguilar Oct 01 '18 at 16:18
it's strange. Which pandas version are you using? Mine is 0.22.0. Maybe it's a bug. Why don't you try converting to int first: `df['cable_block'].astype(int).astype(str) + '/' + df['pair'].astype(int).astype(str)` ? – Mabel Villalba Oct 01 '18 at 16:26
I am using 0.23.3. I tried converting it to int first then I got `ValueError: cannot convert float NaN to integer` – Ricky Aguilar Oct 01 '18 at 16:30
1

Is it the same dataframe? In the one you shared there is no NaN. If any, you can drop them first: `df = df.dropna(how='any', axis=1)` – Mabel Villalba Oct 01 '18 at 16:34
1

This does it! So the NaN should be dropped first. Thank you for your help. – Ricky Aguilar Oct 01 '18 at 16:39
1

Glad it helped. You should put reproducible examples, and that was the problem, the dataframe did not have the NaNs yours did. – Mabel Villalba Oct 01 '18 at 16:43
1

Understood. I was looking on the side that the NaN would also concatenate after the dataframes are converted to stiring and I could just .drop.contains() along the way. I will take note of this. Thank you. – Ricky Aguilar Oct 01 '18 at 16:49

score 2 · Answer 2 · answered Oct 01 '18 at 16:10

2

Why not using astype

df.astype(str).apply('/'.join,1)
Out[604]: 
0     1/10
1     1/11
2    3/123
3    3/222
dtype: object

answered Oct 01 '18 at 16:10

BENY

317,841
20
164
234

I have more than two columns in my Excel file, so I think I need to specify the dataframes to be combined. – Ricky Aguilar Oct 01 '18 at 16:19
@RickyAguilar then df[['cable_block', 'pair']].astype(str).apply('/'.join,1) – BENY Oct 01 '18 at 16:27
Thanks but I still got the same output. I think the 'pair' dataframe is a float but when I try to convert it to int, I got ValueError. – Ricky Aguilar Oct 01 '18 at 16:32

score 0 · Answer 3 · answered Oct 01 '18 at 16:27

0

df['cable_block'].astype(int).astype(str) + '/' + df['pair'].astype(int).astype(str)

The data in your dataframe is probably floats, not ints.

answered Oct 01 '18 at 16:27

CJR

3,916
2
10
23

Tried that but got `ValueError: cannot convert float NaN to integer` – Ricky Aguilar Oct 01 '18 at 16:33
If you're willing to live dangerously: `df['cable_block'].astype(int, errors='ignore').astype(str, , errors='ignore') + '/' + df['pair'].astype(int, , errors='ignore').astype(str, , errors='ignore')` – CJR Oct 01 '18 at 18:24

jpp · Answer 4 · 2018-10-01T16:46:15.673

You can use a list comprehension and f-strings:

df['new_col'] = [f'{cable_block}/{pair}' for cable_block, pair in df.values]

print(df)

   cable_block  pair new_col
0            1    10    1/10
1            1    11    1/11
2            3   123   3/123
3            3   222   3/222

The approach compares reasonably well versus the alternatives:

df = pd.concat([df]*10000, ignore_index=True)

%timeit df['cable_block'].astype(str) + '/' + df['pair'].astype(str)  # 62.8 ms
%timeit [f'{cable_block}/{pair}' for cable_block, pair in df.values]  # 85.1 ms
%timeit list(map('/'.join, map(list, df.values.astype(str))))         # 157 ms
%timeit df.astype(str).apply('/'.join,1)                              # 1.11 s

Combing dataframe values to a new dataframe

4 Answers4