2

I am combining two dataframe values from an Excel file to a new dataframe but the combined values changed to decimal number. Here are my codes:

My dataframe that I wish to combine:

cable_block    pair
1              10
1              11
3              123
3              222

I insert a dataframe to have those two combined with a delimiter of /, so here is my code:

df['new_col'] = df[['cable_block', 'pair']].apply(lambda x: '/'.join(x.astype(str), axis=1))

The result I get is:

cable_block    pair   new_col
1              10     1.0/10.0
1              11     1.0/11.0
3              123    3.0/123.0
3              222    3.0/222.0

After searching, I found good answer by here Psidom and Skirrebattie. So I tried:

df['new_col'] = df['new_col'].applymap(str)

and

df['new_col'] = df['new_col'].astype(str)

But it doesn't work the way it should. Looking by the codes, it should work and I find it weird that it doesn't.

Is there another work around?

jpp
  • 159,742
  • 34
  • 281
  • 339
Ricky Aguilar
  • 329
  • 1
  • 8
  • 19
  • Is there particular reason you need a string? Even a `tuple` may be more useful / efficient for manipulation going forwards, but a string is expensive to create and difficult to manipulate. – jpp Oct 01 '18 at 16:13
  • @jpp I am trying to use string because my desired output would look like I am dividing the two dataframes. I though using string might retain the integrity of the data. – Ricky Aguilar Oct 01 '18 at 16:15
  • Terminology note, you are concatenating strings in **series**, not "two dataframes". A dataframe is the entire table, a series is a column. – jpp Oct 01 '18 at 16:36
  • @jpp Noted on that, thank you for the note. – Ricky Aguilar Oct 01 '18 at 16:40

4 Answers4

3

First, to remove the trailing .0 ensure that data is int:

 df = df.astype(int)

Then you can do:

df['cable_block'].astype(str) + '/' + df['pair'].astype(str)

0     1/10
1     1/11
2    3/123
3    3/222
dtype: object

Another option to ensure a correct formatting could be:

 df.apply(lambda x: "%d/%d" %(x['cable_block'], x['pair']), axis=1)

0     1/10
1     1/11
2    3/123
3    3/222
dtype: object
Mabel Villalba
  • 2,538
  • 8
  • 19
  • I still got the same output that has `.0`. The code is correct but I am not getting the correct output. – Ricky Aguilar Oct 01 '18 at 16:18
  • it's strange. Which pandas version are you using? Mine is 0.22.0. Maybe it's a bug. Why don't you try converting to int first: `df['cable_block'].astype(int).astype(str) + '/' + df['pair'].astype(int).astype(str)` ? – Mabel Villalba Oct 01 '18 at 16:26
  • I am using 0.23.3. I tried converting it to int first then I got `ValueError: cannot convert float NaN to integer` – Ricky Aguilar Oct 01 '18 at 16:30
  • 1
    Is it the same dataframe? In the one you shared there is no NaN. If any, you can drop them first: `df = df.dropna(how='any', axis=1)` – Mabel Villalba Oct 01 '18 at 16:34
  • 1
    This does it! So the NaN should be dropped first. Thank you for your help. – Ricky Aguilar Oct 01 '18 at 16:39
  • 1
    Glad it helped. You should put reproducible examples, and that was the problem, the dataframe did not have the NaNs yours did. – Mabel Villalba Oct 01 '18 at 16:43
  • 1
    Understood. I was looking on the side that the NaN would also concatenate after the dataframes are converted to stiring and I could just .drop.contains() along the way. I will take note of this. Thank you. – Ricky Aguilar Oct 01 '18 at 16:49
2

Why not using astype

df.astype(str).apply('/'.join,1)
Out[604]: 
0     1/10
1     1/11
2    3/123
3    3/222
dtype: object
BENY
  • 317,841
  • 20
  • 164
  • 234
0
df['cable_block'].astype(int).astype(str) + '/' + df['pair'].astype(int).astype(str)

The data in your dataframe is probably floats, not ints.

CJR
  • 3,916
  • 2
  • 10
  • 23
  • Tried that but got `ValueError: cannot convert float NaN to integer` – Ricky Aguilar Oct 01 '18 at 16:33
  • If you're willing to live dangerously: `df['cable_block'].astype(int, errors='ignore').astype(str, , errors='ignore') + '/' + df['pair'].astype(int, , errors='ignore').astype(str, , errors='ignore')` – CJR Oct 01 '18 at 18:24
0

You can use a list comprehension and f-strings:

df['new_col'] = [f'{cable_block}/{pair}' for cable_block, pair in df.values]

print(df)

   cable_block  pair new_col
0            1    10    1/10
1            1    11    1/11
2            3   123   3/123
3            3   222   3/222

The approach compares reasonably well versus the alternatives:

df = pd.concat([df]*10000, ignore_index=True)

%timeit df['cable_block'].astype(str) + '/' + df['pair'].astype(str)  # 62.8 ms
%timeit [f'{cable_block}/{pair}' for cable_block, pair in df.values]  # 85.1 ms
%timeit list(map('/'.join, map(list, df.values.astype(str))))         # 157 ms
%timeit df.astype(str).apply('/'.join,1)                              # 1.11 s
jpp
  • 159,742
  • 34
  • 281
  • 339