Can you please help me work with a column that holds pipe-delimited integer values in Python?

How can we create an additional column, say "PHR_INSTANTANEOUS_MIN", that stores the minimum of the numbers in PHR_INSTANTANEOUS? For example, "-18" in the first row and "14" in the third row.

Similarly, I need the derived columns PHR_INSTANTANEOUS_MIN, PHR_INSTANTANEOUS_MEDIAN and PHR_INSTANTANEOUS_MODE.

The same needs to be repeated for the SINR_INSTANTANEOUS values to form the corresponding derived columns.

df1
START_TIME           PRIMARY_KEY   PHR_INSTANTANEOUS  SINR_INSTANTANEOUS
2020-03-10 12:00:00  e7ca9da318f1  -18|-17            9|8
2020-03-10 12:01:00  68615e3db513  1                  26
2020-03-10 12:05:00  7f250354808a  14|18|20|20        26|26|24|26
2020-03-10 12:07:00  9202ab7611d4  -8|-7|40           22|6|-2
2020-03-10 12:12:00  377bf955bdc0  4|9                26|20

pratyada

1 Answer

Here's a way to do that:

import numpy as np
import pandas as pd
from statistics import median, mode

# sample data, including a plain float and a missing value
df = pd.DataFrame(['-18|-17', '1', '14|18|20|20', '-8|-7|40', 5.2, np.nan], columns=['PHR_INSTANTANEOUS'])

# make sure the dtype is uniformly string (NaN becomes the string 'nan')
df['PHR_INSTANTANEOUS'] = df['PHR_INSTANTANEOUS'].astype(str)

# store the derived values in new columns; float() can parse 'nan' and '5.2', int() cannot
df['PHR_INSTANTANEOUS_MIN'] = df['PHR_INSTANTANEOUS'].apply(lambda x: min(map(float, x.split('|'))))  # minimum
df['PHR_INSTANTANEOUS_MEDIAN'] = df['PHR_INSTANTANEOUS'].apply(lambda x: median(map(float, x.split('|'))))  # median
df['PHR_INSTANTANEOUS_MODE'] = df['PHR_INSTANTANEOUS'].apply(lambda x: mode(map(float, x.split('|'))))  # mode
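
The same idea can be applied to both columns from the question in one pass. The following is only a sketch: the helper names `parse_values` and `add_derived_columns` are made up for illustration, the last row with missing values is added here to show the NaN case, and it assumes Python 3.8 or later, where `statistics.mode` returns the first of several tied values instead of raising `StatisticsError`.

```python
import numpy as np
import pandas as pd
from statistics import median, mode


def parse_values(cell):
    """Split a pipe-delimited cell into floats; a missing cell gives an empty list."""
    if pd.isna(cell):
        return []
    return [float(part) for part in str(cell).split("|")]


def add_derived_columns(df, column):
    """Add <column>_MIN, <column>_MEDIAN and <column>_MODE (hypothetical helper)."""
    parsed = df[column].apply(parse_values)
    for suffix, func in (("MIN", min), ("MEDIAN", median), ("MODE", mode)):
        # empty lists (missing cells) become NaN instead of raising an error
        df[f"{column}_{suffix}"] = parsed.apply(
            lambda values, f=func: f(values) if values else np.nan
        )
    return df


# rows from the question, plus one row with missing values added for illustration
df1 = pd.DataFrame({
    "PHR_INSTANTANEOUS": ["-18|-17", "1", "14|18|20|20", "-8|-7|40", "4|9", np.nan],
    "SINR_INSTANTANEOUS": ["9|8", "26", "26|26|24|26", "22|6|-2", "26|20", np.nan],
})

for col in ("PHR_INSTANTANEOUS", "SINR_INSTANTANEOUS"):
    df1 = add_derived_columns(df1, col)

print(df1.filter(like="_MIN"))
```

Parsing each cell once and reusing the parsed lists avoids splitting the same string three times, and handling missing cells before conversion means the column does not need to be cast with `.astype(str)` first.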
RaidasGrisk
  • I think changing the list comprehension to `map` would be nicer - `df['PHR_INSTANTANEOUS'].apply(lambda x: min(map(int, x.split('|'))))` – Daniel Geffen Jun 11 '20 at 18:18
  • `df['PHR_INSTANTANEOUS'].apply(lambda x: min(map(int, x.split('|'))))` gives me an error: AttributeError: 'float' object has no attribute 'split' – pratyada Jun 11 '20 at 18:22
  • Before running the code, make sure the dtype of the column is uniform, like this: `df['PHR_INSTANTANEOUS'] = df['PHR_INSTANTANEOUS'].astype(str)` – RaidasGrisk Jun 11 '20 at 18:25
  • Thank you both for your quick replies. But after applying `.astype(str)` it now gives ValueError: invalid literal for int() with base 10: 'nan'. Do I need to remove all NaN rows first? – pratyada Jun 11 '20 at 18:27
  • Don't need to remove nan values. Check the edited answer, it should work just fine now. – RaidasGrisk Jun 11 '20 at 18:29
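
The two errors in the comments above come down to how Python parses the string 'nan' once the column has been cast to str: `int()` rejects it, while `float()` accepts it and returns NaN. A small sketch using only the standard library (the helper name `parse_as_float` is made up for illustration):

```python
import math


def parse_as_float(text):
    """Convert a pipe-delimited string such as '-8|-7|40' into a list of floats."""
    return [float(part) for part in text.split("|")]


# int() cannot parse 'nan', which is the ValueError reported in the comments
try:
    int("nan")
    int_error = None
except ValueError as exc:
    int_error = str(exc)

# float() parses 'nan', so a missing row simply yields NaN as its minimum
min_of_missing = min(parse_as_float("nan"))

print(int_error)
print(math.isnan(min_of_missing))
print(min(parse_as_float("-8|-7|40")))
```

This is presumably why the edited answer converts with `float` rather than `int`: rows that were NaN before the cast to str stay NaN in the derived columns instead of raising an error.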