2

I have a sample data (Data_sample_truncated.txt) which I truncated from a big data. It has 3 fields - "Index", "Time" and "RxIn.Density[**x**, ::]" Here I used x as integer as x can vary for any range. In this data it is 0-15. The combination of the 3 column fields is unique. For different "Index" field the "Time" and "RxIn.Density[**x**, ::]" can be same or different. For each new "Index" value the data has a blank line and almost similar column headers except for "RxIn.Density[**x**, ::]" where x is increasing when new "Index" value is reached. The data which I export from ADS (circuit simulation software) gives me like this format while exporting.

Now I want to format the data so that all the data are merged together under 3 unique column fields - "Index", "Time" and "RxIn.Density". You can see I want to remove the strings [**x**, ::] in the new dataframe of the 3rd column. Here is the sample final data file that I want after formatting (Data-format_I_want_after_formatting.txt). So I want the following -

  • The blank lines (or rows) to be removed
  • All the other header lines to be removed keeping the top header only and changing the 3rd column header to "RxIn.Density"
  • Keeping all the data merged under the unique column fields - "Index", "Time" and "RxIn.Density", even if the data values are duplicate.

My MATLAB code is in the below:

import pandas as pd

#create DataFrame from csv with columns f and v 
df = pd.read_csv('Data_sample_truncated.txt', sep="\s+", names=['index','time','v'])

#boolean mask for identify columns of new df   
m = df['v'].str.contains('RxIn')

#new column by replace NaNs by forward filling
df['g'] = df['v'].where(m).ffill()

#get original ordering for new columns
#cols = df['g'].unique()

#remove rows with same values in v and g columns
#df = df[df['v'] != df['g']]

df = df.drop_duplicates(subset=['index', 'time'], keep=False)

df.to_csv('target.txt', index=False, sep='\t')

The generated target.txt file is not what I wanted. You can check it here. Can anyone help what is wrong with my code and what to do to fix it so that I wan my intended formatting?

I am using Spyder 3.2.6 (Anaconda) where python 3.6.4 64-bit is embedded.

aguntuk
  • 127
  • 9

2 Answers2

1

Try this:

df = pd.read_csv('Data_sample_truncated.txt', sep='\s+', names=['index', 'time', 'RxIn.Density', 'mask'], header=None)

df = df[df['mask'].isna()].drop(['mask'], axis=1)

df.to_csv('target.txt', index=False, sep='\t')
Chris Adams
  • 18,389
  • 4
  • 22
  • 39
1

You can just filter out rows, that you do not want(check this):

import pandas as pd
df = pd.read_csv('Data_sample_truncated.txt', sep="\s+")
df.columns = ["index","time","RxIn.Density","1"]
del df["1"]
df = df[~df["RxIn.Density"].str.contains("Rx")].reset_index(drop=True)
df.to_csv('target.txt', index=False, sep='\t')
Dmitriy Kisil
  • 2,858
  • 2
  • 16
  • 35