0

I need to reformat some data. I've never used pandas before, and could use some help.

I have two DataFrames: df1 dfTarget

df1 is the un-formatted data, dfTarget is how I need the data to be formatted

Based on a condition, I need one group of columns in df1 to be copied to certain columns in dfTarget. If the condition is false, I need another group of columns in df1 to be copied to certain columns in dfTarget.

Simplified df1:

    city   state   condition   city2   state2
0
1
2
3
4

Simplified dfTarget:

    mCity   mState
0
1
2
3
4

Basically, if the condition is true, I need to move 'city' and 'state' into 'mCity' and 'mState' respectively. If the condition is false, I need to move 'city2' and 'state2' into 'mCity' and 'mState'.

dfTarget is starting off empty, and needs to be filled row by row based on a bunch of conditions in df1.

I've never used pandas, and tried to research this myself, but got lost quickly in all the different methods. Please, what's the best way to do this?

cs95
  • 379,657
  • 97
  • 704
  • 746
camstar915
  • 133
  • 2
  • 7
  • whats the logic/condition? sounds like a case for `melt`, `groupby` and `pivot` – Umar.H Feb 18 '20 at 20:04
  • condition field is either a 1 or a 0. I could change it to True or False pretty easily if that simplifies things – camstar915 Feb 18 '20 at 20:07
  • _condition field is either a 1 or a 0. I could change it to True or False pretty easily if that simplifies things_ That's probably a good idea. – AMC Feb 19 '20 at 02:48
  • This seems like rather basic Pandas, by the way. Have you read the docs? – AMC Feb 19 '20 at 02:48
  • Does this answer your question? [Pandas conditional creation of a series/dataframe column](https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column) – AMC Feb 19 '20 at 02:49

1 Answers1

3

It should be simple enough to conditionally assign the columns, assuming the indices and/or number of rows is the same.

If the condition comes from a column, you can try np.where:

dfTarget[['mCity', 'mState']] = np.where(
    df1[['condition']], df1[['city', 'state']], df1[['city2', 'state2']])

Minimal Example

df1 = pd.DataFrame({
    'city': list('abc'), 
    'state': list('def'), 
    'condition': [True, False, True], 
    'city2': list('hij'), 
    'state2': list('klm')})
dfTarget = pd.DataFrame(index=df1.index, columns=['mCity', 'mState'])

dfTarget[['mCity', 'mState']] = np.where(
    df1[['condition']], df1[['city', 'state']], df1[['city2', 'state2']])

  mCity mState
0     a      d
1     i      l    # comes from second group of columns
2     c      f
cs95
  • 379,657
  • 97
  • 704
  • 746